OpenCL denoising and speedup thread

What are you using for static analysis, debugging, etc.? CodeXL? http://gpuopen.com/compute-product/codexl/

2.2 has been released

Sorry, double post. (Don't know why, but when I edit a post it posts it a second time.)

None of them. The parts of the code I modify are not really error prone, so it just works most of the time, and when it doesn't, I know where it comes from. For performance, it's just a lot of real-case benchmarks.

For people who asked, tiles are now updating at each sample in OpenCL (in the split kernel branch).

Updated version in first post. http://www.filedropper.com/openclspeedupbuild

Thanks. Will try it out. :eyebrowlift:

Edit: It’s exactly the same as the daily build. It’s nice that the speed ups are in the master now. Great stuff.

Ah, forgot to mention: tick the Selective Node Compilation option in the Performance tab, and subtract the kernel compilation time from the total time. Somehow, on Windows, it keeps compiling the kernel over and over again. It should be much faster than master :slight_smile:

Nice. 01:06.50. That’s 4 seconds faster than the newest build. 03:19.53 with 35 sq sampled.
So cool.

The Ryzen benchmark file is done in 19.8s. On 2.78c it's done in 26.55s.

Crashes on fishy cat benchmark. Classroom crashes too.

Classroom works for me, fishy cat indeed crashes, will look into that. The best speedup is on Barcelona. Please render the 1000spp version from the official pack with one tile (or 2 if you have 2 cards, of course).

I get that bug where it compiles the kernel every time I render. It takes about 2 minutes every time ;/

Edit: I get about 11 minutes on barcelona. 13 min with compile time.

Alright. I got lucky and didn’t get the compile time for barcelona. 11:35.79.

That’s like 4 minutes faster than a couple of months ago.

On fishy cat I get this error message:

OpenCL error: CL_INVALID_KERNEL in clSetKernelArg( kernel, start_argument_index + current_arg_index, arg1.size, arg1.pointer)

Yep, koro crashes too. The OpenCL code has changed a lot over the last few days; it will take a little bit of time to adapt.
The gains offered by master are now also pretty huge, so you can already enjoy great performance without the patch. In fact a user pointed out that the latest master gives better results with 64x64 tiles, and I can confirm. With this setting, and thanks to Mai, Sergey and Hristo for their very good work, you get about the same times as with my patch. Let's see if the gains can be stacked and lead to an even faster rendering for 2.79 :slight_smile:

Alright. I have to test it. Yeah, kudos to the devs. First time I don’t regret that I have bought an AMD GPU :smiley:


A build with OpenCL Denoising working and some speedup patches from Nirved is available. With it, the classroom scene renders in 7min30 on an RX 480 compared to 6min30 on a GTX 1080. So a 200€ card is nearing the performance of a high-end card.
With Denoising on at default settings, it takes 8min15 and is noise free.
The build is available here: https://gum.co/CMIu .
You can donate what you want. It will help me to continue working on the OpenCL performance.

Wow, thanks. I will donate. I just have to do that from my laptop. I love that Blender will be a competitive and noise-free renderer :smiley:

Wohoho, congrats on your achievements :slight_smile:

Q: Does OpenCL rendering work with NVidia GPUs, CPUs, hybrid/mixed? If not, will it in the future?

It works with CPUs but is still slower (because of non-optimal tile size, global size, etc.). It can be made faster, but that is not the priority yet. To test with a CPU, you have to set CYCLES_OPENCL_SPLIT_KERNEL_TEST to 1 in your environment variables and then activate it in the System tab of the User Preferences.
For NVidia, it should work. Just test :slight_smile:
Render on command line to get optimal speed.
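For reference, a minimal sketch of how the steps above could look in a terminal on Linux (the scene name and output path are placeholders; CYCLES_OPENCL_SPLIT_KERNEL_TEST is the variable mentioned above, and `-b`, `-o`, `-f` are Blender's standard background-render flags):

```shell
# Enable the experimental OpenCL split-kernel path for CPU testing
export CYCLES_OPENCL_SPLIT_KERNEL_TEST=1

# Headless render of frame 1 of a hypothetical scene.blend,
# writing output files next to the .blend (// is Blender's relative-path prefix)
blender -b scene.blend -o //render_ -f 1
```

Rendering in the background like this avoids UI overhead, which is why the command line gives optimal speed.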

I gave $2. I only had that much in my paypal :confused:

Would have given more because my life seems more joyful with denoiser working on OpenCL. Was tired of being outside the fun CUDA club.

OK, done a few simple tests.

CPU (intel X 5650):
Both modes (PT & BPT) work fine (and a bit faster than the Experimental build, 82bcfb9-win64-vc14).

GPU (GTX 1060 + Q M5000):
At the moment, only a single card can be used for denoising (if two are used, only one set (GPU) of tiles gets denoised). Speeds are similar to those of Experimental. BPT settings are excluded - don't work?

PS
On occasion, when switching compute units, especially when using more than one, compile times get linearly(?) longer. Use one at a time (per Blender instance) and you'll be fine.

First impression - Good work!
Although I haven't tested on a complex scene and have no AMD unit at hand to give it a proper spin, if it behaves similarly to how it does on NVidia, it feels well done.

Keep it up
AMD users, taste & enjoy!
:slight_smile:

I've got a memory leak when rendering a smoke animation; this only happens when using GPU mode. It renders the animation until memory gets full, then crashes.

Do you use the latest driver? Does it also happen on master?