OpenCL denoising and speedup thread

What are you using for static analysis, debugging, etc.? CodeXL? http://gpuopen.com/compute-product/codexl/

2.2 has been released

Sorry, double post. (Don't know why, but when I edit a post it posts it a second time.)

None of them. The parts of the code I modify are not really error prone, so it just works most of the time, and when it doesn't, I know where it comes from. For performance, it's just a lot of real-case benchmarks.

For people who asked, tiles are now updating at each sample in OpenCL (in the split kernel branch).

Updated version in first post. http://www.filedropper.com/openclspeedupbuild

Thanks. Will try it out. :eyebrowlift:

Edit: It’s exactly the same as the daily build. It’s nice that the speed ups are in the master now. Great stuff.

Ah, forgot to mention: tick the Selective Node Compilation option in the Performance tab, and subtract the kernel compilation time from the total time. Somehow, on Windows, it keeps compiling the kernel over and over again. It should be much faster than master :slight_smile:

Nice. 01:06.50. That’s 4 seconds faster than the newest build. 03:19.53 with 35 sq sampled.
So cool.

The Ryzen benchmark file is done in 19.8s. On 2.78c it's done in 26.55s.

Crashes on fishy cat benchmark. Classroom crashes too.

Classroom works for me, fishy cat indeed crashes, will look into that. The best speedup is on Barcelona. Please render the 1000spp version from the official pack with one tile (or 2 if you have 2 cards, of course).

I get that bug where it compiles the kernel every time I render. It takes about 2 minutes every time ;/

Edit: I get about 11 minutes on barcelona. 13 min with compile time.

Alright. I got lucky and didn’t get the compile time for barcelona. 11:35.79.

That’s like 4 minutes faster than a couple of months ago.

On fishy cat I get this error message:

OpenCL error: CL_INVALID_KERNEL in clSetKernelArg( kernel, start_argument_index + current_arg_index, arg1.size, arg1.pointer)

Yep, koro crashes too. The OpenCL code has changed a lot over the last few days; it will take a little bit of time to adapt.
The gains offered by master are now also pretty huge, so you can already enjoy great performance without the patch. In fact a user pointed out that the latest master gives better results with 64x64 tiles, and I can confirm. With this setting, and thanks to Mai, Sergey and Hristo for their very good work, you get about the same times as with my patch. Let's see if the gains can be stacked and lead to an even faster rendering for 2.79 :slight_smile:

Alright. I have to test it. Yeah, kudos to the devs. First time I don’t regret that I have bought an AMD GPU :smiley:


A build with OpenCL Denoising working and some speedup patches from Nirved is available. With it, the classroom scene renders in 7min30 on an RX 480 compared to 6min30 on a GTX 1080. So a 200€ card is nearing the performance of a high-end card.
With Denoising on at default settings, it takes 8min15 and is noise free.
The build is available here: https://gum.co/CMIu .
You can donate what you want. It will help me to continue working on the OpenCL performance.

Wow, thanks. I will donate. I just have to do that from my laptop. I love that Blender will be a competitive and noise-free renderer :smiley:

Wohoho, congrats on your achievements :slight_smile:

Q: Does OpenCL rendering work with NVidia GPUs, CPUs, hybrid/mixed? If not, will it in the future?

It works with CPUs but is still slower (because of non-optimal tile size, global size, etc.). It can be made faster, but that is not the priority yet. To test with a CPU, you have to set CYCLES_OPENCL_SPLIT_KERNEL_TEST to 1 in your environment variables and then activate it in the System tab of the User Preferences.
For NVidia, it should work. Just test :slight_smile:
Render on command line to get optimal speed.
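For reference, a minimal sketch of how the steps above could look in a terminal on Linux (the scene name and output path are placeholders; CYCLES_OPENCL_SPLIT_KERNEL_TEST is the variable mentioned above, and `-b`, `-o`, `-f` are Blender's standard background-render flags):

```shell
# Enable the experimental OpenCL split-kernel path for CPU testing
export CYCLES_OPENCL_SPLIT_KERNEL_TEST=1

# Headless render of frame 1 of a hypothetical scene.blend,
# writing output files next to the .blend (// is Blender's relative-path prefix)
blender -b scene.blend -o //render_ -f 1
```

Rendering in the background like this avoids UI overhead, which is why the command line gives optimal speed.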

I gave $2. I only had that much in my paypal :confused:

Would have given more because my life seems more joyful with denoiser working on OpenCL. Was tired of being outside the fun CUDA club.

OK, done a few simple tests.

CPU (intel X 5650):
Both modes (PT & BPT) work fine (and a bit faster than the Experimental build, 82bcfb9-win64-vc14).

GPU (GTX 1060 + Q M5000):
At the moment, only a single card can be used for denoising (if two are used, only one set (GPU) of tiles gets denoised). Speeds are similar to those of Experimental. BPT settings are excluded - don't work?

PS
On occasion, when switching compute units, especially when using more than one, compile times get linearly(?) longer. Use one at a time (per Blender instance) and you'll be fine.

First impression - Good work!
Although I haven't tested on a complex scene and have no AMD unit at hand to give it a proper spin, if it behaves similarly to how it does on NVidia, it feels well done.

Keep it up
AMD users, taste & enjoy!
:slight_smile:

I've got a memory leak when rendering a smoke animation; this only happens when using GPU mode. It renders the animation until memory gets full, then crashes.

Do you use the latest driver? Does it also happen on master?