Improved OpenCL build beta

EDIT: IMPORTANT: This thread is not for posting wishlists of new features, design changes, or brand discussions. Please post useful bug reports (with steps to reproduce, your machine's specs, and files that show the bug) and benchmark results (see below for details).

Hi all,

After seeing the RX 480's performance, I bought one and played around with the Cycles code again to squeeze out more performance.
The build is here: https://ufile.io/90e0707. Depending on the scene, I got a speedup of between 1.2x and 1.9x on the RX 480.

It would be nice if you could report your card's render times for these benchmarks https://download.blender.org/demo/test/cycles_benchmark_20160228.zip with both this build and master, for comparison.
The BMW, Barcelona, and fishy_cat scenes are the most representative benchmarks (product design, architecture, and animation/cartoon).

With good data from you, I can improve performance further; this is a first draft.
For those with enough time, please post times with both the supported and the experimental kernel.
Viewport rendering is also dramatically faster. It would be nice if you could post the times to render the BMW and Barcelona scenes in the viewport. BMW has only one big viewport; render it until it reaches all samples. Do the same with Barcelona, using the small 3D viewport in the bottom-right.
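A speedup such as "1.9x" is the master render time divided by the build's render time for the same scene and settings. As a minimal sketch, assuming times are posted as "m:ss" or "m:ss.ff", the hypothetical helpers below turn a pair of posted times into that ratio:

```python
def to_seconds(t: str) -> float:
    """Convert a render time written as "m:ss" or "m:ss.ff" to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)


def speedup(master_time: str, build_time: str) -> float:
    """Speedup of the optimised build over master (master time / build time).

    A value above 1.0 means the build rendered faster; 1.9 means master took
    1.9 times as long as the build for the same scene and settings.
    """
    return to_seconds(master_time) / to_seconds(build_time)


if __name__ == "__main__":
    # Hypothetical times for illustration, not measured results.
    print(f"{speedup('3:00', '2:00'):.2f}x")  # prints 1.50x
```

Reporting both raw times (not just the ratio) keeps results comparable across cards.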

Have fun; this opens the door to really fast rendering for many artists :slight_smile:

Edit: Patch is available here: https://developer.blender.org/D2254

Do you have a patch/diff of your changes for the OpenCL build?

Thanks, I will check it tomorrow!

Thank you, I will give it a go on my RX 480.

All render times were taken after kernel compilation had finished (2nd render).

Win7, 64 bit

BMW scene 1920 x 1080 at 50%
Samples: 35 x 35

Your Blender version:

Tile size: 240 x 270 (instead of the default 256 x 256) - RX 480: 5 min 30 sec
Tile size: 960 x 540 - single tile - RX 480: 4 min 09 sec

2.78 RC II:

Tile size: 240 x 270 (instead of the default 256 x 256) - GTX 970 CUDA - 4 min 06 sec
Tile size: 240 x 270 (instead of the default 256 x 256) - RX 480: 7 min 17 sec
Tile size: 960 x 540 - single tile - RX 480: 5 min 45 sec

2.74 (I use it a lot):

Tile size: 240 x 270 (instead of the default 256 x 256) - GTX 970 CUDA - 4 min 58 sec

More results tomorrow.
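The tile sizes in these posts divide the render resolution evenly: BMW at 50% of 1920 x 1080 is 960 x 540, which splits into exactly 4 x 2 tiles of 240 x 270, while the default 256 x 256 needs 4 x 3 tiles with cut-short tiles along the right and bottom edges. A minimal sketch of that count (the helper names are illustrative; it only counts tiles and does not model how Cycles schedules them on the GPU):

```python
import math
from typing import Tuple


def tile_grid(width: int, height: int, tile_w: int, tile_h: int) -> Tuple[int, int, int]:
    """Return (columns, rows, total) tiles needed to cover a width x height render."""
    cols = math.ceil(width / tile_w)
    rows = math.ceil(height / tile_h)
    return cols, rows, cols * rows


def has_partial_tiles(width: int, height: int, tile_w: int, tile_h: int) -> bool:
    """True if the tiles along the right or bottom edge are cut short."""
    return width % tile_w != 0 or height % tile_h != 0


if __name__ == "__main__":
    # BMW at 50% of 1920 x 1080 renders 960 x 540 pixels.
    for tw, th in [(256, 256), (240, 270), (960, 540)]:
        cols, rows, total = tile_grid(960, 540, tw, th)
        partial = has_partial_tiles(960, 540, tw, th)
        print(f"{tw} x {th}: {cols} x {rows} = {total} tile(s), partial edge tiles: {partial}")
```

The same check applies to the Barcelona posts: 320 x 360 tiles cover a 1280 x 720 render as exactly 4 x 2 tiles.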

@bliblubli,

All I can say is Holy ShitFuckAroooo.

AMD FirePro W9100 16 GB GDDR5:

Official Blender 2.78 RC2: 19 min 07.55 sec
Microdisplacement enabled, with the Experimental feature set activated in the render settings.

Your new OpenCL version (2.78):
Same settings as above, same scene, nothing changed: 7 min 59.20 sec!!!

You realize we're screaming out for Blender code devs right now? For the LOVE OF GOD, contact Ton about working on Blender, even in your spare time. We clearly need your skills.

Awesome work! You more than doubled my render speed in one little update; I'd love to see what you could do for us long term. :evilgrin:

And as I said, I'm on a Hawaii FirePro card, not the newest GCN generation, so I'm pretty sure the changes you're making will benefit all GCN-based cards.
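The "more than doubled" claim can be checked directly from the two W9100 timings in this post. A short worked calculation (the variable names are just for illustration):

```python
def seconds(minutes: int, secs: float) -> float:
    """Convert a minutes + seconds render time to seconds."""
    return minutes * 60 + secs


official_rc2 = seconds(19, 7.55)  # 19 min 07.55 sec = 1147.55 s
optimised = seconds(7, 59.20)     # 7 min 59.20 sec  = 479.20 s

speedup = official_rc2 / optimised
time_saved = 1 - optimised / official_rc2

print(f"speedup: {speedup:.2f}x")             # speedup: 2.39x
print(f"render time cut by {time_saved:.0%}")  # render time cut by 58%
```

So on this microdisplacement scene the build is roughly 2.4x faster, which is consistent with "more than doubled".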

Happy to hear it helps you :slight_smile: Microdisplacement is indeed where the improvements are most impressive. I'll post a patch on the patch tracker as soon as I know it renders most scenes correctly. I changed a lot in the selective node system, and I have to make sure the needed nodes are always compiled.
I also experimented with AOS (array of structures) and SOA (structure of arrays) layouts, which are now switched by choosing the experimental or supported kernel. So please also post results with the supported kernel: most benchmarks are set to experimental by default, and experimental is slower most of the time. You may get even better render times using supported (except, of course, for microdisplacement, which needs experimental).
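To make the AOS/SOA distinction concrete, here is a minimal sketch of the two layouts for a hypothetical per-ray record (the `PathState` name and its fields are illustrative, not Cycles' actual structures). Python only shows the data organisation, not the memory behaviour: on a GPU, SOA lets neighbouring work items read neighbouring addresses for one field, while AOS keeps all fields of one ray together, and which is faster depends on the kernel's access pattern.

```python
from array import array
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PathState:
    """Hypothetical per-ray record; the field names are illustrative only."""
    x: float
    y: float
    z: float
    throughput: float


# AOS: one record per ray, with all fields of a ray stored together.
aos: List[PathState] = [PathState(float(i), 0.0, 0.0, 1.0 + i) for i in range(4)]

# SOA: one contiguous array per field, holding that field for every ray.
soa: Dict[str, array] = {
    name: array("d", (getattr(r, name) for r in aos))
    for name in ("x", "y", "z", "throughput")
}


def total_throughput_aos(rays: List[PathState]) -> float:
    # Visits every record, even though only one field is needed.
    return sum(r.throughput for r in rays)


def total_throughput_soa(fields: Dict[str, array]) -> float:
    # Reads a single contiguous array.
    return sum(fields["throughput"])


if __name__ == "__main__":
    print(total_throughput_aos(aos), total_throughput_soa(soa))
```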

bliblubli, could you describe what changes you made to the OpenCL code in your build?

For example, with your build 90e0707, hair rendering seems to be fine on my GPU.


On today's build, blender-2.78-c532695-win64, you can see weird shadows on the sphere.


On both builds, lighting does not work correctly on my GPU, but that is due to the drivers.
Could you ask the developers to include this OpenCL code in the daily builds?

Maybe a stupid question, but why can't the rendered image/tiles be seen during rendering with OpenCL?

Awesome :slight_smile: First results; I'll update as they come in:

Render times (m:ss)
Scene | Master | Optimised build, supported | Optimised build, experimental
BMW | 2:30 | 2:09 | 2:14
Barcelona | 9:01 | 8:59 | 8:13
Fishy Cat | 5:15 | |

Are there any plans to make this compatible with Blender Render instead of Cycles?

Hello Matali,

May I ask your OS, GPU, and tile sizes?

These are awesome results.

I don’t really understand the question.

If Blender Render used OpenCL, that would be great!

Win7, 2x RX 480; each GPU renders half of the image. I should put my specs in my signature so I stop writing them in every post :smiley:

Edit: I added a signature and activated the option to show signatures, but no luck :smiley: How do you activate a signature here?

Barcelona renders, Win7, 64 bit

Your Blender version:

Tile size: 320 x 360 (instead of the default 256 x 256) - RX 480: 18 min 24 sec
Tile size: 1280 x 720 - single tile - RX 480: 16 min 01 sec

2.78 RC II:

RX 480:
Tile size: 320 x 360 (instead of the default 256 x 256) - RX 480: 20 min 10 sec
Tile size: 1280 x 720 - single tile - RX 480: 17 min 37 sec

CUDA:
Tile size: 320 x 360 (instead of the default 256 x 256) - GTX 970 CUDA - 15 min 32 sec

Thanks! I don't know how to activate it :(

As far as I know, Blender Render uses only the CPU and will soon no longer be supported in Blender.

I'll post the patch as soon as it's stable, so it may end up in the daily builds someday (it has to be accepted first), and you will then be able to read the code. Describing code changes is a bit hard: I changed the node group levels, made the selective node compilation finer-grained, and made some changes regarding SOA and AOS. All of this was based on a lot of benchmarking/profiling to find out where to tune things. Pretty boring work in itself, but the results are rewarding :slight_smile:
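The general idea behind selective node compilation can be sketched generically: collect the shader node types a scene actually uses, map each to a group level, and pass the highest required level to the OpenCL compiler as a `-D` define so that code for unused nodes is left out of the kernel. Everything below (node names, levels, and the `NODES_MAX_LEVEL` define) is hypothetical and not taken from Cycles or this patch; it also assumes each level includes all lower levels. Unknown node types fall back to the highest level, which is one way to make sure a needed node is never compiled out.

```python
from typing import Dict, Iterable, List

# Hypothetical node types and group levels (not Cycles' real tables).
# Assumption: each level includes all lower levels, so a higher level
# means the kernel contains more node code.
NODE_LEVEL: Dict[str, int] = {
    "diffuse_bsdf": 0,
    "glossy_bsdf": 0,
    "image_texture": 1,
    "noise_texture": 1,
    "bump": 2,
    "displacement": 3,
}
MAX_LEVEL = max(NODE_LEVEL.values())


def required_level(used_nodes: Iterable[str]) -> int:
    """Lowest group level that still contains every node the scene uses.

    Unknown node types map to MAX_LEVEL so a needed node is never compiled out.
    """
    return max((NODE_LEVEL.get(n, MAX_LEVEL) for n in used_nodes), default=0)


def build_options(used_nodes: Iterable[str]) -> List[str]:
    """Compiler options to pass to the OpenCL program build; the define name is hypothetical."""
    return [f"-DNODES_MAX_LEVEL={required_level(used_nodes)}"]


if __name__ == "__main__":
    print(build_options(["diffuse_bsdf", "image_texture"]))  # ['-DNODES_MAX_LEVEL=1']
    print(build_options(["diffuse_bsdf", "unknown_node"]))   # ['-DNODES_MAX_LEVEL=3']
```

A finer-grained scheme would replace the single level with one flag per node feature, trading more kernel variants (and compile time) for smaller kernels.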

@Almatap I'm not a paid dev at the moment, and for me the way tile rendering works is good enough. So if you want it to work another way, speak with the paid devs :slight_smile: Design changes are things you can debate for a very long time :smiley: I prefer to keep improving performance.

Thanks again for making these speed improvements, you did a wonderful job!

I didn't think you were a paid dev; it was a general question for anyone who could answer it. I simply don't understand why OpenCL rendering doesn't show the tile contents during rendering.

Yesterday I ran some tests with my current, very heavy scene, and it resulted in the first blue screen on the PC I work on (instead of the usual OpenCL error messages). I will try to identify what is behind it. I also noticed that I sometimes get display timing error messages (like I had with CUDA).