AMD OpenCL Experimental Builds from Latest Master & Unpublished New Code Updates


(3DLuver) #1

Hey, this is a new thread dedicated to OpenCL builds of the latest master plus unpublished Blender code updates, so that AMD card users can test and experience new features related to Cycles and wider Blender 2.8 developments.

This build is fresh from master and updated to include Lukas’ Denoise branch and some other goodies:

1:OpenCL GPU Volumes
2:OpenCL GPU SSS
3:Compositor SMAA Anti aliasing Node
4:Transparent Shadow controls
5:Fast Clay renders patch
6:Fast Adaptive subdivision patch
7:Overall faster Shader node rendering patch
8:Rendering Cycles Scramble setting for Sobol etc
9:Cycles AO approximation for GI bounces beyond a user set level for faster renders
10:Lukas Denoise for CPU (wouldn’t work with the V4 branch and OpenCL rendering, even when deactivated). DO NOT USE WITH GPU RENDERING OR YOU WILL GET A BLUE SCREEN OF DEATH
11:OpenCL random math replacement for sampling experiment

A few other little code fixes for speed-ups.

This branch was all about supporting Lukas’ great Denoise experiment, which at this point only works on CPU with this build, while also not killing OpenCL GPU rendering (Lukas’ experimental branch wouldn’t even compile the OpenCL kernels). So you can use CPU denoise for testing but also use OpenCL GPU rendering in one branch, with Volumes and SSS support.

I’ve also been working on getting Shadow Catcher to work with OpenCL (almost there; my closed branch can do it but crashes randomly at this point and I have no idea why. THAT’S CODE when you’re new to GPGPU and the Blender code base like me).

When Shadow Catcher works I’ll add it to this build for testing. Fingers crossed Lukas puts some time into making the denoise system work with OpenCL and, more importantly, OpenCL on AMD cards.

Please test and post results: render times, compile times, errors, crashes. It’s the best way to find out what’s what on other machines.

3DLuver Test Branch:

https://mega.nz/#!Z4wQ1JjS!EZCeRzlulZNvAyyRPniARem6xdyTxYBzeu4BgoQxHoM

Pointers to new settings:

AO Approx:


Denoise Panel:


Sobol (works for all filters) Scramble:



(Misfit410) #2

I use Nvidia at home, but at work I got myself an RX 470 (because 8GB at $200? heck yes!)
I’ve noticed it’s fast as heck on the things it works with, really need to check this out.


(Grzesiek) #3

Another build to test hehe. But now I’m really curious to see what I can squeeze out of my R9 290x…


(thomascheng) #4

I have an RX 390 here. So far, CPU with denoise can produce faster results than pure GPU. Can’t wait for GPU to work with denoise.


(poromaa) #5

I guess Mac people need not bother.


(3DLuver) #6

Sorry, double posted for some reason.


(3DLuver) #7

NEW BUILD: 3DLuverExperimentalBranchV2 : https://mega.nz/#!IwIwEDaT!JyX2Dx3jw1L63ROvxFWnOujRdmyRyvufpT-8L7zvd0Y

This build is fresh from master and updated to include Lukas’ Denoise branch and some other goodies:

1:Blender 2.8 Master Merged into this Experimental Branch

2:Blender 2.8 Patch Main Workspace Integration:

Main Changes/Features
Introduces the new Workspaces (general description).
Store screen-layouts (bScreen) per workspace.
Store an active screen-layout per workspace. Changing the workspace will enable this layout.
Store active mode in workspace. Changing the workspace will also enter the mode of the new workspace. (Note that we still store the active mode in the object, moving this completely to workspaces is a separate project.)
Moved mode switch from 3D View header to Info Editor header.
Store active scene in window (not directly workspace related, but overlaps quite a bit).
Removed ‘Use Global Scene’ User Preference option.
Compatibility with old files - a new workspace is created for every screen-layout of old files.
Default .blend only contains one workspace though (‘General’). Idea is that users can add pre-configured workspaces from a menu, rather than having a bunch of default ones of which most will probably never be used.
Support appending workspaces.
Ctrl+Left and Ctrl+Right now cycle through workspaces instead of screens - not sure if that’s what users want.
Also made sure opening files without UI and command-line rendering works fine.
Temporary UI until the new top-bar implementation is done.

Go here for more instructions and info: https://developer.blender.org/D2451

3:OpenCL GPU Volumes
4:OpenCL GPU SSS
5:Compositor SMAA Anti aliasing Node
6:Transparent Shadow controls
7:Fast Clay renders patch
8:Fast Adaptive subdivision patch
9:Overall faster Shader node rendering patch
10:Rendering Cycles Scramble setting for Sobol etc
11:Cycles AO approximation for GI bounces beyond a user set level for faster renders
12:Lukas Denoise for CPU (wouldn’t work with the V4 branch and OpenCL rendering, even when deactivated). DO NOT USE WITH GPU RENDERING OR YOU WILL GET A BLUE SCREEN OF DEATH
13:OpenCL random math replacement for sampling experiment

Pointers to new settings:

AO Approx:

[ATTACH=CONFIG]467140[/ATTACH]

Denoise Panel:

[ATTACH=CONFIG]467141[/ATTACH]

Sobol (works for all filters) Scramble:

[ATTACH=CONFIG]467142[/ATTACH]


(Kljeroio234) #8

Device init success
Compiling OpenCL program base
Kernel compilation of base finished in 2.94s.

Compiling OpenCL program split_data_init
OpenCL build failed with error CL_BUILD_PROGRAM_FAILURE, errors in console.
OpenCL program split_data_init build output: "C:\3DLuverExperimentalBranchV2\2.78\scripts\addons\cycles\kernel\kernel_path_surface.h", line 222: error:
          expression must have pointer-to-struct-or-union type
                                shadow_info->x = average(light_unoccluded);
                                ^

"C:\3DLuverExperimentalBranchV2\2.78\scripts\addons\cycles\kernel\kernel_path_surface.h", line 228: error:
          expression must have pointer-to-struct-or-union type
                                if(shadow_info) shadow_info->y = average(light_unoccluded*shadow);
                                                ^

"C:\3DLuverExperimentalBranchV2\2.78\scripts\addons\cycles\kernel\kernel_path_surface.h", line 231: error:
          expression must have pointer-to-struct-or-union type
                else if(shadow_info) shadow_info->y = 0.0f;
                                     ^

3 errors detected in the compilation of "C:\Users\836D~1\AppData\Local\Temp\OCL3576T2.cl".
Frontend phase failed compilation.

HD7950
Crimson 16.12.2
Default cube


(matali) #9

For me, the build sometimes simply won’t start. It seems some patches break addons, triggering a crash. Starting after deleting userpref is OK. Probably it’s this workspace patch?
Also, the AO patch only works on CPU; on GPU it does nothing. Maybe report it on the tracker?


(3DLuver) #10

Yep, if you could do a bug report on https://developer.blender.org/ it will help the devs out. That’s the whole point of these experimental builds: to get immediate feedback, even on patches not in master yet. Good job mate


(drgci) #11

3DLuver, thank you for the experimental builds. I have noticed some things: volumetric rendering like fire and smoke is very slow on GPU. Also, overall rendering times are slower than on Nvidia cards. For example, on the BMW benchmark a GTX 770 needs about 1 min 42 sec, while my RX 460 unlocked to 1024 cores needs 3 min 12 sec, almost double the time.
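
To put a number on "almost double", here is a quick sketch that converts the two BMW benchmark times quoted above into seconds and takes their ratio. The helper name `to_seconds` is only an illustrative choice, not anything from Blender.

```python
def to_seconds(minutes, seconds):
    """Convert a render time given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


# Times quoted in the post above for the BMW benchmark scene.
gtx770_time = to_seconds(1, 42)  # GTX 770
rx460_time = to_seconds(3, 12)   # RX 460 unlocked to 1024 cores

# How many times longer the RX 460 render took.
slowdown = rx460_time / gtx770_time

print(f"GTX 770: {gtx770_time}s, RX 460: {rx460_time}s, ratio: {slowdown:.2f}x")
```

That is 102 s against 192 s, so roughly 1.9x for this one scene and whatever tile settings were used.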


(BeerBaron) #12

You’re comparing apples and oranges. Even though the GTX 770 is older, it still has twice the memory bandwidth and more than 50% higher theoretical FLOPS(SP), compared to the RX460 (even unlocked). Also, make sure to test all the tile sizes.


(AlphaShow) #13

That for one, and for another, CUDA is better optimized right now ^^


(Grzesiek) #14

Fully agree with BeerBaron on the tile aspect. Make sure you test 128x128, 256x256, 512x512 and full image resolution (one tile at 1920x1080, or whatever the scene is).

And as AlphaShow mentioned, CUDA is extremely optimized. A GTX 680 isn’t far from an R9 290X on my own scenes.
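
A rough sketch of the tile sweep suggested above, in plain Python so it runs outside Blender. The function names and the 1920x1080 example resolution are assumptions for illustration; it only lists the candidate tile sizes and how many tiles each one produces, it does not render anything.

```python
def candidate_tile_sizes(width, height, squares=(128, 256, 512)):
    """List (tile_x, tile_y) pairs to benchmark: square tiles that fit
    inside the frame, followed by one tile covering the full image."""
    sizes = [(s, s) for s in squares if s <= width and s <= height]
    sizes.append((width, height))
    return sizes


def tile_count(width, height, tile_x, tile_y):
    """Number of tiles needed to cover the frame (partial edge tiles count)."""
    tiles_across = (width + tile_x - 1) // tile_x  # ceiling division
    tiles_down = (height + tile_y - 1) // tile_y
    return tiles_across * tiles_down


if __name__ == "__main__":
    width, height = 1920, 1080  # assumed example resolution
    for tile_x, tile_y in candidate_tile_sizes(width, height):
        n = tile_count(width, height, tile_x, tile_y)
        print(f"{tile_x}x{tile_y}: {n} tile(s)")
```

As far as I know, in the 2.7x Python API these values map to `scene.render.tile_x` and `scene.render.tile_y`. Time one render per setting and keep whichever is fastest on your card.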


(.Pixel) #15

For one GPU I’m getting the best results with 1 tile at the output size and unchecking progressive (on any build).

Also I’m getting worse times with ExperimentalV2 than with 2.78a using Radeon GPUs.

It would be nice to have CPU+GPU support.


(Lane) #16

I agree that OpenCL needs a good deal of optimization work in Cycles… especially when compared to OpenCL on AMD in other renderers. (LuxRender is damn fast on OpenCL.)

But that is somewhat normal: the Cycles OpenCL version is still new, while Cycles CUDA has been around for a good while now.

Anyway, back on topic, I will be really happy to test this build with 2x HD7970 when I have a bit more time in front of me. (Old GPUs, but rock solid for compute and OpenCL performance: a bit more than 9 Tflops.)


(BigBlend) #17

Bug report: Fire doesn’t work. It defaults to smoke. Also, the shadows don’t work with the smoke domain. I’ll post a screenshot soon.



I had to make the fire in the official Blender and then open it in this build.