Investigation - GPU Tile Performance drop - Test and post your BMW 2 results

Thanks to another thread asking about GPU+CPU combo rendering, which prompted me to test whether I could reproduce the issue where one of our members didn’t see any performance improvement when mixing CPU+GPU.

It seems there is a GPU tile performance drop.

The expectation was that GPU tile settings no longer make a real difference in performance, but there appears to be a regression, at least for OpenCL in 2.8.

Can anyone else test and post their results of just GPU tests for various Tile settings?

So if you can, render BMW 2.0 (the two-car version) and post your OS \ Blender version \ render path (OpenCL \ CUDA).

UPDATE: Definitely an issue with OpenCL.

Either Blender 2.8 or AMD drivers … still investigating

The screenshot below shows single, then dual GPU usage. Usage is high with a single GPU, lower with two GPUs; device usage drops by nearly half for each device… Also confirmed via GPU-Z… :frowning:

Still current testing:
Windows 10 \ Blender 2.8 \ OpenCL

Device \ Tile     256   128      64       32       16
Vega 56           x     1:38.6   1:49.2   2:19.6   4:11.7
2x Vega 56        x     1:17.0   1:19.6   1:39.3   2:19.7
CPU (TR 1950X)    x     x        2:15.9   2:04.4   2:00.5
CPU + 1 GPU       x     x        1:27.7   1:15.6   1:29.3
CPU + 2 GPUs      x     x        1:19.5   1:07.6   1:10.6

(times are min:sec.tenths; x = not tested)
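For anyone comparing rows, the times appear to be minutes, seconds, and tenths (so "1.38.6" / "1:38.6" is 1 min 38.6 s). A small helper of my own (not from the thread) to convert them to seconds and compute the slowdown factor per tile size:

```python
# Convert forum-style render times ("1.38.6" or "1:38.6" = 1 min 38.6 s)
# to seconds, then compute slowdown factors relative to the fastest run.

def to_seconds(t: str) -> float:
    """Parse 'M.SS.T' or 'M:SS.T' into seconds."""
    minutes, seconds, tenths = (int(p) for p in t.replace(":", ".").split("."))
    return minutes * 60 + seconds + tenths / 10

def slowdowns(times: dict) -> dict:
    """Slowdown factor of each tile size vs. the fastest measured time."""
    best = min(times.values())
    return {tile: round(t / best, 2) for tile, t in times.items()}

# Single Vega 56 row from the table above.
vega_56 = {tile: to_seconds(t) for tile, t in
           {128: "1.38.6", 64: "1.49.2", 32: "2.19.6", 16: "4.11.7"}.items()}

print(slowdowns(vega_56))  # 16x16 tiles come out ~2.5x slower than 128x128
```

This makes the wrong-scaling claim easy to quantify for any row anyone posts.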


You must use small tile values like 32x32 for hybrid rendering. If you render with the GPU only, set 256x256; if you use GPU + CPU, set 32x32.
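For reference, in Blender 2.8 this choice maps to the `tile_x`/`tile_y` render settings. A minimal sketch of the rule of thumb above (the helper name is mine, not part of the thread):

```python
def recommended_tile_size(use_cpu: bool) -> tuple:
    """Tile size per the advice above: big tiles for GPU-only rendering,
    small tiles when the CPU also renders (CPU threads choke on big tiles)."""
    return (32, 32) if use_cpu else (256, 256)

# Inside Blender 2.8 this would be applied roughly as:
#   import bpy
#   scene = bpy.context.scene
#   scene.render.tile_x, scene.render.tile_y = recommended_tile_size(use_cpu=True)

print(recommended_tile_size(use_cpu=False))  # (256, 256)
```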

Please check the table above: that “SHOULD” work, but it doesn’t, which indicates a potential bug in Blender. Hence my request for others to test and report whether this is a Blender 2.8 issue or an OpenCL-only issue.

As you can see in my table, I tried various tile settings, and 32x32 is worse than just my two Vegas at 128x128… and that should NOT be the case.

The Vegas at 32x32 take over 1 minute, vs. just 26 seconds at 128x128. Adding the CPU to the mix is worse at all settings.

CPU + GPU x 1:06.6 32.5 29

29 sec is a good result. You cannot measure performance with light scenes. If you want to measure performance, use heavy scenes that render for a long time, like 5, 10, or 30 minutes…

This is because Cycles first calculates nodes, vertices, etc. on the CPU, and only after that begins rendering.

I can get approximately 2x faster renders with Blender 2.80, but I cannot measure those values with light scenes.

What GPU and CPU do you have?

I just gave the bmw benchmark some quick runs.
5960X at 4.2GHz cpu benchmark ~3 1/2 minutes (I didn’t keep notes on that)
gtx 1070 256x256: 1’ 48"
128x128: 1’ 33"
64x64: 1’ 29"
32x32: 1’ 28"
16x16: 1’ 29"
cpu+gpu 32x32: 1’ 10"


Thanks. So you got the expected results: actual render improvements on the GPU with smaller and smaller tiles… and better still when the CPU is added.

OK, I’ll try downgrading my AMD drivers and see if that is the cause. I suspect it is, as the new RX 5700 XT had OpenCL issues with the latest drivers and they have now “patched it”.


After downgrading from 19.8 to 19.5, still the same results… has anyone else with AMD GPUs seen this issue?

I have tested a couple of demo scenes with a 3700X and Radeon VII and compared them with the results mentioned below:



They were a little faster in my tests, though very close.
But regarding hybrid rendering, I suppose it doesn’t work in every scene, especially with OpenCL.

My results:

BMW
GPU: 74s
GPU+CPU: 72s
CPU: 173s

Classroom
GPU: 80s
GPU+CPU: 122s
CPU: 255s

Pavillon
GPU: 194s
GPU+CPU: 127s
CPU:270s

Maybe we should report this issue. I am not totally sure.

You cannot measure render performance with small render times like these, because Cycles needs preparation time before it starts rendering. These values will always mislead you. For example, if you get 72 sec, maybe 30 sec of that was preparation; on another render maybe 40 sec was, who knows? These preparation times are variable and mostly use the CPU. You must use heavy scenes and take long renders for true results (minimum 5, 10, 20 minutes, etc.)…
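The claim about preparation time can be made concrete: a fixed CPU-side overhead shrinks the speedup you observe on short renders. A quick illustration with hypothetical numbers (not measurements from this thread):

```python
def measured_speedup(render_s: float, prep_s: float, true_speedup: float) -> float:
    """Observed speedup when a fixed preparation time is included in the
    wall-clock measurement of a render that is truly N times faster."""
    slow = prep_s + render_s
    fast = prep_s + render_s / true_speedup
    return slow / fast

# A device that truly renders 2x faster looks much less impressive on a
# short render with 30 s of preparation, and nearly 2x on a long one.
print(round(measured_speedup(render_s=60, prep_s=30, true_speedup=2.0), 2))    # 1.5
print(round(measured_speedup(render_s=1800, prep_s=30, true_speedup=2.0), 2))  # 1.97
```

Whether the BMW scene's preparation time is large enough to matter is exactly what is disputed in the next post.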

Can’t you see the estimated preparation time? It is certainly not more than 4-5 seconds for the demo scenes. I understand your concern; however, I got similar results even in some of my own projects.

Post the actual file you use (with textures, if it has any) and the Blender build, so everyone can test the same thing.

First, small render times still give plenty of information and permit repeated retests. So the BMW two-car scene is sufficient, as I have already confirmed that dual GPU is broken.

Now main thing is to test at various tile settings.

For me, I tested as follows:

I started a first run but ignored the initial result, as the kernel required compilation time. All subsequent renders (while Blender is not restarted) do not recompile the kernel.

Single and multi GPU at 256x256, 128x128, 64x64, 32x32, 16x16 tile settings.

  • Expectation: render times should be similar.
  • Reality: noticeable slowdowns.

This has a direct impact on the CPU+GPU render mix.

So you CAN get fully sufficient measures of render performance with these scenes.

If I get more confirmation I’ll be raising a ticket this week.

So, here is a good example of correct scaling as Birdnamnam had it (using CUDA) vs. what I’m seeing (using OpenCL):

@egementncr, can you test just BMW at the 5 settings below on your Radeon VII? GPU-only testing. I just need confirmation of whether you get worse and worse scaling like me, or correct scaling.

Source       Device \ Tile    256      128      64       32       16       Conclusion
Birdnamnam   GTX 1070         1:48.0   1:33.0   1:29.0   1:28.0   1:29.0   –> Correct scaling <–
Grzesiek     Vega 56          x        1:38.6   1:49.2   2:19.6   4:11.7   Wrong scaling
Grzesiek     2x Vega 56       x        1:17.0   1:19.6   1:39.3   2:19.7   Bad bad…
Grzesiek     CPU (TR 1950X)   x        x        2:15.9   2:04.4   2:00.5
Grzesiek     CPU + 1 GPU      x        x        1:27.7   1:15.6   1:29.3
Grzesiek     CPU + 2 GPUs     x        x        1:19.5   1:07.6   1:10.6

No; if your scene is heavy, it can take 30 seconds, sometimes 50. It depends on your scene and node setup.

I have tested some scenes and always get invalid results with small render times. I only get true results with long render times.

This does not show bad scaling. These are two different graphics cards: different architectures, different capabilities, different chipsets… What is the Vega’s 16x16 texture-processing capability? What is the CPU-to-GPU data bus capability? Vega uses HBM2, GeForce uses GDDR5 graphics memory. These are different architectures, not the same.

OK, sorry, but you do not seem to have enough facts:

  1. A few months back, the scaling on my Vega was nearly identical to the GTX 1070 listed above.
  2. Blender’s internal testing showed Vega on par with the GTX 1080.
  3. You are not providing anything of value to this thread and are stating incorrect facts, which potentially indicates a troll. If you are unsure about what you are writing, I recommend you do not post, or I’ll flag you to the admin.

I’m a graphics programmer, man; you should weigh your words before saying anything… Have you ever written any graphics code in your life?

My apologies, but it seems you have not.

The 2.79 builds after the “c” release have had CPU+GPU rendering available since late 2017, and it worked perfectly, well before 2.8 was released.

And the 2.8 Alpha/Beta, when I tested it, also scaled perfectly.

This indicates that some optimizations were done “recently” in Blender 2.8, where a bug was introduced that is causing significant performance REGRESSIONS for OpenCL compute rendering.

And that is the only point of this post: to identify where the BUG is. Is it an AMD driver issue, or a 2.8 bug?

If you do not have any valuable information related to this situation, please do not post, or I will report you to the admin.