Render device cpu + gpu

Hi,
I recently built a new system with a Ryzen 3700X and a Radeon VII.
After some render tests, I realized that the GPU on its own is almost 2x faster than GPU+CPU. I used 512x512 tiles with the GPU. I suppose that Blender 2.8 does not need bigger tiles anymore to get fast GPU renders. Anyway, is this generally how Blender behaves? I am not complaining about my render times, but I was surprised by the result. It’s really incredible, in fact!
Cheers,

1 Like

Unfortunately yes. The thing is that Blender splits your render into tiles according to your tile size, like this:

Resolution x / tile size y = number of tiles z

Now every compute unit (the GPU and each CPU core) begins rendering 1 tile.

z tiles - 1 GPU - n CPU cores = remaining tiles u

When a compute unit finishes a tile, it grabs one of the remaining tiles until u is 0. Now the thing is, since GPUs are so fast, they might have rendered 3 to 4 tiles already while the CPU is still processing its first tiles. If the CPU grabs the last few tiles, the GPU sits idle, waiting for the next frame to get new tiles.
Since the CPU is so slow (compared to a GPU), smaller tiles are recommended when using both together. You have to test, but don’t expect wonders.
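To make the scheduling described above more concrete, here is a rough simulation sketch in Python. It is not Blender's actual scheduler: the function names, the 1920x1080 resolution, and the relative speeds (one GPU assumed to be 20x as fast as a single CPU thread, with 16 CPU threads) are all illustrative assumptions. It also ignores per-tile overhead, which in practice can make very small tiles slow on a GPU.

```python
import heapq
import math


def tile_count(width, height, tile):
    """Number of square tiles needed to cover the frame."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def simulate(width, height, tile, speeds):
    """Greedy tile queue: whichever worker is free first grabs the next tile.

    speeds holds one entry per worker, in pixels per time unit
    (hypothetical numbers). Returns the time at which the last tile finishes.
    """
    tile_pixels = tile * tile  # edge tiles are treated as full size
    # Heap of (time the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(tile_count(width, height, tile)):
        start, worker = heapq.heappop(free_at)
        done = start + tile_pixels / speeds[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish


GPU_ONLY = [20.0]              # assumed: GPU is 20x one CPU thread
HYBRID = [20.0] + [1.0] * 16   # assumed: same GPU plus 16 CPU threads

for tile in (512, 64, 32):
    gpu = simulate(1920, 1080, tile, GPU_ONLY)
    hybrid = simulate(1920, 1080, tile, HYBRID)
    print(f"{tile}x{tile}: GPU only {gpu:.0f}, GPU+CPU {hybrid:.0f}")
```

With these made-up speeds, the hybrid setup loses at 512x512 because the frame finishes only when the slowest CPU tile is done, and it wins at 32x32 because the tail the GPU has to wait for becomes short.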

Edit: In extensive testing around the community, we generally settled on a tile size of either 32x32 or 56x56, depending on the hardware, for hybrid rendering. Even then, whether you get a speedup is heavily scene dependent.

4 Likes

It seems so.
Besides that, if the GPU is also able to access system memory for heavy scenes, there is no need for hybrid rendering most of the time, I guess. It depends on the system, of course.

Yeah, there are experiments going around with denoiser and stuff. Brecht is working on a solution to share individual pixels of a tile between the CPU cores. That’s when it will get interesting. I think the hybrid rendering we have now is just a technical necessity to bring it to a really useful level.

Edit:

Haha yeah, I remember how cool a feature it was in the 2.79 nightly builds, when I only had a 1050 Ti and an old, old Athlon.
Now with two 2070s and a first-gen 1800X it doesn’t thrill me that much.
But you are right, for a lot of people it certainly is a cool feature and we can only be thankful for that.

2 Likes

Drop the tile size to 32x32, because this is what currently happens at 512x512 tiles:

1 tile assigned to Radeon VII
16 tiles assigned to Ryzen 3700x (one per thread)

So the Radeon VII finishes in no time, and then you wait until all the remaining tiles are computed by the CPU…

By dropping the tile size to super small, the split is still the same, but while the CPU processes its 16 tiles, your GPU will probably process 30-40 tiles.

I had the same initial problem with my Threadripper and 3 Vega setup. I wish there was more intelligence in how tiles are set up when Blender detects such setups.

So drop tiles to their knees and see a 10-fold render speedup.

1 Like

An admirable system you have :slight_smile:
Thank you, I am going to try it. But I remember that CPU+GPU with 64x64 tiles was significantly slower than GPU only.

Thanks. Still not enough GPUs… :slight_smile:

As for tiles, try smaller: 16x16 and 32x32, and report back. I’ll do some more testing on my side with various settings and report back.

1 Like

I have a dual GTX970 + i7 3930K and 32 x 32 is the best tile size I have found for GPU + CPU, in case this could be interesting for you.

1 Like

One thing I noticed:

If you set CPU + GPU render and set threads to ‘AUTO’, Blender will recognize your CPU threads: let’s say 6 cores x 2 threads, so it will set 12 threads.

And it will use them like this: 10 threads for the CPU and 2 threads for the GPU, in my case.

If you want to use all the power, you will have to set the threads mode to ‘Manual’ and set the number of threads to ‘number of threads of your CPU + number of threads of your GPU’
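As a small worked example of that arithmetic, here is a sketch in Python. It only restates the observation from this post and is not checked against Blender's source. The function names are hypothetical, and it assumes one thread is reserved per GPU device, which is a guess based on the 10 + 2 split described above.

```python
def auto_split(cpu_threads, gpu_devices):
    """Split as observed in this post with threads on 'AUTO':
    GPU feeder threads are taken out of the CPU thread budget.
    Assumption: one thread per GPU device."""
    return cpu_threads - gpu_devices, gpu_devices


def manual_thread_count(cpu_threads, gpu_devices):
    """Thread count to enter manually so every CPU thread still renders."""
    return cpu_threads + gpu_devices


cpu_threads = 6 * 2   # 6 cores x 2 threads, as in the post
gpu_devices = 2       # assumed from "2 threads for GPU"

print(auto_split(cpu_threads, gpu_devices))
print(manual_thread_count(cpu_threads, gpu_devices))
```

On your own machine, `os.cpu_count()` from the Python standard library reports the number of logical CPU threads to plug in.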

2 Likes

No luck. In my case, 64x64 tiles were the fastest for GPU+CPU. The other sizes gave kinda similar results, though.
However, the winner by far is GPU only! It is almost twice as fast.

Indeed, there is something wrong… I just reran one of my latest projects in Blender 2.8 and 2.79. The results are similar to each other (2.8 is on average a few percent quicker than 2.79c).

I do remember when they “fixed” GPU rendering on small tiles. That seems to be either broken or, hmm… no clue.

Either way, this confirms the issue you have. It affects mixing CPU + GPU, but it also seems that dual GPU is not giving me what I was expecting, especially at the 256x256 tile setting. I will need to do more testing.

| Device \ Tile size | 256 | 128 | 64 | 32 |
|---|---|---|---|---|
| Vega 56 | 37.9 | 41.3 | 58.3 | 2:06.6 |
| 2x Vega 56 | 33.3 | 26.3 | 35.1 | 1:07.9 |
| CPU - TR 1950x | x | 1:03.4 | 47.5 | 42.9 |
| CPU + GPU | x | 1:06.6 | 32.5 | 29 |
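To compare the rows more easily, the timings can be converted to seconds and the fastest tile size picked per device. The Python sketch below does that with the numbers from the table. It assumes plain values are seconds, `M:SS.s` values are minutes and seconds, and `x` means "not measured"; the helper names are made up for illustration.

```python
TILE_SIZES = [256, 128, 64, 32]

# Timings copied from the table, one entry per tile size.
RESULTS = {
    "Vega 56": ["37.9", "41.3", "58.3", "2:06.6"],
    "2x Vega 56": ["33.3", "26.3", "35.1", "1:07.9"],
    "CPU - TR 1950x": ["x", "1:03.4", "47.5", "42.9"],
    "CPU + GPU": ["x", "1:06.6", "32.5", "29"],
}


def to_seconds(text):
    """Convert 'SS.s' or 'M:SS.s' to seconds; 'x' (assumed unmeasured) gives None."""
    if text == "x":
        return None
    if ":" in text:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(text)


def best_tile(times):
    """Return (tile size, seconds) for the fastest measured run."""
    measured = [
        (to_seconds(t), size)
        for t, size in zip(times, TILE_SIZES)
        if to_seconds(t) is not None
    ]
    seconds, size = min(measured)
    return size, seconds


for device, times in RESULTS.items():
    size, seconds = best_tile(times)
    print(f"{device}: fastest at {size}x{size} tiles ({seconds:.1f} s)")
```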
1 Like

Thanks for forcing me to investigate. Based on my results, I created a separate post about just the GPU tile issue, as that is the primary cause of the CPU+GPU performance drop.

If you have a chance, test Blender 2.8 with the BMW 2-car version and post in that other thread for comparison. If I see others with similar results, I’ll post a bug report to Blender.

1 Like