Blender Cycles GPU vs CPU+GPU test

I don’t render with CPU+GPU very often.
Most of the time I use GPU only.
But a while ago I noticed that for some reason CPU+GPU wasn’t as fast as I expected. Also, when I render with CPU+GPU, the last tile takes forever when samples are 1000+.
So I did a test on my old piano project,
and here are the results.

1080p with 100 samples

GPU only
16x16 = ? (didn’t bother rendering)
32x32 = 3min 07sec
128x128 = 48sec
256x256 = 42sec
512x512 = 40sec

CPU + GPU
16x16 = 2min 03sec
32x32 = 1min 22sec
128x128 = 38sec
256x256 = 38sec (???)
512x512 = 1min 12sec (???)

I was trying to find a sweet spot for CPU+GPU rendering. 128x128 and 256x256 tiles gave me the same render time, and only a few seconds less than GPU-only rendering (38 sec versus 40-48 sec).
And yes, 512x512 tiles took longer because the last tile took so much time. The same happened with 256x256 tiles: they were a bit faster than 128x128 tiles at first, but because of that last tile we got the same result.
So for me 128x128/256x256 tiles look like a sweet spot, but it’s not really worth it.
I also heard that for some people 16x16/32x32 is a sweet spot for CPU+GPU rendering,
so I’m guessing it depends on the hardware.
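
To see why one slow last tile hurts so much at large tile sizes, it helps to count how many tiles a 1080p frame is split into. This is just a rough sketch: it assumes the frame is 1920x1080 and that partial tiles at the edges count as full tiles. The `tile_count` helper is made up for this illustration and is not part of Blender.

```python
import math

def tile_count(width, height, tile):
    """Number of square tiles of side `tile` needed to cover the frame.

    Partial tiles at the right and bottom edges are counted as whole tiles.
    """
    return math.ceil(width / tile) * math.ceil(height / tile)

# Assumption: "1080p" means a 1920x1080 frame.
WIDTH, HEIGHT = 1920, 1080

counts = {tile: tile_count(WIDTH, HEIGHT, tile) for tile in (16, 32, 128, 256, 512)}
for tile, n in counts.items():
    print(f"{tile}x{tile}: {n} tiles")
```

With only a dozen or so tiles at 512x512, a single tile that lands on a slow CPU thread is a big share of the whole frame, while at 16x16 there are thousands of tiles and one slow tile barely matters.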

My CPU is not very powerful for rendering, but people with a Ryzen CPU may see more interesting render times with CPU+GPU.
My specs:
i5 6400 (boost up to 3.3 GHz)
16GB DDR3 RAM (2 slots)
RX 590 OC Edition 8GB graphics card

Here is the .blend file:
https://mega.nz/#!F9RmmajI!xQV6dy6FH80IhnjQEySNdC8Q7bj_ofUg_x_VM-PI52o

I hope some people find this test useful :)

Can you post the blend file so that I can try out the rendering speeds on my machine? If it is not a problem for you of course.

I have a Ryzen CPU and some Nvidia cards in my system. I would like to see how my results compare to your test and overall. Thanks.

Oh yeah, sure.
I forgot to upload the .blend file.

Blender just needs some more smarts when it comes to the end of a render with CPU+GPU. The issue with large tiles and CPU+GPU is that whoever is available when the last tile comes up gets the tile. Since the CPU typically has 4+ render units and the GPU only has 1, it’s almost always the CPU. The simplest fix would be to always leave the last tile for the GPU in a CPU+GPU render. Even better would be to keep track of the average render time for a CPU tile and a GPU tile and weight tile assignment towards whichever is faster. This may mean a tile does not start rendering immediately, but waiting would still be faster than letting a slower unit take it if the GPU is 3 or 4 times faster.
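
To make the idea concrete, here is a toy simulation of the two policies. It is only a sketch and not how Cycles actually schedules tiles: the `simulate` function, the per-tile times, and the rule "a CPU thread skips a tile if the GPU alone could finish all remaining tiles sooner" are all assumptions made up for illustration.

```python
import heapq

def simulate(num_tiles, gpu_time, cpu_time, cpu_threads, gpu_aware=False):
    """Return the total render time for a toy tile scheduler.

    Every worker grabs the next tile as soon as it is free (greedy).
    With gpu_aware=True, a CPU thread stops taking tiles once the GPU
    alone could finish all remaining tiles sooner than the CPU thread
    would finish a single one.
    """
    # Heap entries: (time the worker becomes free, worker id, seconds per tile).
    # Worker id 0 is the GPU, ids 1..cpu_threads are CPU threads.
    workers = [(0.0, 0, gpu_time)]
    workers += [(0.0, i, cpu_time) for i in range(1, cpu_threads + 1)]
    heapq.heapify(workers)

    gpu_free_at = 0.0   # when the GPU finishes the tile it is working on
    remaining = num_tiles
    finish = 0.0
    while remaining > 0:
        free_at, wid, per_tile = heapq.heappop(workers)
        if gpu_aware and wid != 0:
            # Estimate: the GPU renders every remaining tile by itself.
            gpu_done_all = max(gpu_free_at, free_at) + remaining * gpu_time
            if free_at + per_tile > gpu_done_all:
                continue  # this CPU thread retires; the GPU is faster
        remaining -= 1
        done = free_at + per_tile
        finish = max(finish, done)
        if wid == 0:
            gpu_free_at = done
        heapq.heappush(workers, (done, wid, per_tile))
    return finish

# Made-up numbers: 12 big tiles, GPU needs 2 s per tile, each of the
# 4 CPU threads needs 10 s per tile (5x slower).
greedy = simulate(12, 2.0, 10.0, 4)
aware = simulate(12, 2.0, 10.0, 4, gpu_aware=True)
print(f"greedy: {greedy:.0f} s, GPU-aware: {aware:.0f} s")
```

With these invented numbers the greedy run ends at 20 s because two CPU threads pick up the last tiles, while the GPU-aware run ends at 16 s because the CPU threads step aside and the GPU finishes the tail on its own.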

So how can we do that? I mean leave the last tile for the GPU?

Unfortunately we, as users, can’t do anything. The developers would need to change Blender. :(

You can check now,
I added the link.
Don’t forget to share your results :D

Here are some of my results but I guess the scene is not very complex. The differences are only in seconds.

CUDA, 1x GTX 1080 Ti
32x32 = 00:13.26
128x128 = 00:10.99
256x256 = 00:10.57
512x512 = 00:11.27

CUDA, 2x GTX 1080 Ti
32x32 = 00:07.35
128x128 = 00:06.35
256x256 = 00:06.31
512x512 = 00:06.84

CPU + 1 GPU, CUDA
16x16 = 00:11.89
32x32 = 00:09.54
128x128 = 00:17.94
256x256 = 00:44.97
512x512 = 01:41.54

CPU + 2x GPU, CUDA
16x16 = 00:08.49
32x32 = 00:06.59
128x128 = 00:17.55
256x256 = not worth checking
512x512 = not worth checking

eCycles CUDA, 1x GTX 1080 Ti
32x32 = 00:10.54
128x128 = 00:10.42
256x256 = 00:10.38
512x512 = 00:10.47

eCycles CUDA, 2x GTX 1080 Ti
32x32 = 00:05.77
128x128 = 00:05.80
256x256 = 00:05.86
512x512 = 00:05.87

My system has two GTX 1080 Ti cards and a Threadripper 2920X. I got the best render times with both cards and eCycles at default settings. I usually don’t render on the CPU because my scenes aren’t that complex and the CPU isn’t that fast.

Hmm, looks like even with a Threadripper CPU, GPU only is better.

I’ve done quite extensive GPU/GPU+CPU comparison benchmarks recently. Check them here: https://devtalk.blender.org/t/somewhat-comprehensive-radeon-vii-cycles-benchmark-opencl-linux/11671
