Why does GPU + CPU take longer than either GPU or CPU alone?

I had tested this before with a Zen 2 CPU and a GCN4 Radeon GPU, rendering a simple scene in Cycles. The CPU was about twice as fast as the GPU. When I used CPU+GPU, the render took a lot longer than with the CPU alone. (I made sure that kernel rendering time was not the problem by rendering the scene multiple times.) I thought this was a problem specific to AMD's GPU, or that the GPU was simply too slow.

Now I have bought a new Turing Nvidia GPU (about twice as fast as the previous Radeon GPU) and tested rendering the same scene. Here are the results:

  • GPU: 11.3 s
  • CPU: 21.3 s
  • GPU+CPU: 30 s

As you can see, using both GPU and CPU actually took longer than using the CPU or GPU alone. When I used GPU+CPU, at about 13 seconds the scene looked fully rendered, but the orange cross marks were still there. I am not sure if Blender was refining the scene, but I could not see anything changing. The UI kept showing that it was rendering until about 30 seconds, and then said it was finished.

So the same thing happens for both AMD and Nvidia, even when the GPU is relatively powerful (for AMD, the CPU was twice as fast as the GPU; for Nvidia, the GPU is twice as fast as the CPU). Why does using both take longer than using just one?


When you have an unbalanced system, where one of the devices is much faster than the other, you can see this. The reason is that the faster device completes its tiles and then waits on the slower device to finish. You might try smaller tiles so that the faster device has more opportunity to keep rendering and doesn't have to wait for the slower device as much.
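To make that concrete, here is a toy greedy-scheduler model (not how Cycles is actually implemented; the device speeds and work units are made up). With only a few big tiles, the fast device finishes early and then idles while the slow device grinds through its last tile, so the combined time is barely better than the fast device alone; with many small tiles, that final wait shrinks and the two devices genuinely add up.

```python
# Toy model of two devices pulling tiles from a shared queue.
# Not how Cycles is implemented; speeds and work units are made up.

def combined_render_time(total_work=100.0, tile_count=16,
                         gpu_speed=2.0, cpu_speed=1.0):
    """Greedy scheduling: whichever device is free first grabs the next tile."""
    tile_work = total_work / tile_count
    gpu_busy_until = 0.0
    cpu_busy_until = 0.0
    for _ in range(tile_count):
        if gpu_busy_until <= cpu_busy_until:
            gpu_busy_until += tile_work / gpu_speed
        else:
            cpu_busy_until += tile_work / cpu_speed
    # The render finishes when the last device finishes its last tile.
    return max(gpu_busy_until, cpu_busy_until)

for tiles in (2, 4, 16, 64, 256):
    print(f"{tiles:>3} tiles -> {combined_render_time(tile_count=tiles):.1f} time units")
```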

Jason

I have been using the Auto Tile Size add-on, which now seems to be included by default. I think the general rule is to use big tiles for the GPU and small tiles for the CPU, because small tiles reduce the GPU's performance. So, if I follow your advice and manually set a small tile size, won't that decrease the GPU performance?

This question was about rendering one frame, but I wonder if it is possible to use the CPU and GPU for separate frames when rendering an animation. Each frame can be rendered independently, so if, for example, the CPU is rendering frame 1 and the GPU is rendering frame 2, they would not have to wait for tiles from each other, and maybe they could even use different tile sizes?

In general, yes, big tiles for the GPU and smaller tiles for the CPU is the usual recommendation. I'm pretty sure that in the latest versions of Cycles the tile size doesn't matter as much (the GPU does better with small tiles now). But your case is a bit different because you are combining the devices.

I haven’t done it, but some people here do run two copies of Blender at the same time and render on different devices with them. It would tie up your system while they were running but it would use your resources more efficiently. You could run one Blender on a range of frames and the other on the rest of the animation. I would give your GPU the most frames and the slower CPU a smaller bunch. To find a good balance between the devices on the number of frames to give each one will have to be tested. It’s going to be different for each animation you want to render. It would probably be easier to run both Blenders from the command line with a shell script.

Jason

So there is no built-in feature for rendering different frames with different devices. I am lazy, so if it requires that much hassle, it is probably not worth it for me. Anyway, I tested with a manually set smaller tile size (32x32 this time; previously the add-on was using 256x240 automatically). This time, the results are:

  • GPU: 11.8 s
  • CPU: 21.3 s
  • GPU+CPU: 8.4 s

I am using Blender 2.82, by the way.
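For reference, the equivalent change from the Python console looks something like this (2.8x API; it assumes the GPU and CPU devices are already ticked in Preferences > System):

```python
import bpy

scene = bpy.context.scene

# 32x32 tiles instead of the Auto Tile Size values
scene.render.tile_x = 32
scene.render.tile_y = 32

# 'GPU' here means "use the compute devices enabled in Preferences > System";
# ticking both the GPU and the CPU there gives the combined GPU+CPU render.
scene.cycles.device = 'GPU'
```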

Yeah, not that I know of anyway. :smiley: That is a much better time. You can play around with the tile size and find the optimum setting; if you go too small, the per-tile setup time starts to take over.

Jason

64x64 renders fastest for me. I tested last weekend with the Blender Benchmark on a Ryzen 2700X and a GTX 1080. 32x32 should be faster on slower CPUs that also have smaller caches.

Edit: If you watch the Task Manager, switch one of the GPU graphs to CUDA to see the actual workload; the 3D graph may idle at around 2%.