CPU CUDA slows down rendering

Good morning (o;

I always thought that enabling CUDA for both CPU and GPU in the preferences would make rendering much faster…but to my surprise a sample scene from materialiq showed otherwise:

CPU: 1950X
GPU: RTX2080

CPU only: 09:06.08

GPU only: 04:22.13

CPU/GPU: 04:28.17

Is there an option to set the maximum number of cores Blender will use for rendering?
Or is it just better to always turn off CPU CUDA support?

Performance → Threads. After doing that I also set the affinity in the Windows Task Manager to one core less.
Also, 6 seconds is not a lot. If the last tile was rendered on a hyperthreaded core and it started right before the second-to-last one finished, then it took some extra time to render. This will be fixed when we finally get the feature of all threads working on one tile.
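In case it helps anyone searching later: the same limits can also be set from Blender's Python console. A minimal sketch, assuming the 2.7x/2.8x API (in those versions the fixed thread count and the manual tile size both live under `scene.render`; newer Blender versions removed manual tiles), so it only runs inside Blender:

```python
import bpy

render = bpy.context.scene.render

# Cap Cycles at a fixed number of CPU threads instead of auto-detect.
render.threads_mode = 'FIXED'
render.threads = 30  # e.g. leave a couple of the 1950X's 32 threads free

# Manual tile size (Blender 2.x only).
render.tile_x = 32
render.tile_y = 32
```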

Played around with thread settings…only gets worse (o;

But…originally this sample scene uses a tile size of 128x128…so I lowered it and voilà (o;

32T GPU 128x128: 04:28.17
32T GPU 96x96: 04:08.20
32T GPU 64x64: 03:23.40
32T GPU 48x48: 03:06.26
32T GPU 32x32: 02:58.39
32T GPU 24x24: 02:58.26
32T GPU 16x16: 03:02.43

Seems 32x32/24x24 is the best tile size…maybe just for this scene…but a huge time saving :wink:
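The effect of shrinking the tiles is easier to see as a tile count. A quick sketch (the 1920x1080 resolution here is an assumption for illustration, not necessarily the sample scene's actual resolution; edge tiles that don't fit evenly are counted as full tiles, which is what the `ceil` models):

```python
import math

def tile_count(width, height, tile_w, tile_h):
    """Number of tiles a frame is split into (partial edge tiles count as one)."""
    return math.ceil(width / tile_w) * math.ceil(height / tile_h)

# Assuming a 1920x1080 frame:
for size in (128, 96, 64, 48, 32, 24, 16):
    print(f"{size}x{size}: {tile_count(1920, 1080, size, size)} tiles")
# 128x128 gives 135 tiles, 32x32 gives 2040 — far more chances to keep
# every device busy until the end of the render.
```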


32x32 has been suggested for all scenes since the introduction of CPU+GPU rendering.

I ought to test this again, but I believe that:

  1. CPU is faster with a small tile size (e.g. 32x32)
  2. GPU is faster with a large tile size (e.g. 256x256)

So for CPU + GPU you cannot have both, and you have to pick one.

In my last tests I concluded that rendering GPU-only with a large tile size was best for me.

(V2.79 nightly)

BR
JN

I think Cycles is faster for small tile sizes now even without CPU+GPU in many cases, but it can vary both because of your scene and your hardware.

Small tiles with the CPU added mainly helps to avoid the faster GPU being idle at the end of the render.

I think it is always advisable to do a test render with lower samples and try out a few tile sizes…

In my example scene the rendering got 50% faster :slight_smile:

Can’t wait until my second GPU is back from repair (o;

Hi,


I think this subject comes up often on this forum (for example here).
Please don’t forget to search before creating a new topic :slight_smile:


To come back to your problem: when you check CPU+GPU, the computation is split per tile across all devices. For example:

  • Let’s say you have 2 x RTX 2080 Ti and an 8-core CPU, and you render in 8 tiles. 2 will be given to the GPUs, and 6 will be given to the CPU. The GPUs finish very quickly, and then you’re waiting for the CPU to finish its tiles for what feels like… an eternity. Whereas if you uncheck CPU, the GPUs render all 8 tiles and it’s MUCH MUCH faster.
  • BUT if you have 200 tiles to render and a very fast CPU, using the CPU in addition to the GPU can be better, because while the GPU is doing some tiles, the CPU “saves” time by computing some others.

To sum it up, it’s in fact not a matter of tile size (32x32, 64x64, or whatever) but only a matter of optimisation considering the number of tiles to render. If both GPU and CPU work during the WHOLE render, that’s the best configuration. As soon as a tile configuration leads to the GPU stopping its work and letting the CPUs finish all the remaining tiles, it’s bad.
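The scheduling argument above can be sketched with a toy greedy simulator (the per-tile timings are made up for illustration, not measurements from any real scene):

```python
import heapq

def makespan(tiles, device_tile_times):
    """Total render time if each device grabs the next tile as soon as it is free.

    device_tile_times: seconds one device needs to render a single tile.
    """
    # Priority queue of (time_when_free, device_index).
    free_at = [(0.0, i) for i in range(len(device_tile_times))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(tiles):
        t, dev = heapq.heappop(free_at)
        done = t + device_tile_times[dev]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, dev))
    return finish

# Two fast GPUs (1 s/tile) plus eight slow CPU threads (10 s/tile):
mixed = [1.0] * 2 + [10.0] * 8
few_mixed = makespan(8, mixed)          # 10.0 s — CPU threads hold the last tiles
few_gpu_only = makespan(8, [1.0, 1.0])  # 4.0 s — GPUs alone finish sooner
many_mixed = makespan(200, mixed)       # beats GPU-only (100.0 s): enough tiles
many_gpu_only = makespan(200, [1.0, 1.0])
```

With only 8 tiles the mixed setup is 2.5x slower than GPU-only, exactly the "eternity" case; with 200 tiles the extra CPU throughput wins, matching the second bullet.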

See you :slight_smile: ++
Tricotou

Hi.
Can you share a screenshot of the Blender Preferences where you have selected your CPU and GPU?

Also, a screenshot of a partial render for CPU+GPU, while you can still see all the tiles working, to try to count the number of tiles (do not use Progressive Refine). Use a tile size of 64x64.

Sorry…was busy with other blender things (o;

The problem is solved for me…no need I do screenshots (o;

Originally I thought it could easily be solved by limiting the number of threads running on the 1950X…but I was totally wrong (o;

With dual GPUs like the RTX 2080 or higher it makes sense, though, to completely turn off CPU rendering…in my case it really slowed things down compared with dual-GPU only…I guess all the running CPU threads were limiting the PCIe throughput…

It would be interesting to know how tile size affects rendering with different GPUs, like the RTX 2080 and RTX 2080 Ti, memory limits aside though…