Blender 3.0 Cycles Multi-GPU Render Performance


I just started running some multi-GPU rendering tests with Cycles and I'm seeing some very strange behaviour in the Blender 3.0 release…
I have a 4x gtx1080ti system, and with a very simple scene (box and sphere, default Cycles render settings) I get very slow render times (42 sec) with all 4 GPUs enabled.
The strange thing is, when I render with only 2 GPUs enabled, the render time is 10 sec…
Has anybody else seen this with more than two GPUs enabled?
Could it be that the 3.0 release is hardcoded to a maximum of two GPUs?

(In 2.93.6 the same scene with the same Cycles settings renders in 6 sec, which is much faster than 3.0 with 4x GPUs.)

(Newest NVIDIA Studio drivers, 472.47.)

I remember reading some time ago that Cycles X is not well optimized for more than 2 GPUs, so it looks like your results confirm this. AFAIK the developers know about it, but they didn’t have enough time to optimize this part of Cycles before the Blender 3.0 release.

Very strange, given that 2.93.x scales nearly linearly with multiple GPUs…
So the slogan “2-8 times faster than 2.93” for the new 3.0 release is obsolete?

I also ran a test with the same scene on a 2x rtx2080ti system, and the 2x gtx1080ti system beats the RTX system… more strangeness…
(But maybe the scene is too simple for a real comparison.)

Oh, by the way, Cycles X viewport rendering seems to take all 4 GPUs into account. There it's really fast; only rendering to disk seems to be very slow…

For now I can only say that when rendering performance counts, it's better to stay on 2.93.x.

In 3.0 there is no tiling by default, which means the parts of the image rendered by each GPU have to be put together afterwards. If you change it to tiles you might get better times with multiple GPUs.

I'm using the default Cycles settings (4096 max samples). Tiling is on by default with a size of 2048; I also tried different tile sizes (128, 256, 512, 1024). Results on my really simple scene, devices on CUDA: 4x gtx1080ti 42 sec, 2x gtx1080ti 10 sec, 2x rtx2080ti 6 sec.

The render times aren't constant at all, i.e. rendering the same frame multiple times gives different render times (some faster, some slower, by a few seconds in total, which is a really big difference for such a simple scene).

Also, the GPU load is only between 16-26% per GPU in Blender 3.0 on every system I tested (GTX and RTX).
In Blender 2.93.x I see 100% GPU load per GPU.
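
For anyone who wants to log the per-GPU load while a render is running instead of watching a monitor tool, here is a minimal sketch in Python. It assumes `nvidia-smi` is on the PATH and uses its CSV query output; the function names `parse_utilization` and `sample_utilization` are made up for this example, and the parser assumes the utilization field is a plain number (some cards may report `[N/A]`, which this sketch does not handle).

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, e.g. "0, GeForce GTX 1080 Ti, 23"
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Turn nvidia-smi CSV output into a list of (index, name, utilization %)."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split off the first and last field so a comma in the name cannot break parsing.
        index, rest = line.split(",", 1)
        name, util = rest.rsplit(",", 1)
        rows.append((int(index), name.strip(), int(util)))
    return rows


def sample_utilization():
    """Run nvidia-smi once and return the parsed per-GPU load (hypothetical helper)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)
```

Calling `sample_utilization()` once per second during a render and printing the result should make it easy to compare the load pattern between 2.93.x and 3.0 on the same scene.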

In Blender 3.0 tiling is not a performance setting any more, at least not to the extent it was in 2.93 and earlier.

Thanks for your feedback…
Tiling is now more for VRAM memory management (out of core).
It would be nice if someone could test with more than two GPUs, to see whether these performance/render scaling issues are common in v3.0.

At the moment I am a bit disappointed, since I expected that with Cycles X there would now be a real competitor to all the other GPU render engines out there (in terms of speed, see Redshift, Octane).

Just out of curiosity, does the GPU utilization go up when you render a higher-res image? Try a 200% render and see what the GPU utilization is.

Also, if you are rendering animations it is actually more efficient to start one instance of Blender per GPU, or per pair of GPUs. I have been doing it for years to get better CPU/disk utilization; otherwise they are sitting there idle for the most part. I’ve been able to render 20+ FPS out of Cycles on a single machine using multiple instances of Blender, each using different GPUs.

This also works with Eevee; I’ve achieved 50+ FPS using this method on a single 3090. Eevee is quite bad at utilizing your GPU, so depending on your scene you could start up to 10 instances of Blender per GPU. In most scenarios you can start 2 instances of Blender and get a pretty good speedup.
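
The one-instance-per-GPU approach described above can be sketched roughly like this in Python. This is only an illustration under some assumptions: the helper names (`split_frames`, `build_jobs`, `run_jobs`) are invented for the example, the GPUs are CUDA devices restricted per process through the `CUDA_VISIBLE_DEVICES` environment variable, the output path is already set in the .blend file, and `--cycles-device CUDA` is expected to pick up whatever devices that process can see. The Blender flags used (`-b`, `-s`, `-e`, `-a`) are the standard command-line render arguments.

```python
import os
import subprocess


def split_frames(start, end, parts):
    """Split the inclusive frame range [start, end] into contiguous chunks."""
    total = end - start + 1
    base, extra = divmod(total, parts)
    chunks = []
    first = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more instances than frames: skip the empty chunk
        chunks.append((first, first + size - 1))
        first += size
    return chunks


def build_jobs(blend_file, start, end, gpu_groups, blender="blender"):
    """Build one (command, environment) pair per group of GPU indices."""
    jobs = []
    chunks = split_frames(start, end, len(gpu_groups))
    for (first, last), gpus in zip(chunks, gpu_groups):
        # Assumption: limiting visible CUDA devices is enough to pin an instance.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=",".join(str(g) for g in gpus))
        cmd = [
            blender, "-b", blend_file,
            "-s", str(first), "-e", str(last), "-a",
            "--", "--cycles-device", "CUDA",
        ]
        jobs.append((cmd, env))
    return jobs


def run_jobs(jobs):
    """Start all instances in parallel and wait for them; returns exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [p.wait() for p in procs]
```

For a 4-GPU machine, `run_jobs(build_jobs("scene.blend", 1, 100, [[0, 1], [2, 3]]))` would start two instances with two GPUs each, the first rendering frames 1-50 and the second frames 51-100. Splitting into contiguous chunks is just one choice; interleaving frames would balance the load better if the render time varies a lot over the animation.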

I also did command-line rendering/network rendering tests (animation). The GPU loads stay the same on the nodes (only 16-26%, zigzagging a lot over the frames).
With more than two GPUs per node the render time increases drastically…
In v2.93 I clearly see 100% GPU load on the nodes, and the render times are as expected for the number of GPUs per node…

Why should the GPU load increase if I double the output resolution?

I’m using an Intel i7 (a 4700, it’s old). In 2.9 I would see 8 tiles at a time in a Cycles render; I assume that's 2 threads per core. In the 3.0 I just downloaded, I only see 4 squares at a time. I assume it's not using 2 threads per core?

Also, I have the donut tutorial I made in 2.9, and Cycles renders it the same in 3.0, but Eevee ffmpeg shows the transparent cup and saucer as black-ish with a few white reflective highlights. Since I don’t know if I ever rendered it this way in 2.9, is this a feature of 3.0 or of Eevee generally? Or is a change necessary in the glass material for 3.0?