Need Help Setting up Multi GPU Rig


I asked the core question a couple of days ago in another topic but I thought I should give it its own thread.

I need help from you computer savvy people please.

The rig I use has 5 RTX cards, one of which is in the primary x16 slot while the others are connected via x1 risers. It runs on an Asus X370-F with a Ryzen 1700X.

As I increase the number of cards Cycles uses, the scene build-up/load-to-device duration goes up linearly. It adds up to 35 seconds in one particular scene, where the total render time is 75 seconds. Thus, the GPUs sit idle almost half of the time.

This is obviously a huge loss when rendering animations with thousands of frames.
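As a rough check of those numbers, here is a minimal sketch. It assumes the 75 seconds already includes the 35 seconds of loading, which is how I read my own timings; the variable names are just for illustration.

```python
# Timings from the scene described above (seconds per frame).
load_seconds = 35.0    # scene build-up / load-to-device
total_seconds = 75.0   # assumed to include the load phase

idle_fraction = load_seconds / total_seconds
render_seconds = total_seconds - load_seconds

print(f"GPUs idle for {idle_fraction:.0%} of each frame")
print(f"Actual rendering: {render_seconds:.0f} s per frame")
```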

I am not a PC expert, but I would assume that reading from RAM and loading to the GPUs via separate PCIe interfaces shouldn’t take that much time.

I tried playing around with the link speeds in the BIOS, with no significant benefit.

Is this a limitation due to the number of PCIe lanes?
What would you suggest I upgrade?
Will using NVLink result in “combined” VRAM (loading to a single device instead of two)?
Or is it just the way Cycles works?

As you can imagine, I am quite disappointed at not being able to benefit from the additional GPUs, and I would appreciate any help.

Many thanks.


What tiles are you using?

Besides the tile settings, what resolution are you rendering at?

You do have to take into account that on a single PCIe lane, even with 3.0 specs, the max speed is roughly 985 MB/s, so filling the memory buffer will take 8+ seconds for an 8GB buffer.

The SLI bridge does absolutely nothing. Each GPU’s memory is treated separately.

The best way to test is on a larger scene, like the Classroom scene. Also, the best way to monitor usage is GPU-Z.
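To put numbers on that, here is a rough sketch of the best-case transfer time. It uses the commonly quoted per-lane figures of about 500 MB/s for gen2 and about 985 MB/s for gen3, and it ignores driver and protocol overhead, so real transfers will be slower. The function name is made up for this example.

```python
# Approximate usable bandwidth per PCIe lane in MB/s (after encoding overhead).
PER_LANE_MB_S = {2: 500.0, 3: 985.0}

def transfer_seconds(size_gb, gen, lanes):
    """Best-case time to push size_gb over a PCIe link of the given generation and width."""
    bandwidth_mb_s = PER_LANE_MB_S[gen] * lanes
    return size_gb * 1000.0 / bandwidth_mb_s  # treating 1 GB as 1000 MB

# Compare an x1 riser at gen3 and gen2 against a full x16 slot.
for size_gb, gen, lanes in [(8, 3, 1), (8, 2, 1), (8, 3, 16)]:
    seconds = transfer_seconds(size_gb, gen, lanes)
    print(f"{size_gb} GB over gen{gen} x{lanes}: {seconds:.1f} s")
```

So the same buffer that takes about 8 seconds over a gen3 x1 riser takes about half a second on a gen3 x16 slot, and twice as long if the riser drops to gen2.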

  • 192x192 tiles @ 1080p
  • The scene takes 2GB. So, 2 seconds. Still, each PCIe lane has its own resources, so they should be able to work in parallel.
  • NVLink, Not SLI.

Thanks for the reply, but I don’t have a problem with GPU usage. It works well once the scene is loaded. If you mean bus usage in GPU-Z, it hardly hits 40% while the scene is being loaded. (It runs at gen2 x1.)

I suppose my scene is already big enough, as it’s giving me problems. I tried all the demo scenes though, and the results are consistently proportional to the number of scene elements.

OK, I resolved the issue (or at least found a workaround).

I am confident it was a PCIe resource problem. I probably wouldn’t be dealing with this on an sTRX4 system with plenty of lanes.

I dug into the motherboard manual and found this:
“PCIEX16_3 Socket shares bandwidth witch PCIEX1_2 and PCIEX1_3”
(Yeah it actually says witch) :rofl: :rofl: :rofl:

And guess what, I had risers on all 3.

So I removed one and connected it to an M.2 riser card, and moved the other to PCIEX1_1.

35 seconds came down to 22. Now I am going to play around with the link speeds in the BIOS and see if all slots can work at their maximum supported bandwidth.


I would recommend you lower the tile size to 32x32. Nvidia cards have solved the small-tile issue, so I would test it at 32x32, 64x64 and 128x128.

I’m just very curious how your GPUs scale (from 1 GPU through 5 GPUs). I would expect around a 4.8x speedup over a single card.

It isn’t actually related to tile size. Tile size concerns the actual rendering process whereas my problem is before that while the scene is being built.

The GPUs do scale almost 5x during actual rendering, but I wouldn’t recommend more than 3 GPUs without a system that has enough lanes. Otherwise you will run into the issue in this topic. I have a 2080 Ti + 4 x 2070S though, so it’s more like 5.4x.


Yep. I don’t know much about this, but last week I was looking into mining rigs for the possibility of creating a render node. From that I understood that using risers can limit your GPU by as much as 20%.

The GPU works fine, but waiting for Cycles to load data will set you back.

If you are rendering a single frame for an hour, 20 seconds lost at the beginning won’t be an issue. But if it is an animation, that will add up to days.

In my case, I need to look into an X299 or X399 system soon. Four 2-slot cards can be installed on these, but anything more will require at least x4 risers.
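A quick sketch of how that adds up, assuming the scene is reloaded on every frame and using the 20-second figure above. The frame counts are hypothetical, picked just to show the scale.

```python
SECONDS_PER_DAY = 86_400

def wasted_days(load_seconds, frames):
    """Total time spent on per-frame scene loading, in days."""
    return load_seconds * frames / SECONDS_PER_DAY

# Hypothetical animation lengths.
for frames in (1_000, 5_000, 10_000):
    print(f"{frames} frames x 20 s load: {wasted_days(20, frames):.2f} days")
```

At 10,000 frames that is a bit over two days of nothing but loading.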

That is aligned with my expectation.

Memory transfer occurs at loading (textures, models, etc.) and unloading (the image). Beyond that, everything else takes place within the GPU itself and so should not be drastically affected by the x1 PCIe connection.

I will need to set up and test with a spare card and see.