Multiple GPU and PCIe lanes

I am having a system built for use with cycles, and I see contradictory answers to the question of multiple GPU and PCIe lane usage.

The answer I see most often in this forum and others is that the lane usage for Cycles is very low, so with 3 GPUs you can expect to render almost 1.5 times as fast as with 2 GPUs, and almost 3 times as fast as with one. This claim is supported by the benchmark spreadsheet here: https://docs.google.com/spreadsheet/ccc?key=0As2oZAgjSqDCdElkM3l6VTdRQjhTRWhpVS1hZmV3OGc#gid=0
Specifically rows 15, 16, and 17: 1 GTX 670 renders a scene in 77 seconds, 2 GTX 670s render the scene in 39 seconds, and 3 GTX 670s do it in 27 seconds… 77/27 = 2.85 times faster for 3 cards compared to 1 card…
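For anyone who wants to check those numbers, here is a minimal sketch that turns the three quoted render times into speedup and per-card efficiency. The function name `scaling` is just made up for illustration; only the three times come from the spreadsheet.

```python
# Render times in seconds for 1, 2 and 3 GTX 670s, as quoted from the spreadsheet.
times = {1: 77, 2: 39, 3: 27}

def scaling(times):
    """Return {gpu_count: (speedup vs 1 GPU, efficiency per card)}."""
    base = times[1]
    result = {}
    for n, t in sorted(times.items()):
        speedup = base / t
        result[n] = (speedup, speedup / n)
    return result

for n, (speedup, efficiency) in scaling(times).items():
    print(f"{n} GPU: {speedup:.2f}x speedup, {efficiency:.0%} of ideal")
```

An efficiency close to 100% means each added card contributes nearly a full card's worth of speed.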

The other answer I see is that by adding more cards you cripple your PCIe bandwidth, so rendering speed will increase far less than linearly. THIS argument is supported by benchmarks like this: https://www.youtube.com/watch?v=j8wK6DOXzAA&feature=youtu.be
Where he goes up to 4 titans:

Time: 0 min 28 seconds (GPU - titan black) (256x256 tiles)

Time: 0 min 13 seconds (GPU - 4x titan black) (256x256 tiles)
4 cards BARELY twice as fast as 1! (But since it’s only about 30 seconds it could be misleading, since part of that may be build time?)
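One rough way to test that suspicion: assume the total time is a fixed part (build, transfer) plus a part that divides evenly across cards, and solve for both from the two measurements. This is only a sketch under that assumption, and `split_fixed_and_parallel` is a made-up helper name. Two data points cannot prove the model is right.

```python
def split_fixed_and_parallel(t1, tn, n):
    """Solve t1 = fixed + par and tn = fixed + par / n for (fixed, par).

    Assumes the parallel part scales perfectly with the number of GPUs,
    which is an assumption, not something the video confirms.
    """
    par = (t1 - tn) * n / (n - 1)
    fixed = t1 - par
    return fixed, par

# 28 s on one Titan Black, 13 s on four, from the video quoted above.
fixed, par = split_fixed_and_parallel(28, 13, 4)
print(f"fixed overhead ~{fixed:.0f} s, GPU-bound part ~{par:.0f} s on one card")
```

If something like this fixed overhead is real, it would explain most of the gap between the measured 2x and the ideal 4x.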

The fellow doing my computer build INSISTS that 3 cards will perform very poorly due to PCIe bandwidth and it would be a waste of money to go for 3 cards… Does anyone have a definitive answer with the technical explanation as to why this is or is not the case?

If you have a 3- or 4-GPU system, benchmarks for 1, 2, 3, and 4 cards would be the best data I could have!

In basic terms, when rendering there are three main processes:

  1. Building the scene (IIRC? CPU bound)
  2. Transferring the scene (PCIe bound)
  3. Rendering the scene (GPU bound)

Building the scene can take anywhere from 1-20 (or more) seconds depending on your scene.

Transferring a scene can involve up to 12GB if you’re using the largest-memory GPU currently available. A single PCIe 3 lane has approx. 1GB/s transfer rate. That means each of 3 cards running at PCIe 3 x4 will have about 4GB/s transfer. Hence a 12GB scene will take about 3 seconds to transfer to the GPU.
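The same arithmetic as a small sketch, so you can plug in your own scene size and lane count. The 1GB/s per lane figure is the approximation used above, and `transfer_seconds` is a hypothetical helper, not anything from Blender.

```python
GB_PER_S_PER_LANE = 1.0  # approximate PCIe 3 throughput per lane, as stated above

def transfer_seconds(scene_gb, lanes, gb_per_s_per_lane=GB_PER_S_PER_LANE):
    """Estimated time to copy a scene of scene_gb gigabytes over a link with `lanes` lanes."""
    return scene_gb / (lanes * gb_per_s_per_lane)

for lanes in (1, 4, 8, 16):
    print(f"x{lanes}: {transfer_seconds(12, lanes):.2f} s for a 12GB scene")
```

Even the worst case here, a full 12GB scene over a single lane, comes out at 12 seconds per frame.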


Rendering the scene will take however long your GPUs need for the quality desired. Each GPU renders at 100% of its capacity; throughput is only lost if one slow tile leaves a single GPU rendering while the other 2 sit idle.

Benchmarks and comparisons should be done on scenes you are likely to render, and ideally realistic scenes.
Testing on a scene that takes 30 sec to render on 1 GPU is pointless if the build time is 5 seconds and GPU time 25 seconds.

Switching to 4 GPUs would make the GPU time 6.25s while the build time is still 5 seconds (11.25s total, only about 2.7 times faster), so the results seem skewed. If you render a high-quality scene with 20 seconds build time, 3 seconds transfer and 5 minutes render time, the advantage of multiple GPUs is properly shown.
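Here is a minimal sketch of that reasoning, comparing the short and long scenes. It assumes build and transfer time stay constant regardless of card count and that GPU time divides evenly, which is a simplification. The function names are made up for illustration.

```python
def total_time(build_s, transfer_s, render_1gpu_s, gpus):
    """Total frame time: fixed build + transfer, plus GPU time split across cards."""
    return build_s + transfer_s + render_1gpu_s / gpus

def speedup(build_s, transfer_s, render_1gpu_s, gpus):
    """Speedup of `gpus` cards relative to a single card under the same model."""
    one = total_time(build_s, transfer_s, render_1gpu_s, 1)
    return one / total_time(build_s, transfer_s, render_1gpu_s, gpus)

# Short test scene: 5 s build, 25 s GPU time (transfer ignored, as in the post).
print(f"short scene, 4 GPUs: {speedup(5, 0, 25, 4):.2f}x")
# Realistic scene: 20 s build, 3 s transfer, 5 min GPU time.
print(f"long scene,  4 GPUs: {speedup(20, 3, 300, 4):.2f}x")
```

The longer the GPU-bound part is relative to the fixed part, the closer the result gets to the ideal 4x.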

Thank you!
Do you have any links to benchmarks (other than the one I linked) that demonstrate this clearly? I don’t doubt you, as what you say agrees with my experience with 3 GTX 580s in my system, but hard data on this would help me convince the builder that I am not insane.

Sadly I only have 2x GPU’s :confused:

I’ve been toying around with this. The huge payoff in using more GPUs comes with scenes that require a higher and higher number of samples. If you’re doing high vert counts, caustics and SSS, pile on the GPUs. Otherwise, if what you are doing can be done with fewer than 500 samples, then anything more than 4 would be a waste.

LOL “more than 4”, I WISH!!! No, the decision is 2 GPU or 3- and while I usually do ~500 samples on most of my scenes they may take ~10 minutes (with my old 3x580 system)- and usually they are frames of an animation so I am rendering 100’s of frames at a time.

For my desktop I’ll use 2 GTX 760s for most of my rendering needs. On occasion I’ll put them in my workstation to give it 6 total GPUs to use, and even with 4 of them connected via a 1x PCIe adapter it is still a nice speedup on the overall render time, even though it takes a few more seconds to load the scene into each video card.

I won’t worry about a 5 or 30 second load time if it’s spending 2 or more minutes a frame to render.

One huge consideration for scaling like that is power supplies, though. Be generous to your GPUs. A 400 watt power supply will push two cards, but just because there are 5 branch-offs on that cable does not mean it’s a good idea to attach 5 things. Most computing devices these days will throttle back if they overtax the power supply.

I was going to use a 1600 Watt supply… that should be well over what I need for 3 titan x…?

My current rig: 2 Titan Z (4 GPU cores, 6GB each), 11520 CUDA cores with 750W usage. Very efficient, and it is as fast as 12 CPUs (i7 2.8GHz quad core).


Plenty. I’m currently running four of them with a 1500w psu with no problems.

That vid you linked is mine btw. I’ve redone that benchmark and the results are almost exactly the same, but the UI render speeds are vastly slower. Others have stated that it’s due to Cycles not yet being optimized for Maxwell GPUs…


My computer is a nightmare.
I have a 750 watt power supply for the mobo and an extra 600 watt power supply for the video cards.

NICE system you have.
Thanks for the information, but I am not sure what you mean by: “but the UI render speeds are vastly slower.”
Would you say that the render speed for 3 Titans is near 3x the speed of one? Do you see any penalty due to PCIe bandwidth?

Render times for a scene that takes more than 5 minutes on a SINGLE titan for 1, 2, 3 and all 4 Titans would be valuable info to have when deciding if a 3rd card justifies the cost.

The times you gave in the video seem to indicate 4 Titans are ~2x as fast as one, but with render times that short it’s impossible to draw any conclusions since we don’t know what the build time was.

Thanks for sharing whatever info you have!
If I get 3 cards I will make a benchmark video showing how many fps I get on Minesweeper, and solitaire… I have two Titan X Hybrid on order- and need to decide if I want a 3rd…

Basically I mean this: https://www.youtube.com/watch?v=3f64udu3QYM

Rendered view in the viewport. In that video you can see how quickly the full 200 samples are reached (25 seconds), but with four TXs it was vastly slower (1 minute 8 seconds).

I don’t think it’s the case that having more GPUs will divide the render time equally i.e having four GPUs will divide the time of one GPU by a factor of four. I think Cycles caps out at a certain point that if you were to have an entire render farm of, say, 16 GPUs the same benchmark scene will render in an instant, even with a build time. Cycles is still heavily underdeveloped for that type of capability. Octane, on the other hand, does have better performance. I did an update with Otoy’s render benchmark file and I did notice a significant speed in render – barely a shimmer of noise, in fact, with 1000 samples – in comparison to the four Titan Blacks.

I’ve just finished an updated bench with my TXs and it is currently uploading to YT. I’ll post it here when it’s done, although for some reason I can only select each individual GPU or all of them; I can’t select other combinations.

The bottom line is, it’s not actually the GPUs that are the problem here. It’s Cycles not yet being fully optimised for the Maxwell architecture. I think it’s a case of buying another GPU if you can afford it, but don’t rely on Cycles alone; try other renderers if you can. Otherwise, if you’re going to stick to Cycles only, then it’s a case of waiting for Cycles to be optimised for Maxwell, and if that doesn’t happen then I wouldn’t bother with another. I bought four because I don’t only use Cycles.

I’m not really sure about gaming. I’ve seen people on the Overclockers forum doing gaming benchmarks for certain games at really high definition with 60 fps, and one Titan X was pretty much enough. I think if you buy more than two you’re probably wasting your money. But beyond that I don’t know much.

Sorry, my lame attempt at humor failed…
I was joking about gaming- I only have the games that come with windows on my computer- the computer will be for Cycles and other 3d rendering only.

And I see what you are saying about the viewport render in your video- and a fast viewport render is important to me, but not as important as how fast it will render frames of an animation. So the numbers that would mean the most to me are the time it takes 1, 2, and 3 Titans to render a scene that takes maybe 5 minutes to render on 1 titan…

So the times that it takes to render the Mike Pan scene to 2000 or more samples on 1 Titan X, vs 2 Titan X, vs 3 Titan X… it’s hard to draw conclusions when the render times are so short…