Is 4 TFLOPS a Lot? Video Card Experts, Please Chime In

Hello all,
I have a buddy who thinks he knows a lot about computers. He recently informed me that if I have a motherboard that supports 4 video cards in SLI mode and CUDA technology, I can process my renders at 4 teraflops. Note that one of the cards needs to be a Quadro card.

Interestingly, though, he can't give any details, which puts his opinion in a gray area.

My questions are:

  1. How the hell BIG does the power supply have to be to run that many cards?
  2. Quadro? Which one? Will a cheap one work?
  3. What about HEAT? That many cards is going to cause some serious heat issues, isn't it?
  4. 4 teraflops? By comparison, is that fast?

Any information that confirms or denies this would be greatly appreciated.

Thanks for stopping by.

Quadros, I believe, don't work with Blender.

Also, additional cards only add a percentage: a second card probably gives somewhere around a 30-40% improvement over one card. I have no clue whether adding a third and fourth gives the same percentage improvement or even less.

Well, sure, 4 teraflops would be a lot; the question is just whether it would work with any card. Also, why would at least one of them have to be a Quadro card?

My knowledge is very limited when it comes to things like this, but I believe memory swapping between the cards must be fast to improve render speed compared to just one card, so I don't think it will work with just any card. If you want to render with Cycles, I don't think it supports more than 2 devices rendering at the same time (also, 4 video cards where one of them is a Quadro are going to cost a small FORTUNE).

How big the power supply has to be depends on the cards and how much wattage they consume; however, for 4 cards running simultaneously you would need a heck of a power supply (if not two).

4 cards is going to produce a lot of heat for sure, for such a setup I believe it would be better to not have a computer case but let the motherboard sit on a special rail with some extra cooling devices.

All in all, I would not advise you to buy something like that unless you are planning to set up a small GPU render farm and know exactly what you are doing. Not only the parts but also the power consumption is going to cost you another fortune to keep it running, and then comes the question of what you are going to use it for.

I know that Octane and VrayRT for 3ds Max have full support for multiple video cards, but for Blender and Cycles you are better off with two at most. I have heard that many of the members here experience a slowdown using multiple devices instead of just one.

Hope it helps.

Quadros do work, but they don't have any specific driver optimizations for Blender.

Secondly, for four cards running you'd probably need a 1250W PSU, which would of course use so much electricity that you'd probably be scared to leave it running for more than an hour or two; this is not something you'd want running 24/7.

Thirdly, depending on the rendering engine, you may be limited to the memory capacity of each single card, not of all four combined, which would mean you'd be severely limited in scene complexity when rendering.

Lastly, if you want something to render your animations on, then just build a PC that has a small footprint (in terms of size, and power, but good ventilation!) and leave that running 24/7, you could even get away with buying server cases, and having a very small rack system.

They would use less electricity than the machine with four GPUs running, they would produce far, far less heat, they would cost less (only motherboard, RAM, CPU and a small HDD), and lastly they would be easier to manage.

GPU rendering is the latest ‘trend’, but it’s limited in a lot of ways that people tend to gloss over.

  1. Depends on the cards. 4 GTX580s (the fastest NVIDIA single-GPU card) need a bigger power supply than 4 HD7970s (the fastest single-GPU card).
  2. I don't think you would need a Quadro. Ordinary cards are ready for Quad SLI/CrossFire.
  3. Heat would be the biggest problem. The cards would sit right next to each other without any space between them. The fans would go crazy and the cards would get really, really hot.
  4. 4 TFLOPS is a lot, but it is just the theoretical performance. An HD6870 has 2 TFLOPS and a GTX 580 just 1.6, but the GTX is a lot faster. An HD7970 has 3.8 TFLOPS, but it's not twice as fast as a 580.
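The "theoretical performance" numbers in point 4 are just spec-sheet arithmetic: shader units × clock × 2, where the 2 counts one fused multiply-add (a multiply plus an add) per clock. The sketch below assumes the reference shader counts and clocks published by NVIDIA and AMD (GTX 580: 512 cores at 1544 MHz; HD 6870: 1120 stream processors at 900 MHz; HD 7970: 2048 at 925 MHz), which are not given in this thread. It reproduces the ~1.6, ~2 and ~3.8 TFLOPS figures, and it also shows what the formula leaves out: memory bandwidth, architecture, and how well a renderer keeps the shaders busy.

```python
# Theoretical single-precision peak = shader units x clock x 2,
# where 2 = one fused multiply-add (a multiply plus an add) per clock.
# Shader counts and clocks are assumed reference specs, not from the thread.

def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS of one GPU (units x MHz x FLOPs per clock / 1000)."""
    return shaders * clock_mhz * flops_per_clock / 1000.0

cards = {
    "GTX 580": (512, 1544),   # CUDA cores, shader clock in MHz
    "HD 6870": (1120, 900),   # stream processors, engine clock in MHz
    "HD 7970": (2048, 925),   # stream processors, engine clock in MHz
}

for name, (shaders, clock) in cards.items():
    print(f"{name}: {peak_gflops(shaders, clock) / 1000:.2f} TFLOPS")
```

A higher number here only means more arithmetic units times a higher clock; it is an upper bound, which is why the HD7970 can have more than twice the TFLOPS of a GTX 580 without being twice as fast.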

If you are thinking about a multi-GPU system, you should be aware that 2 cards are not twice as fast as 1. SLI/CrossFire doesn't scale 1:1.

Is 4 TFLOPS a lot?

Well yes, to put things into perspective 4TFLOPS via 4 GPUs would put it between these two 10+ years old supercomputers:

1999: Intel ASCI Red (~2 TFLOPS, 9298 CPUs, 1212 GB RAM, using 850kW power and taking up 149 m² of space!)
2000: Intel ASCI White (~7 TFLOPS, 8192 CPUs, 6 TB RAM, using 3MW power + 3MW for cooling!)

The only thing going for those supercomputers was the amount of RAM, which still dwarfs modern PCs and GPUs, but yeah, having 4 TFLOPS in a single tower using less than 2 kW is pretty crazy. But I would not buy or use that myself just for Blender.

I guess, in conclusion, my buddy is correct “in theory.” (Damn, I was hoping to tell him he doesn't know jack.)
Though it might work, it sounds like the cons outweigh the pros in this case.
For instance, there's the cost of the cards, around $2000.00 USD, the cost of electricity, and other associated costs.

Everyone's input has been really informative; thanks for the responses.

The costs for electricity would be high, but rendering would take less time:
4 h at 500 W --> 2 kWh
1 h at 2000 W --> 2 kWh
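The point here is simply energy = power × time. A minimal sketch of the two cases; the electricity price is a hypothetical placeholder, and the comparison only holds if four cards really finish the job four times faster than one:

```python
PRICE_PER_KWH = 0.25  # hypothetical electricity price per kWh, adjust to your tariff

def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy = power x time, converted from watt-hours to kilowatt-hours."""
    return power_watts * hours / 1000.0

one_card = energy_kwh(500, 4)     # 4 h render at 500 W
four_cards = energy_kwh(2000, 1)  # 1 h render at 2000 W

print(one_card, four_cards)
print(one_card * PRICE_PER_KWH, four_cards * PRICE_PER_KWH)
```

If the four-card box only manages, say, 3× the speed, it runs for a third of the time instead of a quarter and uses more energy than the single card for the same frame.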

But you can say to your buddy he’s wrong with the quadro point. You could do the same with “normal” GPUs

I just found this,852976/Antilles-Supercomputer-Elf-Grafikchips-in-einem-PC-geht-das-gut/Gehaeuse/News/

It's in German, but you can look at the pictures :wink: He built a system with 5 HD6990 dual-GPU cards and one GTX 570.
That means 11 GPU chips and a 3000W power supply ^^ The system has about 26 TFLOPS.

Nice pictures. I guess this is now on everyone's wish list, lol.
Also, I couldn’t help but to notice that he is using the same motherboard I have. The “SUPERCOMPUTER”.

  1. Assuming ordinary power draw, a standard sized (in terms of physical dimensions) ATX power supply will do fine.
  2. Did he say why? I can’t see any reason for a Quadro unless you’re doing CAD.
  3. Depends on the particular models of the cards chosen, the type of your cooling system and the airflow in your enclosure (assuming you’re using air). But no, it’s not that bad at all.
  4. You can make an old 486 render fast. But the pictures won’t be as nice. Rendering tasks tend to stretch out to use up all your available time and computing power no matter how much silicon you throw at them. So no, ordinarily it won’t make your render times faster because you won’t feel under any pressure to make them so. Your pictures might be a bit prettier though. Remember that on GPUs you’d basically be limited to pathtracers at this moment in time. Which may or may not be optimal for the job at hand. Also keep in mind not all GPU render engines are coded well enough to keep all your GPUs well fed with tasks all the time. For example Cycles doesn’t do a very good job of this at the moment, while Octane is much better.

  1. SLI != multiGPU. You don’t need to run SLI to use multiple GPUs for GPGPU purposes.

  2. All renderers use single precision. A GeForce is faster than a Quadro or a Tesla in single precision.

  3. None of the cards needs to be a Quadro

  4. There are mainboards that support 8 × PCIe x16. You'll have to water-cool the cards, because there are no single-slot air coolers.
    One GTX580 uses around 290W under full load, which totals about 2.3 kW for eight cards.

  5. With 8 GTX580s at default clocks you'll be able to reach a theoretical 8 × 1581 = 12648 GFLOPS (FMA), which is about 12.65 TFLOPS.
    A Core i7-2600K has around 80 GFLOPS, or 0.08 TFLOPS; additionally, it has no fused multiply-add, so a multiply and an add are issued as separate instructions.
    A GPU FMA (fused multiply-add) performs a multiplication and an addition in one instruction per clock cycle, which is why it counts as two floating-point operations.
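Those totals are plain multiplication. A small sketch using only the per-card figures quoted in this post (1581 GFLOPS and ~290 W per GTX580, ~80 GFLOPS for the i7-2600K); all of them are theoretical peaks or rough estimates, not measured render speed:

```python
# Per-card figures quoted in the post above (theoretical peaks, not benchmarks).
GTX580_GFLOPS = 1581   # single precision, FMA counted as 2 FLOPs
GTX580_WATTS = 290     # approximate full-load draw per card
CPU_GFLOPS = 80        # quoted figure for a Core i7-2600K

def rig_totals(n_cards: int) -> tuple[int, int]:
    """Return (total GFLOPS, total card wattage) for n identical GTX580s."""
    return n_cards * GTX580_GFLOPS, n_cards * GTX580_WATTS

gflops, watts = rig_totals(8)
print(f"{gflops} GFLOPS = {gflops / 1000:.2f} TFLOPS, {watts} W for the cards")
print(f"about {gflops / CPU_GFLOPS:.0f}x the quoted CPU figure")
```

The ratio only compares peak arithmetic throughput; actual render speedups depend on whether the renderer can keep every card busy.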

If I were crazy enough to build an 8×GTX580 rig, I'd go with 4 water-cooling circuits and 5 PSUs: one for the system and the remaining 4 powering 2 cards each.

8520 Euro for the Cards
~350 Euro for the Watercooling
~300 Euro for the mainboard
5*~150 Euro for the PSUs

Total: ~9920 Euro
And it's doubtful that 8 cards will really give 8 times the performance.
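For reference, the parts list above adds up like this (approximate prices as quoted in the post):

```python
# Approximate prices in euros, as listed in the post above.
parts = {
    "8x GTX580": 8520,
    "water cooling": 350,
    "mainboard": 300,
    "5 PSUs": 5 * 150,
}

total = sum(parts.values())
print(f"Total: ~{total} Euro")  # Total: ~9920 Euro
```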

If you or anyone else were really serious about doing something like this, I would recommend looking for an external expansion chassis, like those made by Cubix

With something like this, you can size it to the number of cards you want or expect to employ, and only power it up when you need or want to use it.

I think a device like this would make a great open-source or community project - unfortunately I do not have the engineering skills to do it.

EDIT: Also, I think the reason Quadros are recommended is that the best of the GTX series take up two slots (I think EVGA might make a single-slot water-cooled version).

Such a case is practical for many cards, but look at the price: you pay for the cards plus the same again for the box.

I guess in conclusion we all have the same opinion.
It is possible to build a machine with 4 cards. It is expensive. It is impractical because you need water-cooled cards or a special board with 8 slots. 5 TFLOPS are “easy” to reach, but it would only be as fast as 4 cards rendering separately.

And no matter how many disadvantages it brings, we would all be happy to have such a system :smiley:

I'd recommend not looking into it; without going into details, they're not so good in practice.

I’m curious as to what you are rendering where you need that much ass-end in your box.

O.K., that good ol' buddy of mine came clean. After work today I gave him a call to let him know what I had posted here and the responses I received. And right when I was getting to the part about him really not knowing anything about it, he tells me that he does, and so does ASUS. I said, “Huh?” He then proceeds to tell me that he went to the ASUS website to find some information about my motherboard, model P6T7 WS SuperComputer, and that is where he got the information. Not really believing him, I went to the website and looked for myself. This is what they had to say.

" The motherboard will achieve outstanding and dependable performance in the role of a Personal Supercomputer when working in tangent with discrete CUDA technology—providing unprecedented return on investment. Users can count on up to 4 CUDA cards(One of them should be Quadro graphic card) that are plugged into P6T7 WS SuperComputer for intensive parallel computing on tons of data, which delivers nearly 4 teraflops of performance. It is the best choice to work as a personal supercomputer on your desk instead of a computer cluster in a room. "

Now, after I removed my foot from my mouth I immediately tried to find what I did with the box and manual with no luck.

How often do we think we never need the manual or assembly instructions? I probably threw the box away and maybe the manual too.

In any event, that buddy of mine did actually know something about it but tried to take credit for it. If you go to the website and look up that model, you will find all of the details.

The product description is really odd.

You don't need SLI for multi-GPU CUDA, so you can use up to 7 cards, as the board has 7 PCIe x16 slots, but only 4 for SLI, as the chipset doesn't support more.

The other thing is that the slots are x16 slots, but the board doesn't have enough PCIe lanes to run all of them at x16 electrically.

It supports 1 + 3 electrical x16 (for quad SLI), or 1 × x16 + 6 × x8, which is nonetheless astonishing, because most boards offer only 1 electrical x16 slot and regular SLI boards 2 × x16.

Quite confusing. However, it doesn't matter much for CUDA, because the calculations are done on the device: they run on the card itself, using the card's own memory and memory bus, not the host RAM over the PCIe bus.

So if you run 7 cards, one runs with x16 and six with x8, and the host<>device communication is slower, but the cards themselves will work with their normal speed doing the calculations.
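To put the x16 vs x8 difference in numbers, here is a rough sketch. It assumes PCIe 2.0 (as on X58-based boards like this one) at about 500 MB/s per lane in each direction, and a hypothetical 1 GB of scene data copied to each card; real transfers reach somewhat less than this ideal rate:

```python
# Assumed PCIe 2.0 rate: 5 GT/s with 8b/10b encoding ~ 500 MB/s per lane, per direction.
PCIE2_MB_PER_LANE = 500

def transfer_seconds(data_mb: float, lanes: int) -> float:
    """Ideal time to copy data_mb megabytes over a PCIe 2.0 link with this many lanes."""
    return data_mb / (PCIE2_MB_PER_LANE * lanes)

scene_mb = 1024  # hypothetical 1 GB of scene data sent to each card
for lanes in (16, 8):
    print(f"x{lanes}: {transfer_seconds(scene_mb, lanes):.3f} s")
```

Either way the upload takes a fraction of a second, against render times measured in minutes, which is why the x8 slots barely affect the on-card computation.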

What leaves me dazzled is the “one should be a Quadro” stuff, which to the best of my technical knowledge makes no sense.

Is that meaning that you really don’t need to expand your power supply? AND, what about all that heat?
I dug around again today and still can't find the manual. I guess the math is confusing to me. IF you have 4 cards (minus the Quadro), how can you not achieve 4 × 1.6 TFLOPS = 6.4 TFLOPS? From the description it seems that this is how it should work. It just makes me want to go out and get 3 more GTX580s so I can experiment.
BUT, then again, just because something works on paper and “in theory” doesn't always mean it is so.

For 4x GTX580 you need, to be safe, ~1.2 kW for the cards alone under full load. Add the rest of the machine's power consumption, and a 1.5 kW PSU might be sufficient.
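A minimal sketch of that sizing rule: add up the cards, add the rest of the system, and leave some margin. The ~290 W per card is the figure quoted earlier in the thread; the 200 W for the rest of the system and the 10% headroom are assumptions for illustration only:

```python
def psu_watts(card_watts: float, n_cards: int, system_watts: float,
              headroom: float = 0.10) -> float:
    """Estimated PSU rating: full-load draw of all parts plus a safety margin."""
    load = card_watts * n_cards + system_watts
    return load * (1 + headroom)

# 4x GTX580 at ~290 W each (figure quoted earlier in the thread),
# plus a hypothetical 200 W for CPU, board, drives and fans.
print(f"{psu_watts(290, 4, 200):.0f} W")  # 1496 W
```

That lands right around the 1.5 kW mentioned above; a more power-hungry CPU or heavier overclocking would push it higher.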

Uhm? Just download it from the ASUS homepage: the link I posted > Download tab > select “other” in the “OS:” drop-down menu, and job done.

I still have no idea what the deal is with your Quadro... you don't need one.
But yes, provided the software is written properly and supports multiple GPUs, the performance gain is pretty much linear; you'll always lose some performance in the device<>host operations, but that's about it.

A GTX580 has 1581 GFLOPS, or about 1.58 TFLOPS if you prefer, and 4 of them have 4 times the performance.
Again, MultiGPU is not SLI, where you actually never get double the performance with 2 cards.
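A simple model of the "pretty much linear" claim: if the rendering work splits evenly across the cards but every frame still pays a fixed host<>device transfer cost, the speedup stays close to linear as long as that transfer is small. The 600 s of compute and 5 s of transfer are hypothetical numbers, and the model ignores load imbalance between cards and renderer overhead:

```python
def speedup(n_gpus: int, compute_s: float, transfer_s: float) -> float:
    """Speedup over one GPU when compute splits evenly across n GPUs
    but the host<>device transfer time does not shrink."""
    single = compute_s + transfer_s
    multi = compute_s / n_gpus + transfer_s
    return single / multi

for n in (1, 2, 4, 8):
    print(n, round(speedup(n, 600, 5), 2))
```

With 4 cards this gives roughly 3.9× rather than 4×; the larger the transfer share of each frame, the further the curve drops below linear.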

The question remains what for you’d need so much raw power.

Actually, I really just wanted to confirm or deny my friend's theory. BUT, after this thread and a little thought, I was kind of interested in trying to run a chess program. You know, it is really time-consuming to play a game of chess against a computer if you have the level turned up. Why? Because the program will consider every possible move before making its move. Did you know that there are about 1 OCTILLION different move combinations? That is a 1 followed by 27 zeros. So, with a system like that, you might be able to play a game in less than an hour.

P.S. I am unable to download the manual because I am in BFE and can only get a dial-up connection at about 16.8k.