BIGGEST BLENDER CYCLES GRAPHICS CARD FAQs

Render first on the CPU and find out. Look in Task Manager (if you're on Windows) for the peak RAM value during render time and you have your answer.

The discussion in this thread is about GPU rendering and graphics cards, not CPU rendering and RAM on motherboards :wink:

Kind regards
Alain

Thanks for your answer, but I really meant VRAM. I think I will buy an ASUS GTX 670 (it's not only for Cycles, and the card is - in contrast to a 580 - really quiet: 42 dB under full load).
There are the two variants:
2GB: http://lb.hardwareversand.de/articledetail.jsp?aid=59187&agid=1156&pvid=4nehx71ff_h9skde2y&ref=13
4GB: http://lb.hardwareversand.de/articledetail.jsp?aid=69181&agid=1947&pvid=4nnohjcn9_h9skde2y&ref=13
The price difference is 55€ (which is ~$70).

The scene I'm working with at the moment requires 1 GB of RAM, and it's not that big. But I didn't know whether there is a difference in the required amount of RAM between CPU and GPU, because they're very different devices.

Should I get 4 GB, just because the price difference isn't that big?
I really don't want to run out of RAM if I could get 4 GB for a bit more money.

@Dacit,
If you plan on overclocking you should know that less RAM overclocks better (in gaming anyway). If not, then go with the 4 GB. Why not get a 580 3GB? Don't you see one for sale?

Personally, I don't like overclocking. My components should have a long lifetime, as I don't want to buy new ones in the next years. Only if I took a 2GB variant would I buy one with factory OC. I know that the 580 is a bit better in Cycles, but it's just too loud. I don't know any 580 that stays under 55 dB - compared with the 42 dB of an OCed ASUS 670 (the non-OCed one should be even quieter). Also, the 580s are in the same price range as the 670, and with their much higher power consumption (+200 W more × 3 hours/day ≈ +$70/year) they are even more expensive.

@reC: It seems I was a bit wrong about the RAM usage. The 1 GB is the total RAM used by Blender. The additional RAM used while rendering is only ~200 MB. So will my graphics card need 1 GB or 200 MB to render the scene?
Thank you in advance for your answer!

Hard to say, I guess it depends on the scene geometry and the texture files you use.
I have a scene which needs 300 MB in the Blender viewport, while my VRAM is filled with 150 MB of data.
When I start to render, the data volume in VRAM goes up to 900 MB.
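
If you want a rough feel for how much of that is just textures, here is a minimal sketch (run in Blender's Python console) that sums up the image data in the .blend. It only counts textures - geometry, the BVH and the render buffers come on top of that - so treat it as a lower bound:

```python
# Rough lower-bound estimate of texture memory for the current .blend.
# Geometry, BVH and render buffers are NOT included.
import bpy

total_bytes = 0
for img in bpy.data.images:
    if img.size[0] == 0:                        # skip render results / missing files
        continue
    bytes_per_pixel = 16 if img.is_float else 4 # assume RGBA; float images are bigger
    total_bytes += img.size[0] * img.size[1] * bytes_per_pixel

print("Estimated texture data: %.1f MB" % (total_bytes / (1024 * 1024)))
```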

Kind regards
Alain

I bought the 4GB variant now and I'm totally happy. Thanks to all!

Some good info here, the mods should make a hardware forum and sticky this

Hello, I'm looking for a Mac-compatible GPU. Sorry if this seems to be an age-old and repetitive question, but all the answers online are about Adobe Premiere, Photoshop, etc., and anything on Mac has to be hacked. I am a Mac user, and I'd like to get a GTX 580. I really think it seems the best for my budget, but I was wondering if there is ANY possible way for me to buy this card and not worry about hacks and editing hex and such. I am really tired of searching endlessly or shying away from the $1000 prices of the Quadro cards.

I read online that you only need to install the latest Nvidia drivers and have the right OS, and it should be plug and play. Is this true? If not, are there any cards not exceeding $400 that work with Cycles on Mac OS and have decent performance? I am really sorry to be asking this question, but I can't find the right card!

Thanks in advance to any and all who answer this! It is greatly appreciated!!!

Hello,

I've read through the posts and still have a doubt:

If I want to render a single frame, what difference in performance might I get from these setups:
a) a single GTX card
b) two GTX cards (one in each board slot)
c) two GTX cards connected via SLI

thanks

Actually, I'm pretty sure this is wrong. I did tests this morning with a setup that has two identical GTX 580 cards, one sitting in an x16 slot and the other in an x1 slot. The difference was pretty much zero on the renders I tried. To me this makes sense - the reason you want x16 is not rendering, it is gaming. The advantage of a faster bus is that you can shuffle textures in and out of the GPU much faster, but when you're rendering, the textures are all copied to the card before you start and are not changed until you render the next frame (if you are rendering an animation). This means that the bus speed is mostly irrelevant for rendering. It only impacts the time Blender spends setting up the work on the GPU. If you render a lot of frames with short render times (less than a minute for example, with 1000+ frames) it will matter, but for most people it won't matter at all. If you have a motherboard where you can set the PCIe speeds, like mine, you can try it for yourself.

I’d sum up by saying that my impression is (please do correct me if I’m wrong here, I might have missed something):

SLI : Great for gaming (if your drivers are good), useless (no advantage, possible problems only) for rendering
PCIe bus speeds: Irrelevant in almost all rendering scenarios

Conclusion: add as many different GPUs as you like, saturate the PCIe bus as much as you like, just watch the VRAM - the GPU with the least amount of RAM will be your bottleneck for scene size/complexity.
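
If you want to double-check this on your own machine, the easiest way is to time the exact same frame with the card in different slots (or with the PCIe speed changed in the BIOS). A minimal sketch - the script and .blend names are just placeholders:

```python
# time_render.py - time a single Cycles frame, e.g. to compare PCIe slot setups.
# Run headless with:  blender -b your_scene.blend -P time_render.py
import time
import bpy

start = time.time()
bpy.ops.render.render(write_still=False)   # render the current frame
print("Render took %.2f s" % (time.time() - start))
```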

I confirm your impressions as facts :wink:


I already said that in various threads.
Working with CUDA you have a host (CPU + RAM) and a device (GPU + VRAM).

When you start rendering, the CPU builds the acceleration structure and then sends the geometry and texture data, along with the render kernels, to the device.
This is a host→device operation over the PCIe bus.

The rendering only happens on the device. The PCIe is not used during this process. The memory bandwidth on the device is around 192GB/s for a GTX580. The default PCI system bus is around 133MB/s, and the original PCIe 1.0 specification is 250MB/s per lane.

PCIe x16 1.0 = max. 3.9 GB/s (250*16)
PCIe x16 2.0 = max. 7.8 GB/s
PCIe x16 3.0 = max. 15.6 GB/s
GTX580 on device = max. 192GB/s

And that's why the calculations (in our case rendering) are done on-device.
Not only is the PCIe bus ridiculously slow compared to the memory bandwidth on-device, it also has to sync with the system bus for each read and write command from the CPU, which causes additional latency and slows the average transfer speed down further.
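
Just to put those numbers side by side, a trivial sketch:

```python
# Theoretical peak bandwidths quoted above, and how much faster the
# GTX 580's on-device memory is than each PCIe generation.
pcie_gb_s = {"PCIe x16 1.0": 3.9, "PCIe x16 2.0": 7.8, "PCIe x16 3.0": 15.6}
device_gb_s = 192.0   # GTX 580 memory bandwidth

for name, bw in pcie_gb_s.items():
    print("%s: %5.1f GB/s -> on-device memory is ~%.0fx faster"
          % (name, bw, device_gb_s / bw))
```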

This is also the reason why 2 cards don't double the amount of memory available for rendering - they could, but it's very undesirable.

If you put 50% of the scene data in each card, you'd constantly have to access data on the other device via the host's bus system, slowing things down by a factor of at least 100.
So you have to copy the full data to each of the cards, and then let each card calculate 50% of the image.

Once rendering is done, you have to access the image stored in the device's memory and copy it back to the host.

If you need to copy 3 GB of data from the host to the device - i.e. copy your scene to the video memory for rendering - you need a theoretical ~0.4 seconds over PCIe 2.0 x16 and ~0.8 s over PCIe 2.0 x8.
And that's a one-time cost per frame… negligible next to a 10 min render time per frame.

A full HD frame with RGBA is 1920 × 1080 × 4 = 8,294,400 bytes ≈ 7.91 MB.
It takes PCIe 2.0 x16 about 0.001 s to fetch that, and PCIe 2.0 x8 about 0.002 s… again, only once per frame.

So if our frame takes 10 minutes to render, you get:
PCIe 2.0 x16: 10 min 0.4 sec render time.
PCIe 2.0 x8: 10 min 0.8 sec render time.

Granted, if you render 1,000,000 frames you save about 110 hours… but if each frame takes 10 minutes, you save those ~110 hours off roughly 19 years of total render time.
:wink:
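
For anyone who wants to play with these numbers, here is the same back-of-the-envelope calculation as a sketch (all bandwidths are theoretical peaks; real transfers will be somewhat slower):

```python
# Per-frame PCIe overhead: upload a 3 GB scene, read back one full-HD frame.
PCIE2_X16 = 8000.0   # MB/s theoretical (16 lanes x 500 MB/s)
PCIE2_X8  = 4000.0   # MB/s theoretical

scene_mb = 3 * 1024                 # 3 GB scene, uploaded once per frame
frame_mb = 1920 * 1080 * 4 / 1e6    # ~8.3 MB RGBA result, read back once
render_s = 10 * 60                  # 10 minutes of actual rendering

for name, bw in (("x16", PCIE2_X16), ("x8", PCIE2_X8)):
    overhead_s = (scene_mb + frame_mb) / bw
    print("PCIe 2.0 %s: %.2f s of bus traffic on top of %d s render time"
          % (name, overhead_s, render_s))
```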

Hope this sheds some light on expensive mainboards. They're toys to play with, so to say.

Thanks a lot for that Arexma, then my theory proved right. I was toying with the idea of measuring the PCIe during rendering using CUDA tools just to be 100% sure, but your reasoning & math seems bulletproof to me, so that would then be redundant.

Oh, and that confirms that I haven't made a mistake by bidding on that used GTX 590 that I want to put in next to my two GTX 580s. It will effectively give me 4× 580 (with all the caveats of 2× GPU ≠ 2× performance), and the fact that I am then reduced to PCIe x1 on all three PCIe slots doesn't matter in the slightest, as I have a gaming PC with a GTX 680 for that :slight_smile:

It would be nice if the original poster could update the front page. I think I will also make a single page and put it on the web somewhere (with proper references and credit of course) and post the link here. It’s nice to have it here on BlenderArtist, but I think a separate nicely formatted page wouldn’t be so bad either.

I would be thankful for this.
It would also be cool if you could post photos of your hardware, because for a non-tech-freak like me it's hard to imagine the physical dimensions of the hardware you are talking about. I'm always asking myself: Will this fit into my computer? Is there enough space for all these graphics cards? And so on… :slight_smile:

Anyway, thanks for your research!

Kind regards
Alain

:smiley: That was actually my biggest concern as well, next after cooling, which is already a problem with 2x580s. The solution, for me at least, will be water cooling (ordered but not arrived yet). Power from your PSU also starts becoming a factor, so there are quite a few things to consider.

I’ll give it a shot this afternoon and see if I can’t make something helpful.

You could also get PCIe extension cables. That way you can put your cards "somewhere" and still cool them on air.
I've seen this many times already: the mainboard in a custom frame, wood or metal, the extension cables, and the cards floating in midair over the board, spread apart, with an airstream blowing along the cards.

A 20 cm cable costs ~6 USD

For 3 cards that's 20 bucks with shipping and a few hours of "home improvement".
A lot cheaper than 3 water blocks at around 100 bucks each, plus a reservoir, a pump, the liquid, the tubing, small parts, a radiator and fans.

A decent water-cooling setup for 3 cards would be around 600+ USD.

You're right - that would indeed be the sensible thing to do. Initially I thought about custom-made plexiglass. I was even considering doing the "Google solution", i.e. just putting them on a plastic surface on the desk, but then there is the wife… :wink:

Thus, I decided to be rather silly and not-so-sensible and instead splash out on a water-cooling setup that is big enough to run silent(ish) - at least silent enough for me to spend extended periods of time in the same room as it. Currently I have the two 580s' fans tweaked precisely - one runs at 76% and the other at 84% - and it sounds like I'm sitting in a heliport, so action was needed.

This setup is still in the mail though, so for now, I use big ear-covering headphones in there :slight_smile:

I couldn't get the whole SLI thing - do I need a special power supply for SLI? Will Cycles work without it? I want to run 2 GTX 650s with a 635 W power supply; will that work? Is SLI a must, or is there another option?

There's no need to post and PM me about the same stuff. I usually check the forums multiple times a day. Also, urging the matter doesn't help; I post when I've got time… :wink:

First of all: you don't need SLI.
For the umpteenth time in this thread:

MultiGPU Mode != SLI.

SLI is Scalable Link Interface, as opposed to the old Scan-Line Interleave from 3dfx (which Nvidia bought).
In SLI you have a master-slave relation between the cards, and they have to have an identical GPU.
SLI is meant for split realtime framebuffer calculation and display.
There are two render modes for the 3D pipeline, which have "nothing" [1] to do with the calculation and rendering of a 3D CG scene in this context.
One render mode is Split Frame Rendering, the other is Alternate Frame Rendering. In the former, each card renders 50% of a frame - the frame is analyzed and the workload is split, not the image. In the latter, as the name implies, one card renders even frames, the other one odd frames.
However, once a frame is finished, the slave card(s) send their framebuffer to the master card, which displays it. That's also why in SLI you have to attach the SLI bridge to your cards; this bridge is used to transfer the finished framebuffer from the slave to the master.
But SLI aside, you don't need it - it's for gaming only, for realtime display via the "3D pipeline", be it Direct3D or OpenGL.

In MultiGPU CUDA mode, all cards are equal and don't need to be identical. (Notice: you'll only have the amount of VRAM of the smallest card available.)
Once you hit render and have 2 cards in your machine, the whole scene data is sent to both cards.
Then both cards start to solve the render equation for their portion of the image.
Alternatively, you could start 2 instances of Blender, assign one card to each Blender instance, and let one render all even frames and the other all odd frames (see the example below).
Once the card(s) have finished rendering, the resulting pixel values are returned, and you've got your image.

[1] For the sake of completeness: the realtime 3D pipeline is pretty much the same as for rendering CG images; however, it takes a lot of shortcuts and uses some tricks to handle everything in realtime.
But you clearly have to differentiate the two in this context.

That said, enabling SLI will only cause trouble with Blender/OpenGL (been there); just keep it disabled, and Blender will find your 2 cards and use them.
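
For reference, this is roughly how you enable all CUDA cards for Cycles from Python in current Blender builds - the exact property names have moved around between versions, so treat it as a sketch rather than gospel:

```python
# Enable every CUDA device for Cycles and switch the scene to GPU rendering.
# Property names differ between Blender versions; this follows the newer
# preferences API and may need adjusting on older builds.
import bpy

prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "CUDA"
prefs.get_devices()                        # refresh the device list

for dev in prefs.devices:
    dev.use = (dev.type == "CUDA")         # tick every CUDA card, leave the CPU entry off
    print(dev.name, "->", "enabled" if dev.use else "disabled")

bpy.context.scene.cycles.device = "GPU"
```

And for the "one Blender instance per card" approach mentioned above, you can simply launch two background renders with different frame steps, e.g. blender -b scene.blend -s 1 -j 2 -a for the odd frames and blender -b scene.blend -s 2 -j 2 -a for the even ones (scene.blend being your file); each instance then just needs a different card ticked in its device preferences.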

As for your choice… meh, don’t do it.

The GTX 650 only has 1 GB of VRAM, and that's all you're able to use for rendering, no matter how many cards you put in your machine. If anything, you'd have to get a 2GB version.

It costs ~120/150 Euro (1GB/2GB), and the 600 series ain't that fast for Cycles.
And for the viewport it makes zero sense to use 2 cards in SLI, so I assume you want them for rendering.

A good indicator for CUDA speed is the fused multiply-add (FMA) throughput of the card, which is 812 GFLOPS for the 650.
So with 2 cards, you would reach a theoretical 1624 GFLOPS.

You'll reach a similar theoretical speed, and a higher effective speed, with a used GTX 580 with 3 GB of memory, which you can get for around 200-250 Euro with some luck.
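
For reference, that FMA number is just 2 operations (a multiply and an add) × shader cores × shader clock. A quick sketch with the commonly quoted reference-card specs (board-partner cards will differ a bit):

```python
# Theoretical FP32 throughput: 2 ops per FMA * shader cores * shader clock (GHz).
# Core counts and clocks are the usual reference specs, used here as assumptions.
cards = {
    "GTX 650": (384, 1.058),   # (CUDA cores, shader clock in GHz)
    "GTX 580": (512, 1.544),
}

for name, (cores, clock_ghz) in cards.items():
    print("%s: %.0f GFLOPS" % (name, 2 * cores * clock_ghz))

print("2x GTX 650: %.0f GFLOPS (theoretical)" % (2 * 2 * 384 * 1.058))
```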