Nvidia GeForce GTX680 released...

No no, I was only disappointed because Nvidia intentionally caps performance to get you to buy a Quadro instead of a GeForce. Outside of gaming, I can’t be happy about that.

Anyway, as I mentioned, I’m not a tech geek and I was looking for clarification about the new cards and GPU rendering.

Cycles will always run on the CPU. :wink: So no, it won’t become useless.

True, if Cycles can get some of the speed I’ve seen in Arnold, people may just forget about GPU rendering haha

+1…but for now CUDA is a big help :slight_smile:

Hehe, that’s true, not useless. But the work put into CUDA/OpenCL acceleration will be in vain if VRAM becomes the final bottleneck - which I for one don’t think it will. CUDA allows for using machine RAM, so let’s just assume they’ll solve it. Eventually. 2016 at the latest. ;D

I’m trying to be positive here folks… :smiley:

(on the other hand, you’ll get a lot of CPU-hours on Amazon EC2 for $500… :D)

Wasn’t Kepler supposed to have virtual memory support? As I understood it, it would then be possible to have the GPU and CPU share all available RAM (incl. system memory). Or have I completely missed the point?

Edit: Ok, looks like Farmfield answered it. So CUDA allows it, but Cycles isn’t set up to use it? Because it needs to stay compatible with OpenCL?

It’s possible, but it’s painfully slow to swap memory between system RAM and VRAM. The bandwidth simply isn’t there. In most cases it’s actually slower to swap than it is to simply use the CPU. Trying to use normal system memory compared to GDDR5 is like trying to make a mule keep pace with a Bugatti. Yeah, you can load a lot more onto the mule, but to swap things between them you’d have to slow the Bugatti down to the point where it wouldn’t even be useful to have it around anymore.
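If anyone wants to see the gap for themselves, here’s a rough CUDA sketch (my own toy code, nothing from Cycles) that just times one big host-to-device copy over PCIe; compare the number it prints against the ~190 GB/s the GTX 680’s GDDR5 manages on-card:

```
// bandwidth_test.cu -- toy sketch, not Cycles code: times one big copy from
// system RAM to VRAM over PCIe, i.e. the "swap" cost discussed above.
// Build with: nvcc bandwidth_test.cu -o bandwidth_test
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 512ULL * 1024 * 1024;        // 512 MB test buffer

    // Pinned (page-locked) host memory gives the best-case PCIe transfer rate.
    float *h_buf = NULL, *d_buf = NULL;
    cudaHostAlloc((void **)&h_buf, bytes, cudaHostAllocDefault);
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a single host -> device copy.
    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host -> device: %.2f GB/s over PCIe\n", (bytes / 1e9) / (ms / 1000.0));
    // Typically a handful of GB/s -- an order of magnitude below what the
    // card's own GDDR5 delivers, which is why constant swapping hurts so much.

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```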

The page says:

The more complete story is that it doesn’t want to go there… yet. Sandra 2012 just showed us that the GeForce GTX 680 trails AMD’s Radeon HD 7900 cards in 32-bit math. And it gets absolutely decimated in 64-bit floating-point operations, as Nvidia purposely protects its profitable professional graphics business by artificially capping performance.

Clearly they are referring to the SiSoft Sandra double-precision benchmark on the previous page. I’m not sure if they are under the impression that LuxMark uses DP because of its bad results. In any case, LuxMark does not use DP, and neither does Cycles or any other GPU renderer, so the capping of DP performance on GeForce cards need not be a concern for people interested in GPU rendering.

Ok well, sounds better now, thanks

I don’t have a clue why it’s not implemented - possibly due to the issues m9105826 clarified - but I read about this in regard to CUDA. I don’t know how OpenCL handles VRAM/RAM, or if it’s the same deal there…

Ouch, so what you’re saying is that it’s possible but not practical. There goes my dream scenario where Cycles can use CPU+GPU and VRAM+RAM… Well, that’s a kick in the b*lls… :stuck_out_tongue:

Thanks for the explanation m9105826. I thought it meant that the GPU could address the system memory and use that directly, but if it has to be swapped to VRAM before use through the PCIe bus (even 3.0), I guess that is not the best…

Yeah, it’s possible. There are hybrid renderers out there, but they barely leverage any of the horsepower of the GPU. In a raytracer, a large part of the calculation time goes into finding the first intersection, i.e. the spot where a given ray first hits the geometry in a kd-tree. Most (actually, all that I know of) hybrid renderers use the GPU just to calculate ray intersections and then pass that information on to the CPU for the actual shading. This gives a slight speed boost, but it’s all very experimental right now. I’ve seen a couple of renderers try to dynamically load and unload geometry and texture data in thesis papers, including one by a close friend, and even the best of what I’ve seen has been severely throttled by bus speed and data-caching limitations.
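To make that split concrete, here’s a purely illustrative sketch (not code from any of the renderers or papers mentioned - it brute-forces a sphere list instead of traversing a kd-tree): the GPU kernel only reports where each ray first hits, and the CPU picks up from there for shading:

```
// hybrid_sketch.cu -- illustrative only, not from any real hybrid renderer:
// the GPU finds the first hit for every ray, the CPU does the shading.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Hit { int prim; float t; };   // primitive index and hit distance (-1 = miss)

// Brute-force first-intersection kernel against a list of spheres (x,y,z,radius).
// A real renderer would traverse a kd-tree/BVH here instead.
__global__ void first_hit(const float4 *spheres, int nspheres,
                          const float3 *ray_o, const float3 *ray_d,
                          Hit *hits, int nrays)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nrays) return;

    Hit best;
    best.prim = -1;
    best.t = 1e30f;
    for (int s = 0; s < nspheres; ++s) {
        float3 c = make_float3(spheres[s].x, spheres[s].y, spheres[s].z);
        float r = spheres[s].w;
        float3 oc = make_float3(ray_o[i].x - c.x, ray_o[i].y - c.y, ray_o[i].z - c.z);
        float b = oc.x * ray_d[i].x + oc.y * ray_d[i].y + oc.z * ray_d[i].z;
        float cc = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - r * r;
        float disc = b * b - cc;
        if (disc < 0.0f) continue;
        float t = -b - sqrtf(disc);   // nearest hit along the (normalised) ray
        if (t > 0.0f && t < best.t) { best.prim = s; best.t = t; }
    }
    hits[i] = best;
}

int main()
{
    // One sphere, one ray -- just enough to show the GPU -> CPU handoff.
    std::vector<float4> spheres(1, make_float4(0.0f, 0.0f, -5.0f, 1.0f));
    std::vector<float3> ro(1, make_float3(0.0f, 0.0f, 0.0f));
    std::vector<float3> rd(1, make_float3(0.0f, 0.0f, -1.0f));
    int nrays = (int)ro.size();

    float4 *d_s; float3 *d_o, *d_d; Hit *d_h;
    cudaMalloc((void **)&d_s, spheres.size() * sizeof(float4));
    cudaMalloc((void **)&d_o, nrays * sizeof(float3));
    cudaMalloc((void **)&d_d, nrays * sizeof(float3));
    cudaMalloc((void **)&d_h, nrays * sizeof(Hit));
    cudaMemcpy(d_s, &spheres[0], spheres.size() * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(d_o, &ro[0], nrays * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_d, &rd[0], nrays * sizeof(float3), cudaMemcpyHostToDevice);

    first_hit<<<(nrays + 63) / 64, 64>>>(d_s, (int)spheres.size(), d_o, d_d, d_h, nrays);

    std::vector<Hit> hits(nrays);
    cudaMemcpy(&hits[0], d_h, nrays * sizeof(Hit), cudaMemcpyDeviceToHost);

    // "Shading" happens on the CPU, where all the scene data in system RAM lives.
    for (int i = 0; i < nrays; ++i)
        printf("ray %d: prim %d at t=%.2f -> shade on CPU\n", i, hits[i].prim, hits[i].t);

    cudaFree(d_s); cudaFree(d_o); cudaFree(d_d); cudaFree(d_h);
    return 0;
}
```

Even in this toy version you can see the cost: every batch of rays and hit records has to cross the PCIe bus in both directions before the CPU can do anything with it.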

I’d love to be wrong and see GPUs be able to directly access system RAM, but even then the bottleneck would be the speed of the RAM, because a GPU would be able to chug through the information faster than it could be fed to it.
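For what it’s worth, CUDA can already let a kernel read system RAM directly through pinned “zero-copy” mapped memory. Here’s a minimal sketch (my own toy code, not anything Cycles does), and the catch is exactly the one above: every single access crawls over the PCIe bus instead of running at GDDR5 speed.

```
// zero_copy.cu -- toy sketch, not Cycles code: the kernel reads data that
// lives in pinned system RAM ("zero-copy" mapped memory) instead of VRAM.
// It works, but every read travels over PCIe, so it's far slower than GDDR5.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_host_data(const float *data, int n, float *result)
{
    // Single-threaded on purpose -- the point is where the data lives,
    // not the parallelism.
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += data[i];            // each read is fetched across the PCIe bus
    *result = acc;
}

int main()
{
    const int n = 1 << 18;

    // Allow mapping of pinned host memory into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped host allocation: visible to both CPU and GPU.
    float *h_data;
    cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Get the device-side pointer that aliases the same system RAM.
    float *d_view;
    cudaHostGetDevicePointer((void **)&d_view, h_data, 0);

    float *d_result, h_result;
    cudaMalloc((void **)&d_result, sizeof(float));

    sum_host_data<<<1, 1>>>(d_view, n, d_result);
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);

    printf("sum = %.0f (expected %d)\n", h_result, n);

    cudaFree(d_result);
    cudaFreeHost(h_data);
    return 0;
}
```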

For me the question is, how far can I get on 2 GB… To be honest I haven’t experimented much with big scenes in Blender/Cycles; I still do my day-to-day work in Maya/V-Ray, but the idea is to switch over in the fall, so I need to do at least one project in Blender/Cycles and see what I can manage on 1 GB, 2 GB and so on…

Isn’t rendering in Cycles more or less additive, i.e. the passes gradually average together to create a smooth image? (I’m not sure I understand how that works.) But wouldn’t it be possible to simply start the processes in parallel on the CPU and GPU using different seeds and then combine them in 2D? (Again, I have a feeling I’m seriously mistaken about something here.)

I don’t have a clue, actually. m9105826 seems to, though… I kinda get it as: splitting the render causes more trouble than we gain in speed - at least for the moment. It will be interesting to see if (and how) this might change during Mango. Too bad we lost a Cycles dev today, though… :stuck_out_tongue:

You certainly can. Back when I did my first test animation with Cycles, I had two instances of Blender open, one set to GPU and one to CPU. However, afterwards I saw some tests others here did showing that there were still some bottlenecks in their systems costing a bit of time with my method, so I’m not absolutely sure it’s a viable solution. As for starting with different seeds and combining in 2D, as far as I know it’s possible, but I do recall a post from brecht a while back about how the result still wouldn’t be as good as one straight render. I’m not sure exactly why; I’ll see if I can find the post. The problem I see with automating this approach is that even with different seeds, you’ll end up calculating the same pixel twice more often than not, and there is the off chance that, depending on how many passes you set each device to, you could still have unresolved pixels at the end.
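Just for the “combine in 2D” part: the merge itself is nothing more than a sample-count-weighted average of the two frame buffers. A quick sketch of that arithmetic (the flat float buffers and fixed sample counts here are made up for the demo, not how Cycles actually stores its output):

```
// combine_renders.cu -- illustrative only: merge two independently seeded
// renders of the same frame by weighting each pixel with its sample count.
// The buffer layout is hypothetical, not how Cycles stores its passes.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void combine(const float *cpu_img, int cpu_samples,
                        const float *gpu_img, int gpu_samples,
                        float *out, int npixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= npixels) return;

    // Weighted average: the render that accumulated more samples counts for more.
    float w_total = (float)(cpu_samples + gpu_samples);
    out[i] = (cpu_img[i] * cpu_samples + gpu_img[i] * gpu_samples) / w_total;
}

int main()
{
    const int npixels = 4;                               // tiny "image" for the demo
    float cpu_img[npixels] = {0.2f, 0.4f, 0.6f, 0.8f};   // pretend 100-sample render
    float gpu_img[npixels] = {0.3f, 0.5f, 0.7f, 0.9f};   // pretend 300-sample render

    float *d_cpu, *d_gpu, *d_out;
    cudaMalloc((void **)&d_cpu, npixels * sizeof(float));
    cudaMalloc((void **)&d_gpu, npixels * sizeof(float));
    cudaMalloc((void **)&d_out, npixels * sizeof(float));
    cudaMemcpy(d_cpu, cpu_img, npixels * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_gpu, gpu_img, npixels * sizeof(float), cudaMemcpyHostToDevice);

    combine<<<1, 64>>>(d_cpu, 100, d_gpu, 300, d_out, npixels);

    float merged[npixels];
    cudaMemcpy(merged, d_out, npixels * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < npixels; ++i)
        printf("pixel %d: %.3f\n", i, merged[i]);

    cudaFree(d_cpu); cudaFree(d_gpu); cudaFree(d_out);
    return 0;
}
```

The hard part isn’t this averaging step, it’s making sure the two runs actually contribute different samples - which is what the seed discussion below is about.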

About seeds, it may work, but I am not sure. The problem is the high-end mathematics behind that low-discrepancy random generator. It is designed for a single run only - think of it like equal-step scanning of a volume, but in more than 3 dimensions - and if you change the seed there is a risk you just repeat the same samples in a different order, making the whole idea useless. I know it’s not that directly connected, as it uses a Cranley-Patterson rotation, but it still needs checking.
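For anyone wondering what that rotation does, here’s a small host-side snippet (purely illustrative, not the actual Cycles sampler): it generates a base-2 radical-inverse low-discrepancy sequence and then shifts the whole point set by a per-run random offset modulo 1, so two runs land on different points without losing the even coverage.

```
// cp_rotation.cu -- illustrative only, not the actual Cycles sampler:
// a base-2 radical-inverse (van der Corput) low-discrepancy sequence with a
// Cranley-Patterson rotation, i.e. the whole point set is shifted by a random
// per-run offset modulo 1 instead of being re-generated with a different seed.
#include <cstdio>
#include <cmath>

// Radical inverse in base 2: reverses the bits of i around the binary point.
float radical_inverse(unsigned int i)
{
    i = (i << 16) | (i >> 16);
    i = ((i & 0x00ff00ffu) << 8) | ((i & 0xff00ff00u) >> 8);
    i = ((i & 0x0f0f0f0fu) << 4) | ((i & 0xf0f0f0f0u) >> 4);
    i = ((i & 0x33333333u) << 2) | ((i & 0xccccccccu) >> 2);
    i = ((i & 0x55555555u) << 1) | ((i & 0xaaaaaaaau) >> 1);
    return (float)i * 2.3283064365386963e-10f;   // i / 2^32
}

// Cranley-Patterson rotation: add a per-run offset and wrap into [0,1).
float cp_rotate(float sample, float offset)
{
    float s = sample + offset;
    return s - floorf(s);
}

int main()
{
    const float offset_run_a = 0.0f;      // "seed" of run A
    const float offset_run_b = 0.618f;    // "seed" of run B (any random offset)

    for (unsigned int i = 0; i < 8; ++i) {
        float base = radical_inverse(i);
        printf("sample %u: run A = %.4f   run B = %.4f\n",
               i, cp_rotate(base, offset_run_a), cp_rotate(base, offset_run_b));
    }
    // Both runs keep the same even spacing, but run B's points land in
    // different spots, so averaging the two runs doesn't just repeat samples.
    return 0;
}
```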

It sounds like it might be more trouble than it’s worth, but if you can avoid enough sample collisions, someone with a good CPU, lots of RAM, good mobo, and good GPU might save some time. Dunno, I really understand very little of the “under the hood” stuff.

I don’t really know the under the hood stuff either, I just want good, fast renders. :smiley: