Nvidia GeForce GTX680 released...

No no, I was only disappointed because Nvidia intentionally caps performance to get you to buy a Quadro instead of a GeForce. Outside of gaming, I can’t be happy about that.

Anyway, as I mentioned, I’m not a tech geek and I was looking for clarification about the new cards and GPU rendering.

Cycles will always run on the CPU. :wink: So no, it won’t become useless.

True, if Cycles can get some of the speed I’ve seen in Arnold, people may just forget about GPU rendering haha

+1…but for now CUDA is a big help :slight_smile:

Hehe, that’s true, not useless. But the work put into CUDA/OpenCL acceleration will be in vain if VRAM becomes the final bottleneck - which I for one don’t think it will. CUDA allows for using machine RAM, so let’s just assume they’ll solve it. Eventually. 2016 at the latest. ;D

I’m trying to be positive here folks… :smiley:

(on the other hand, you’ll get a lot of CPU-hours on Amazon EC2 for $500… :D)

Wasn’t Kepler supposed to have virtual memory support? As I understood it, it would then be possible to have the GPU and CPU share all available RAM (incl. system memory). Or have I completely missed the point?

Edit: Ok, looks like Farmfield answered it. So CUDA allows it, but Cycles isn’t set up to use it? Because it needs to stay compatible with OpenCL?

It’s possible, but it’s painfully slow to swap memory between system RAM and VRAM. The bandwidth simply isn’t there. In most cases it’s actually slower to swap than it is to simply use the CPU. Trying to use normal system memory compared to GDDR5 is like trying to make a mule keep pace with a Bugatti. Yeah, you can load a lot more onto the mule, but to swap things between them you’d have to slow the Bugatti down to the point where it wouldn’t even be useful to have it around anymore.
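If anyone wants to see the gap for themselves, here’s a rough CUDA sketch (my own toy code, nothing from Cycles) that just times one big host-to-device copy over PCIe; compare the number it prints against the ~190 GB/s the GTX 680’s GDDR5 manages on-card:

```
// bandwidth_test.cu -- toy sketch, not Cycles code: times one big copy from
// system RAM to VRAM over PCIe, i.e. the "swap" cost discussed above.
// Build with: nvcc bandwidth_test.cu -o bandwidth_test
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 512ULL * 1024 * 1024;        // 512 MB test buffer

    // Pinned (page-locked) host memory gives the best-case PCIe transfer rate.
    float *h_buf = NULL, *d_buf = NULL;
    cudaHostAlloc((void **)&h_buf, bytes, cudaHostAllocDefault);
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a single host -> device copy.
    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("host -> device: %.2f GB/s over PCIe\n", (bytes / 1e9) / (ms / 1000.0));
    // Typically a handful of GB/s -- an order of magnitude below what the
    // card's own GDDR5 delivers, which is why constant swapping hurts so much.

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```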

The page says:

The more complete story is that it doesn’t want to go there… yet. Sandra 2012 just showed us that the GeForce GTX 680 trails AMD’s Radeon HD 7900 cards in 32-bit math. And it gets absolutely decimated in 64-bit floating-point operations, as Nvidia purposely protects its profitable professional graphics business by artificially capping performance.

Clearly they are referring to the SiSoft Sandra double-precision benchmark on the previous page. I’m not sure if they are under the impression that LuxMark uses DP because of its bad results. In any case, LuxMark does not use DP, and neither does Cycles or any other GPU renderer, so the capping of DP performance on GeForce cards need not be a concern for people interested in GPU rendering.

Ok well, sounds better now, thanks

I don’t have a clue why it’s not implemented - possibly due to the issues m9105826 clarified - but I read about this in regard to CUDA. I don’t know how OpenCL handles VRAM/RAM, or if it’s the same deal there…

Ouch, so what you’re saying is that it’s possible but not practical. There goes my dream scenario where Cycles can use CPU+GPU and VRAM+RAM… Well, that’s a kick in the b*lls… :stuck_out_tongue:

Thanks for the explanation m9105826. I thought it meant that the GPU could address the system memory and use that directly, but if it has to be swapped to VRAM before use through the PCIe bus (even 3.0), I guess that is not the best…

Yeah, it’s possible. There are hybrid renderers out there, but they barely leverage any of the horsepower of the GPU. In a raytracer, a large part of the calculation time goes into finding the first intersection, i.e. the spot where a given ray first hits the geometry in a kd-tree. Most (actually, all that I know of) hybrid renderers use the GPU just to calculate ray intersections and then pass that information on to the CPU for the actual shading. This gives a slight speed boost, but it’s all very experimental right now. I’ve seen a couple of renderers try to dynamically load and unload geometry and texture data in thesis papers, including one by a close friend, and even the best of what I’ve seen has been severely throttled by bus speed and data-caching limitations.
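To make that split concrete, here’s a purely illustrative sketch (not code from any of the renderers or papers mentioned - it brute-forces a sphere list instead of traversing a kd-tree): the GPU kernel only reports where each ray first hits, and the CPU picks up from there for shading:

```
// hybrid_sketch.cu -- illustrative only, not from any real hybrid renderer:
// the GPU finds the first hit for every ray, the CPU does the shading.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Hit { int prim; float t; };   // primitive index and hit distance (-1 = miss)

// Brute-force first-intersection kernel against a list of spheres (x,y,z,radius).
// A real renderer would traverse a kd-tree/BVH here instead.
__global__ void first_hit(const float4 *spheres, int nspheres,
                          const float3 *ray_o, const float3 *ray_d,
                          Hit *hits, int nrays)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nrays) return;

    Hit best;
    best.prim = -1;
    best.t = 1e30f;
    for (int s = 0; s < nspheres; ++s) {
        float3 c = make_float3(spheres[s].x, spheres[s].y, spheres[s].z);
        float r = spheres[s].w;
        float3 oc = make_float3(ray_o[i].x - c.x, ray_o[i].y - c.y, ray_o[i].z - c.z);
        float b = oc.x * ray_d[i].x + oc.y * ray_d[i].y + oc.z * ray_d[i].z;
        float cc = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - r * r;
        float disc = b * b - cc;
        if (disc < 0.0f) continue;
        float t = -b - sqrtf(disc);   // nearest hit along the (normalised) ray
        if (t > 0.0f && t < best.t) { best.prim = s; best.t = t; }
    }
    hits[i] = best;
}

int main()
{
    // One sphere, one ray -- just enough to show the GPU -> CPU handoff.
    std::vector<float4> spheres(1, make_float4(0.0f, 0.0f, -5.0f, 1.0f));
    std::vector<float3> ro(1, make_float3(0.0f, 0.0f, 0.0f));
    std::vector<float3> rd(1, make_float3(0.0f, 0.0f, -1.0f));
    int nrays = (int)ro.size();

    float4 *d_s; float3 *d_o, *d_d; Hit *d_h;
    cudaMalloc((void **)&d_s, spheres.size() * sizeof(float4));
    cudaMalloc((void **)&d_o, nrays * sizeof(float3));
    cudaMalloc((void **)&d_d, nrays * sizeof(float3));
    cudaMalloc((void **)&d_h, nrays * sizeof(Hit));
    cudaMemcpy(d_s, &spheres[0], spheres.size() * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(d_o, &ro[0], nrays * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_d, &rd[0], nrays * sizeof(float3), cudaMemcpyHostToDevice);

    first_hit<<<(nrays + 63) / 64, 64>>>(d_s, (int)spheres.size(), d_o, d_d, d_h, nrays);

    std::vector<Hit> hits(nrays);
    cudaMemcpy(&hits[0], d_h, nrays * sizeof(Hit), cudaMemcpyDeviceToHost);

    // "Shading" happens on the CPU, where all the scene data in system RAM lives.
    for (int i = 0; i < nrays; ++i)
        printf("ray %d: prim %d at t=%.2f -> shade on CPU\n", i, hits[i].prim, hits[i].t);

    cudaFree(d_s); cudaFree(d_o); cudaFree(d_d); cudaFree(d_h);
    return 0;
}
```

Even in this toy version you can see the cost: every batch of rays and hit records has to cross the PCIe bus in both directions before the CPU can do anything with it.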

I’d love to be wrong and see GPUs be able to directly access system RAM, but even then the bottleneck would be the speed of the RAM, because a GPU would be able to chug through the information faster than it could be fed to it.
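For what it’s worth, CUDA can already let a kernel read system RAM directly through pinned “zero-copy” mapped memory. Here’s a minimal sketch (my own toy code, not anything Cycles does), and the catch is exactly the one above: every single access crawls over the PCIe bus instead of running at GDDR5 speed.

```
// zero_copy.cu -- toy sketch, not Cycles code: the kernel reads data that
// lives in pinned system RAM ("zero-copy" mapped memory) instead of VRAM.
// It works, but every read travels over PCIe, so it's far slower than GDDR5.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_host_data(const float *data, int n, float *result)
{
    // Single-threaded on purpose -- the point is where the data lives,
    // not the parallelism.
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += data[i];            // each read is fetched across the PCIe bus
    *result = acc;
}

int main()
{
    const int n = 1 << 18;

    // Allow mapping of pinned host memory into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned, mapped host allocation: visible to both CPU and GPU.
    float *h_data;
    cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Get the device-side pointer that aliases the same system RAM.
    float *d_view;
    cudaHostGetDevicePointer((void **)&d_view, h_data, 0);

    float *d_result, h_result;
    cudaMalloc((void **)&d_result, sizeof(float));

    sum_host_data<<<1, 1>>>(d_view, n, d_result);
    cudaMemcpy(&h_result, d_result, sizeof(float), cudaMemcpyDeviceToHost);

    printf("sum = %.0f (expected %d)\n", h_result, n);

    cudaFree(d_result);
    cudaFreeHost(h_data);
    return 0;
}
```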

For me the question is, how far can I get on 2 GB… To be honest I haven’t experimented much with big scenes in Blender/Cycles; I still do my day-to-day work in Maya/V-Ray, but the idea is to switch over in the fall, so I need to do at least one project in Blender/Cycles and see what I can manage on 1 GB, 2 GB and so on…

Isn’t rendering in Cycles more or less additive, i.e. the passes gradually average together to create a smooth image? (I’m not sure I understand how that works.) But wouldn’t it be possible to simply start the processes in parallel on the CPU and GPU using different seeds and then combine them in 2D? (Again, I have a feeling I’m seriously mistaken about something here.)

I don’t have a clue, actually. m9105826 seems to, though… I kinda get it as: splitting the render causes more trouble than we gain in speed - at least for the moment. It will be interesting to see if (and how) this might change during Mango. Too bad we lost a Cycles dev today, though… :stuck_out_tongue:

You certainly can. Back when I did my first test animation with Cycles, I had two instances of Blender open, one set to GPU and one to CPU. However, afterwards I saw some tests others here did showing that there were still some bottlenecks in their systems costing a bit of time with my method, so I’m not absolutely sure it’s a viable solution. As for starting with different seeds and combining in 2D, as far as I know it’s possible, but I do recall a post from brecht a while back about how the result still wouldn’t be as good as one straight render. I’m not sure exactly why; I’ll see if I can find the post. The problem I see with automating this approach is that even with different seeds, you’ll end up calculating the same pixel twice more often than not, and there is the off chance that, depending on how many passes you set each device to, you could still have unresolved pixels at the end.
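Just for the “combine in 2D” part: the merge itself is nothing more than a sample-count-weighted average of the two frame buffers. A quick sketch of that arithmetic (the flat float buffers and fixed sample counts here are made up for the demo, not how Cycles actually stores its output):

```
// combine_renders.cu -- illustrative only: merge two independently seeded
// renders of the same frame by weighting each pixel with its sample count.
// The buffer layout is hypothetical, not how Cycles stores its passes.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void combine(const float *cpu_img, int cpu_samples,
                        const float *gpu_img, int gpu_samples,
                        float *out, int npixels)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= npixels) return;

    // Weighted average: the render that accumulated more samples counts for more.
    float w_total = (float)(cpu_samples + gpu_samples);
    out[i] = (cpu_img[i] * cpu_samples + gpu_img[i] * gpu_samples) / w_total;
}

int main()
{
    const int npixels = 4;                               // tiny "image" for the demo
    float cpu_img[npixels] = {0.2f, 0.4f, 0.6f, 0.8f};   // pretend 100-sample render
    float gpu_img[npixels] = {0.3f, 0.5f, 0.7f, 0.9f};   // pretend 300-sample render

    float *d_cpu, *d_gpu, *d_out;
    cudaMalloc((void **)&d_cpu, npixels * sizeof(float));
    cudaMalloc((void **)&d_gpu, npixels * sizeof(float));
    cudaMalloc((void **)&d_out, npixels * sizeof(float));
    cudaMemcpy(d_cpu, cpu_img, npixels * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_gpu, gpu_img, npixels * sizeof(float), cudaMemcpyHostToDevice);

    combine<<<1, 64>>>(d_cpu, 100, d_gpu, 300, d_out, npixels);

    float merged[npixels];
    cudaMemcpy(merged, d_out, npixels * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < npixels; ++i)
        printf("pixel %d: %.3f\n", i, merged[i]);

    cudaFree(d_cpu); cudaFree(d_gpu); cudaFree(d_out);
    return 0;
}
```

The hard part isn’t this averaging step, it’s making sure the two runs actually contribute different samples - which is what the seed discussion below is about.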

About seeds, it may work, but I am not sure. The problem is the high-end mathematics behind that low-discrepancy random generator. It is designed for a single run only - think of it like equal-step scanning of a volume, but in more than 3 dimensions - and if you change the seed there is a risk you just repeat the same samples in a different order, making the whole idea useless. I know it’s not that directly connected, as it uses a Cranley-Patterson rotation, but it still needs checking.
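For anyone wondering what that rotation does, here’s a small host-side snippet (purely illustrative, not the actual Cycles sampler): it generates a base-2 radical-inverse low-discrepancy sequence and then shifts the whole point set by a per-run random offset modulo 1, so two runs land on different points without losing the even coverage.

```
// cp_rotation.cu -- illustrative only, not the actual Cycles sampler:
// a base-2 radical-inverse (van der Corput) low-discrepancy sequence with a
// Cranley-Patterson rotation, i.e. the whole point set is shifted by a random
// per-run offset modulo 1 instead of being re-generated with a different seed.
#include <cstdio>
#include <cmath>

// Radical inverse in base 2: reverses the bits of i around the binary point.
float radical_inverse(unsigned int i)
{
    i = (i << 16) | (i >> 16);
    i = ((i & 0x00ff00ffu) << 8) | ((i & 0xff00ff00u) >> 8);
    i = ((i & 0x0f0f0f0fu) << 4) | ((i & 0xf0f0f0f0u) >> 4);
    i = ((i & 0x33333333u) << 2) | ((i & 0xccccccccu) >> 2);
    i = ((i & 0x55555555u) << 1) | ((i & 0xaaaaaaaau) >> 1);
    return (float)i * 2.3283064365386963e-10f;   // i / 2^32
}

// Cranley-Patterson rotation: add a per-run offset and wrap into [0,1).
float cp_rotate(float sample, float offset)
{
    float s = sample + offset;
    return s - floorf(s);
}

int main()
{
    const float offset_run_a = 0.0f;      // "seed" of run A
    const float offset_run_b = 0.618f;    // "seed" of run B (any random offset)

    for (unsigned int i = 0; i < 8; ++i) {
        float base = radical_inverse(i);
        printf("sample %u: run A = %.4f   run B = %.4f\n",
               i, cp_rotate(base, offset_run_a), cp_rotate(base, offset_run_b));
    }
    // Both runs keep the same even spacing, but run B's points land in
    // different spots, so averaging the two runs doesn't just repeat samples.
    return 0;
}
```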

It sounds like it might be more trouble than it’s worth, but if you can avoid enough sample collisions, someone with a good CPU, lots of RAM, good mobo, and good GPU might save some time. Dunno, I really understand very little of the “under the hood” stuff.

I don’t really know the under the hood stuff either, I just want good, fast renders. :smiley: