Nvidia GeForce GTX680 released...

See, Farmfield… returning my 560 2GB the other day was a good move… :wink:
Now I'll wait for the 670s or 660s to come out… :slight_smile:

I was about to say the same thing, plus I think you actually only need about 1GB to max out Crysis. Rage, IIRC, is actually supposed to be more efficient with memory. Cards have been stuck on 2GB for 5 or 6 years now; as long as games remain cross-platform they will be stuck on that for a lot longer.

http://uk.geforce.com/hardware/desktop-gpus/geforce-gtx-680/specifications

… I wonder what that feels like in Cycles … :eyebrowlift:

Merged AndyPaul’s 1-post thread in Latest News with this one.

Depends on how much money you're willing to spend - $500 is 2.5x the price of the card you returned, right? ;D

…and I can't even get it that cheap here in Sweden, $700 at the cheapest… A GTX 560 Ti is $200…

My heart missed a beat when I saw the 1536 CUDA cores GTX680 appear on PC Specialist’s GFX options list - I’m just about to buy a new PC! Then I found this review: http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/17

They explain elsewhere why 1536 CUDA cores on a Kepler card doesn’t equal 3 x 512 CUDA cores on a GTX5xx card. So do I stick with a 3GB GTX580, or wait for someone to do a Cycles benchmark on a GTX680?

[NB: I don't know why nVidia makes such a fuss about the Quadro cards: I was forced to have a £700 Quadro 4000 in my new PC at work, and it's crap for Cycles compared with a £400 GTX580!]

These numbers don't give a good picture of compute performance. If you look at these numbers you can see that in some cases the 680 is twice as fast as the 580. So far I haven't seen any CUDA rendering benchmarks, but I have no doubt the 680 will significantly outperform the 580 there, too. As far as the OpenCL performance goes, that's probably just the crappy driver for now.

“I don't know why nVidia makes such a fuss about the Quadro cards: I was forced to have a £700 Quadro 4000 in my new PC at work, and it's crap for Cycles compared with a £400 GTX580!”

The equivalent (as far as raw compute power is concerned) to a GTX580 would be the Quadro 6000, which costs over 3000€. Professional users pay professional prices :wink:

Found a CUDA benchmark with SiSoft Sandra:
http://www.brightsideofnews.com/news/2012/3/22/nvidia-gtx-680-reviewed-a-new-hope.aspx?pageid=4

Single precision performance:
GTX 680: 2,702 MPix/s
dual-Fermi GTX 590: 2,610 MPix/s

When it comes to the Double Shaders test the GTX 590 beats the GTX 680 by putting up 471.35 MPix/s to the GTX 680’s 204 MPix/s.

What do we need for Cycles: single precision or double shaders?

“What do we need for Cycles: single precision or double shaders?”

Single precision. But I wouldn't infer doubled performance from this test; it seems rather simplistic.
Waiting patiently for someone to do some Cycles or Octane tests for now…

Hi, I was looking for a CUDA benchmark, but instead I found this:

“One of the cooler new features from an actual application perspective is bindless textures. Prior to Kepler, Nvidia GPUs were limited to 128 simultaneous textures; Kepler boosts that by allowing textures to be allocated as needed within the shader program, with up to 1 million simultaneous textures available. It’s doubtful whether games will use that many textures, but certain types of architectural rendering might benefit.”

As far as I've heard, right now we can manage 100 textures (95 + 5 32-bit). Does that mean we will be capable of such texture limits (taking into account the available memory, of course)?

source: http://www.maximumpc.com/article/features/kepler_unveiled_nvidias_gtx_680_benchmarked_-depth

That’s great news!

Yeah mate! Hope it will be confirmed!

I think this is a restriction of Blender’s infrastructure and has nothing to do with the GPU…

I don't think so. I posted it because I remember the GPU texture limit being set to 128: Cycles takes about 28 slots for other purposes, and the remaining 100 are for textures. Here is a quote from the “More Cycles” blog:

“An obvious answer is to only use float textures where it is needed. This approach is difficult because the texture slots for GPUs must be known beforehand, and on CUDA there is a 128 texture limit. Cycles uses the last 28 for internal purposes, so there are 100 to work with. Siphoning any portion of those off to become float textures impacts that aspect of the budget.”

source: That’s the blog of “our” MikeFarny
http://morecycles.blogspot.it/2012/01/hdr-texture-sampling-pt-1.html

What is the texture limit with OpenCL, by the way?

No, it is a GPU limitation. Cycles uses hardware textures, which allows it to use hardware texture interpolation and is (I believe) somewhat faster, but those are limited; e.g. in OpenCL you can query CL_DEVICE_MAX_READ_IMAGE_ARGS (by spec it is at least 128). The alternative would be to do the memory management and interpolation manually, like SLG does.
For a normal (OpenGL/D3D) renderer you only ever need so many textures per draw call, and you bind them to the texture units with a function call. NVIDIA has had an extension to OpenGL/D3D for quite a while to use bindless textures to get rid of that binding overhead; I'm not sure how this has changed with Kepler, or whether it is now possible to have a larger number of hardware textures in OpenCL or CUDA.
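If you want to check that limit on your own hardware, here's a minimal sketch (untested, plain OpenCL host code) that queries CL_DEVICE_MAX_READ_IMAGE_ARGS on the first GPU device it finds:

```cpp
// Minimal sketch: query how many read-only image (texture) arguments
// a kernel may take on the first OpenCL GPU device found.
#include <CL/cl.h>
#include <stdio.h>

int main() {
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_uint max_read_images = 0;
    clGetDeviceInfo(device, CL_DEVICE_MAX_READ_IMAGE_ARGS,
                    sizeof(max_read_images), &max_read_images, NULL);
    printf("CL_DEVICE_MAX_READ_IMAGE_ARGS: %u\n", max_read_images);
    return 0;
}
```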

@Zalamander Since you seem quite a bit more expert at this than me, can you explain the advantages of this hypothetical “millions of textures” implementation? I mean, when you're GPU rendering, how could the system handle the texture memory size limit? Does it refer to the GPU VRAM, or to my “standard” RAM?

I’m a bit confused about that.

You can read more about the whole bindless thing here. In classic rasterization you don't need all the data for the whole scene in GPU RAM at the same time (although in practice you want to, for performance reasons); the elements are drawn one by one, and the textures required for each draw call are bound to the texture units. This needs to be done by the CPU, and it locks your thread too, so for a complex scene it can become a bottleneck - that's the whole point of bindless rendering.
As for GPU raytracing, you always need all the data available in GPU RAM all the time, because everything could at any point interact with anything else. One could page the data in and out, but that can be really bad for performance and is also quite a lot of work to implement. CUDA (or OpenCL) doesn't really have something like that built in, but for example the OptiX framework has recently implemented something like it (at least for textures).
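To make the "bindless" part concrete: with CUDA texture objects (the bindless texture API NVIDIA exposes for Kepler-class hardware in newer CUDA releases, if I remember right), a texture becomes an ordinary 64-bit handle you pass to a kernel like any other argument, instead of occupying one of the statically declared texture reference slots. A rough, untested sketch:

```cpp
// Rough sketch (untested): a "bindless" CUDA texture object.
// The texture is just a handle passed to the kernel at launch time,
// rather than a fixed, statically declared texture reference slot.
#include <cuda_runtime.h>

__global__ void sample_kernel(cudaTextureObject_t tex, float4 *out)
{
    // Hardware-filtered fetch through the handle.
    out[threadIdx.x] = tex2D<float4>(tex, threadIdx.x / 256.0f, 0.5f);
}

cudaTextureObject_t make_texture(cudaArray_t array)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;          // image data already uploaded

    cudaTextureDesc desc = {};
    desc.filterMode = cudaFilterModeLinear;  // hardware interpolation
    desc.addressMode[0] = cudaAddressModeWrap;
    desc.normalizedCoords = 1;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &desc, NULL);
    return tex;
}
```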

@Zalamander Oh, thanks, I'm starting to understand.

For CUDA it's the RAM on the card, so 2 gigs. If you split 2 gigs (minus whatever Blender, Cycles, your OS and display use) into 1 million different pieces, you are going to be looking at kilobyte-, not even megabyte-, sized textures: extremely low-res. And why would you need 1 million textures? If you have 1 million objects in your scene, many of them are probably going to be copies of each other and can share the same textures. Other objects may have multiple textures, but 1 million textures would be overkill, big time. I don't think the human eye can even see 1 million different colors, so even if each color you can detect were its own texture, you still wouldn't need a million, let alone millions.
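Putting rough numbers on that, as a back-of-the-envelope sketch (the sizes are illustrative assumptions, not Cycles figures):

```cpp
// Back-of-the-envelope: 2 GiB of VRAM shared evenly by 1,000,000 textures.
#include <cstdio>

int main() {
    const double vram_bytes   = 2.0 * 1024 * 1024 * 1024;  // 2 GiB card
    const double num_textures = 1e6;
    const double bytes_each   = vram_bytes / num_textures; // ~2 KiB each
    // At 4 bytes per RGBA8 pixel that is only ~537 pixels,
    // i.e. roughly a 23x23 texture.
    printf("%.0f bytes per texture, ~%.0f pixels each\n",
           bytes_each, bytes_each / 4.0);
    return 0;
}
```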

Let's assume the average person lives 80 years. 80 × 365 = 29,200 days; add about 20 leap days and it's 29,220. That times 24 is 701,280 hours you live. If you were to model and texture 1 item per hour from the womb to the grave, you would still run out of time, without eating, sleeping, bathing, etc… and can you imagine putting that many objects in a single scene? I think it would crash any computer and most networks, let alone a single graphics card.

1080p, the standard for HDTV, is just a bit over 2 million pixels. Unless you are making still prints, that means with 1 million textures you would average two pixels per texture on the best TVs. So millions of textures would be one texture per pixel; that's not a texture, that's a color, and many of the pixels on the screen would be the same color.