I quickly did the numbers as I am full of caffeine right now

For the non-believers: you can calculate the performance yourself with simple math, or look up the GFLOPS FMA SP performance:

First some basic conditions:

Normally the double precision performance is half the single precision performance.

The GeForce has its SP-to-DP ratio capped at 8:1 instead of the regular 2:1.

A GTX580 has 512 “CUDA cores” running at 1544 MHz, 3GB VRAM, and costs around 400 Euro.

A comparable Tesla C2050 has 448 “CUDA cores” running at 1150 MHz, 3GB VRAM, and costs around 1800 Euro.

So each Streaming Multiprocessor of a GF110 contains 32 SPs (Shader Processors, commonly called Unified Shaders) and 4 SFUs (Special Function Units); for the GF114/116/118 it’s 48 SPs and 8 SFUs.

Each SP can do one SP FMA (Fused Multiply-Add) per clock cycle, which counts as two floating-point operations; an SFU can do up to four SF operations per clock cycle.

So the theoretical guesstimate according to NVIDIA is: FLOPS FMA(SP) = shader frequency [GHz] * shader count * 2

–

Now the math:

GTX580 = 1.544*512*2 = 1581.056 GFLOPS FMA

C2050 = 1.150*448*2 = 1030.400 GFLOPS FMA
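The formula can be checked with a few lines of Python (clocks and shader counts are the numbers from the text above):

```python
def gflops_sp(shader_ghz, shader_count, flops_per_cycle=2):
    """Theoretical single-precision throughput: clock [GHz] * shaders * 2 (one FMA = 2 FLOPs)."""
    return shader_ghz * shader_count * flops_per_cycle

gtx580 = gflops_sp(1.544, 512)  # GeForce GTX 580
c2050 = gflops_sp(1.150, 448)   # Tesla C2050

print(f"GTX580: {gtx580:.3f} GFLOPS FMA SP")  # ~1581.056
print(f"C2050:  {c2050:.3f} GFLOPS FMA SP")   # ~1030.400
```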

The numbers nvidia specifies for the cards are:

The GTX580 has a 1581 GFLOPS FMA single precision performance.

The C2050 has a 1030 GFLOPS FMA single precision performance.

That single precision floating point performance is what Cycles, Octane, or LuxRender’s OpenCL uses.

Not hard to see that the Tesla only performs at about 2/3 of the GeForce’s speed.

It’s different with double precision though.

As widely known and confirmed by NVIDIA, the GeForce’s FMA(DP) is capped at 1/8th of the FMA(SP), while it’s the full 1/2 for Teslas.

The C2050 has a 515 GFLOPS FMA double precision performance.

The GTX580 has a ~197.6 GFLOPS FMA double precision performance (1581/8).
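Those DP figures are just the SP numbers divided by the respective caps, which is easy to sketch:

```python
def gflops_dp(sp_gflops, cap):
    """Double-precision throughput = SP throughput divided by the SP:DP cap."""
    return sp_gflops / cap

tesla_dp = gflops_dp(1030.4, 2)     # Tesla C2050, 2:1 cap -> ~515.2 GFLOPS DP
geforce_dp = gflops_dp(1581.056, 8)  # GTX 580, 8:1 cap -> ~197.6 GFLOPS DP

print(f"C2050 DP:  {tesla_dp:.1f} GFLOPS")
print(f"GTX580 DP: {geforce_dp:.1f} GFLOPS")
```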

So here the Tesla is more than twice as fast as the GeForce, and on top of that it’s made to work 24/7 in tight, hot places with ECC memory doing its thing.

That’s something no renderer needs - only medical, (astro)physical, or chemical calculations and scientific applications in general.

So recommending a Tesla for rendering means recommending a card that costs 4.5 times more than a GeForce but only delivers 2/3 of its performance.

Studios like ILM don’t care though; they simply buy 10 Tesla S2050 blades, for instance: each blade offers ~4122 GFLOPS and allows the usage of a total of 12GB VRAM per blade (Tesla’s unified addressing), costing a total of ~150,000 Euro.

They’d render 24/7 with the power of about 26 GTX580s (while having 40 C2050-class GPUs in them).
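The blade arithmetic works out like this (a sketch; the 4-GPUs-per-blade figure is an assumption inferred from the 12GB total at 3GB per GPU):

```python
c2050_sp = 1.150 * 448 * 2   # GFLOPS per C2050-class GPU
gpus_per_blade = 4           # assumed: 12 GB total / 3 GB per GPU
blades = 10

per_blade = c2050_sp * gpus_per_blade      # ~4122 GFLOPS per blade
total = per_blade * blades                 # whole 10-blade setup
gtx580_equiv = total / (1.544 * 512 * 2)   # expressed in GTX 580s

print(f"{per_blade:.1f} GFLOPS/blade, {total:.1f} total, ~{gtx580_equiv:.1f}x GTX580")
```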

So the most economic SP CUDA solution still is the GeForce.

And to conclude, a Quadro 5000, for instance, has the same 1/8th cap as the GeForce.

It has 352 “CUDA Cores”, runs at 1020 MHz, and offers 3GB VRAM as well, for 1700 Euro.

Thus its SP performance is ~718 GFLOPS FMA.
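Same formula again for the Quadro (the DP figure is simply the 1/8th cap applied to that SP number):

```python
# Quadro 5000: 352 shaders at 1.020 GHz, same GHz * shaders * 2 formula.
quadro5000_sp = 1.020 * 352 * 2
quadro5000_dp = quadro5000_sp / 8  # 8:1 cap, same as GeForce

print(f"Quadro 5000: ~{quadro5000_sp:.2f} GFLOPS SP, ~{quadro5000_dp:.2f} GFLOPS DP")
```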

So while it’s the slowest of the three cards for CUDA, it has the superior OpenGL hardware features a Tesla doesn’t have.

Tesla: Slow SP, Fastest DP, no special OpenGL

Quadro: Slowest SP, slowest DP, special OpenGL

GeForce: Fastest SP, slow DP, no special OpenGL

What’s missing?

Exactly: a card that’s decently good in all disciplines, for end users who can’t shell out money in the five digits.