I quickly did the numbers as I am full of caffeine right now
For the non-believers: you can calculate the performance yourself with simple math, or look up the GFLOPS FMA SP figures:
First some basic conditions:
Normally, double precision performance runs at 1/2 the single precision rate.
The GeForce, however, has its SP-to-DP ratio capped at 8:1 instead of this regular 2:1.
A GTX580 has 512 “CUDA cores” running at 1544 MHz, 3 GB VRAM, and costs around 400 Euro.
A comparable Tesla C2050 has 448 “CUDA cores” running at 1150 MHz, 3 GB VRAM, and costs around 1800 Euro.
So each Streaming Multiprocessor of a GF110 contains 32 SPs (Shader Processors, commonly called Unified Shaders) and 4 SFUs (Special Function Units); for the GF114/116/118 it’s 48 SPs and 8 SFUs.
Each SP can do one single precision FMA (Fused Multiply-Add) per clock cycle, which counts as two floating-point operations; an SFU can do up to four SF operations per clock cycle.
So the theoretical guesstimate according to nvidia is: FLOPS FMA(SP) = shader frequency [GHz] × shader count × 2
Now the math:
GTX580 = 1.544 × 512 × 2 = 1581.056 GFLOPS FMA
C2050 = 1.150 × 448 × 2 = 1030.400 GFLOPS FMA
The numbers nvidia specifies for the cards are:
The GTX580 has a 1581 GFLOPS FMA single precision performance.
The C2050 has a 1030 GFLOPS FMA single precision performance.
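The formula above is simple enough to sketch in a few lines of Python; the clock speeds and shader counts are just the figures from this post:

```python
# Theoretical peak throughput: one FMA per SP per cycle counts as 2 FLOPs,
# so GFLOPS FMA(SP) = shader frequency [GHz] * shader count * 2
def sp_gflops(freq_ghz, shader_count):
    return freq_ghz * shader_count * 2

print(sp_gflops(1.544, 512))  # GTX580, ~1581 GFLOPS
print(sp_gflops(1.150, 448))  # Tesla C2050, ~1030 GFLOPS
```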
That single precision floating-point performance is what Cycles, Octane, or LuxRender’s OpenCL uses.
Not hard to see that the Tesla only performs at 2/3 of the GeForce’s speed.
It’s different with double precision though.
As widely known and confirmed by nvidia, the GeForce’s FMA(DP) is capped at 1/8 of the FMA(SP), while Teslas get the full possible 1/2.
The C2050 has a 515 GFLOPS FMA double precision performance.
The GTX580 has a ~197.6 GFLOPS FMA double precision performance (1581/8).
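The DP numbers follow directly from the SP numbers and the cap; a quick sketch using the ratios stated above:

```python
# DP throughput = SP throughput divided by the product line's cap:
# Tesla gets the full 1/2 rate, GeForce (and Quadro) are capped at 1/8.
def dp_gflops(sp_gflops, cap_divisor):
    return sp_gflops / cap_divisor

print(dp_gflops(1030.4, 2))    # C2050: ~515 GFLOPS DP
print(dp_gflops(1581.056, 8))  # GTX580: ~198 GFLOPS DP
```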
So here the Tesla is more than twice as fast as the GeForce, and on top of that it’s built to work 24/7 in tight, hot places with ECC memory doing its thing.
That’s something no renderer needs - only medical, (astro)physical, or chemical calculations and scientific applications in general.
So recommending a Tesla for rendering means recommending a card that costs 4.5 times more than a GeForce but only delivers 2/3 of its performance.
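Put as price per GFLOPS, using the street prices and SP figures quoted above, the gap is even clearer:

```python
# Euro per SP GFLOPS: (price, SP GFLOPS) per card, prices from this post
cards = {"GTX580": (400, 1581.056), "C2050": (1800, 1030.4)}
for name, (price, gflops) in cards.items():
    # GTX580 ~0.25 Euro/GFLOPS, C2050 ~1.75 Euro/GFLOPS
    print(name, round(price / gflops, 2))
```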
Studios like ILM don’t care though; they simply buy, say, ten Tesla S2050 blades: each blade offers ~4122 GFLOPS (4 × 1030.4) and allows the usage of a total of 12 GB VRAM per blade (Tesla’s unified addressing), costing a total of ~150.000 Euro.
They’d render 24/7 with the power of roughly 26 GTX580s (while actually having 40 C2050s in them).
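The blade math, under the same per-card figures as above (an S2050 blade holds four C2050s):

```python
# Aggregate SP throughput of a hypothetical ten-blade render farm
blade = 4 * 1030.4        # one S2050 blade: four C2050s, ~4122 GFLOPS SP
farm = 10 * blade         # ten blades
print(farm / 1581.056)    # ~26 GTX580 equivalents
```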
So the most economic SP CUDA solution still is the GeForce.
And to conclude: a Quadro 5000, for instance, has the same 1/8 DP cap as the GeForce.
It has 352 “CUDA Cores” running at 1020 MHz, offers 3 GB VRAM as well, and costs 1700 Euro.
That makes its SP performance ~718 GFLOPS FMA.
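Running the Quadro 5000 through the same formula, with its 1/8 DP cap:

```python
# Quadro 5000: 352 cores at 1.020 GHz, 2 FLOPs per FMA per cycle
quadro_sp = 1.020 * 352 * 2   # ~718 GFLOPS SP
quadro_dp = quadro_sp / 8     # ~90 GFLOPS DP, same 1/8 cap as the GeForce
print(quadro_sp, quadro_dp)
```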
So while it’s the slowest of these cards for CUDA, it has the accelerated OpenGL hardware operations a Tesla doesn’t have.
Tesla: slow SP, fastest DP, no special OpenGL
Quadro: slowest SP, slowest DP, special OpenGL
GeForce: fastest SP, slow DP, no special OpenGL
Exactly the card that’s decently good in all disciplines for end users who can’t shell out money in the five digits.