One of the huge advantages is that the Phi is more or less a co-processor, like I explained in another thread already.
Many, maybe most, might not remember, or don’t even know, that early Intel CPUs had no floating-point unit.
The first FPUs were available as co-processors, so you had to pair an Intel 8086 with an 8087, an 80286 with an 80287, or an 80386 with an 80287 or 80387 if you wanted hardware floating-point support…
With the 80486 we finally got an integrated FPU (not in the SX model, though).
The Xeon Phi is like a co-processor to the Xeon processor family, and cannot be seen as a GPGPU replacement.
Oversimplified: you do not need to write special code for it.
You put the Phi in your system, recompile your source code with the supplied compiler, and it will make use of the Phi.
I am sure there are optimizations you can make, but they’re not mandatory.
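For the curious, that “just recompile” story rests on Intel’s offload pragmas. Here is a hedged sketch (the function name and the loop are my own invention): built with the Intel compiler, the marked block is shipped to the coprocessor; any other compiler simply ignores the unknown pragma and runs the loop on the host.

```c
#include <stddef.h>

/* Sketch of Intel's "offload" pragma model for the Xeon Phi.
   With the Intel compiler, the marked block and the listed arrays
   are transferred to the coprocessor (the "mic" target) and run
   there; other compilers ignore the pragma, so the same source
   still builds and runs everywhere. */
void scale_add(const float *a, const float *b, float *c, size_t n)
{
    #pragma offload target(mic) in(a, b : length(n)) out(c : length(n))
    for (size_t i = 0; i < n; i++)
        c[i] = 2.0f * a[i] + b[i];
}
```

On a host without a Phi (or with a non-Intel compiler) this is just an ordinary loop, which is exactly the point: one source, optionally accelerated.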
Back in the day the co-processors were a huuuuge performance boost; they were really expensive, and it took some time until they got integrated into the processor.
I guess this is development repeating itself to some extent, and before long we’ll have processors with a “Phi unit” integrated.
Some might argue that the Phi is hugely expensive, but as usual that’s firstly an opinion and secondly egocentric.
I am sure there are plenty of scientific and industrial applications where the Phi is “just the thing”, and even if not, it’s an interesting step in processor development…
Now some demystifying:
First, some generic benchmarking, with no Phi but with the professional cards:
LuxMark 2.0, OpenCL, scene “Room” (2.016 million triangles), [samples/s]:
- Tesla K20: …140
- Tesla K20 ECC: …139
- Quadro 4000: …143
- Quadro K5000: …192
- Quadro 5000: …204
- Quadro K5000 + Tesla K20: …331
- Quadro K5000 + Tesla K20 ECC: …325
- FirePro W8000: …860
- FirePro W9000: …1073
Asking Nvidia about that abysmal performance, they said their OpenCL driver is not that optimized, and pointed towards CUDA raytracers… where a comparison with AMD is impossible, lol… well.
So much for OpenCL. I guess the Phi wasn’t available at the time of that benchmark, but you get the picture.
The interesting parts are the SGEMM/DGEMM and FFT operations. CUDA is highly optimized for them, MKL (the Xeon Phi libraries) is there as well, and AMD is rather… humble.
SGEMM/DGEMM are matrix operations; FFT is the fast Fourier transform.
GEMM = general matrix multiplication; S/D = single and double precision, respectively.
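To make the term concrete, here is a naive, single-threaded C sketch of the SGEMM operation (the textbook triple loop; MKL, cuBLAS and friends implement the same thing orders of magnitude faster):

```c
/* Naive single-precision GEMM: C = alpha*A*B + beta*C, with
   row-major m x k, k x n and m x n matrices. This triple loop is
   the operation the SGEMM benchmarks measure; DGEMM is the same
   thing with doubles. */
void sgemm_naive(int m, int n, int k, float alpha,
                 const float *A, const float *B,
                 float beta, float *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```

The tuned library versions block this loop for caches and vector units, which is where the thousands-of-GFLOPS numbers below come from.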
Looking at the throughput for those operations:

Single precision (SGEMM):
- Dual Xeon E5-2690 … ~600
- FirePro W9000 … ~1700
- Xeon Phi (60-core 1.1 GHz model) … ~1700
- Radeon HD7970 … ~2300
- Tesla K20 ECC … ~2400
Double precision (DGEMM):
- Dual Xeon E5-2690 … ~300
- FirePro W9000 … ~500
- Radeon HD7970 … ~700
- Xeon Phi (60-core 1.1 GHz model) … ~850
- Tesla K20 ECC … ~1050
For those unfamiliar with numerical math and linear algebra: matrices are not only used for 3D stuff, they are also used, for instance, to solve large systems of equations.
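As a small illustration of that point, here is a bare-bones Gaussian-elimination solver for A·x = b (a naive sketch of what LAPACK-style libraries, built on GEMM-like kernels, do at scale; the function name is my own):

```c
#include <math.h>

/* Solve the n x n system A*x = b by Gaussian elimination with
   partial pivoting. A is row-major; A and b are modified in place
   and the solution ends up in x. Returns 0 on success, -1 if the
   matrix is (numerically) singular. */
int solve_linear(int n, double *A, double *b, double *x)
{
    for (int col = 0; col < n; col++) {
        /* pick the largest pivot in this column for stability */
        int piv = col;
        for (int r = col + 1; r < n; r++)
            if (fabs(A[r * n + col]) > fabs(A[piv * n + col]))
                piv = r;
        if (A[piv * n + col] == 0.0)
            return -1;
        if (piv != col) {
            for (int c = 0; c < n; c++) {
                double t = A[col * n + c];
                A[col * n + c] = A[piv * n + c];
                A[piv * n + c] = t;
            }
            double t = b[col]; b[col] = b[piv]; b[piv] = t;
        }
        /* eliminate everything below the pivot */
        for (int r = col + 1; r < n; r++) {
            double f = A[r * n + col] / A[col * n + col];
            for (int c = col; c < n; c++)
                A[r * n + c] -= f * A[col * n + c];
            b[r] -= f * b[col];
        }
    }
    /* back-substitution */
    for (int r = n - 1; r >= 0; r--) {
        double s = b[r];
        for (int c = r + 1; c < n; c++)
            s -= A[r * n + c] * x[c];
        x[r] = s / A[r * n + r];
    }
    return 0;
}
```

Real solvers work on huge systems, which is why raw GEMM throughput (and the single/double-precision split above) matters so much in practice.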
However, it’s clear that the Xeon Phi targets:
- Double precision
- Scientific/industrial applications
- Bleeding-edge systems where speed is absolutely essential and cost is irrelevant.
The conclusion: the Xeon Phi shouldn’t concern the average Blender user right now, but if one of “Intel’s Knights” is ever commanded to defend the peasants with a reasonably priced co-processor and good single-precision performance…
I’d get one, just like I got an 80386 with a co-processor back in the day, provided there’s software around that utilizes it.
My usual tech babble, but that should clear some things up.