GPGPU/CPU add-on card war

Some good and bad news about the upcoming GPGPU cards.

The Tesla K20 is available for sale at the moment, at around 3000 USD, but the bad news is that it is not as good as expected; read this to find out more.

The AMD FirePro S9000 is also available, at around 2000 USD, and it comes close to the K20 as far as specs go.

http://www.amd.com/us/products/workstation/graphics/firepro-remote-graphics/S9000/Pages/S9000.aspx

The Xeon Phi, although not a GPGPU card, serves the same purpose, and it looks like it should not be too hard to compile Cycles for it; as far as I understand, even Blender itself could be compiled for it. No price as of yet, though.
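To give an idea of what that means in practice, here is a minimal sketch (purely illustrative, not Cycles code; the loop and numbers are made up) of the kind of plain C++/OpenMP code that should, as I understand it, only need a recompile with Intel’s compiler to run natively on the Phi, with no CUDA or OpenCL rewrite:

```cpp
// Hypothetical example: a plain OpenMP loop, the kind of code the Phi is
// supposed to run after a simple recompile (no vendor-specific kernel language).
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int width = 1920, height = 1080;
    std::vector<float> luminance(static_cast<size_t>(width) * height);

    // Standard shared-memory parallelism across the coprocessor's cores.
    #pragma omp parallel for
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Stand-in for per-pixel shading work.
            luminance[static_cast<size_t>(y) * width + x] =
                std::sin(0.01f * x) * std::cos(0.01f * y);
        }
    }

    std::printf("done, sample value: %f\n", luminance[12345]);
    return 0;
}
```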

Looks like there is a war coming for the CPU/GPU add-on cards, which might be good for those of us who would like to build powerful but not excessively pricey render farms!

Gilles

Anything that can access system memory is going to win in the long run. Until GPUs either get expandable memory or ship with enough memory that a large, complicated scene can fit without crashing, no studio will take GPU rendering seriously as a replacement for traditional render farms.

The Xeon Phi looks to have the advantage of not being nearly as limited in terms of the instruction sets that can run on it. It will probably be rather expensive as well, but if it’s priced competitively and sees a large number of sales due to code flexibility, it could perhaps force both NVIDIA and AMD to rework their architectures to sharply reduce the work needed for developers to get their programs working on them.

People here like open source.

Here you go.

90 GFLOPS, say what?

I don’t think the problem lies in the amount of RAM the GPU cards can hold but in the software renderer; NVIDIA OptiX, for example, is a GPU ray tracer that can do memory paging.

Gilles

Paging from hard drive, to system memory, to GPU memory without specialized hardware ends up being slower than just rendering on the CPU, and figuring out how to sync all information across all of the necessary hardware to minimize misses and incorrect calculations is one of the hardest programming problems to solve. The benefits of GPU rendering come from the fact that there is so little latency within the GPU architecture itself; adding other components without special dedicated buses impacts that advantage in a very negative way.
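To make the latency point concrete, here is a rough sketch using the plain CUDA runtime API (the kernel, buffer sizes and chunking scheme are invented for illustration) of what paging scene data from system memory into a too-small GPU memory looks like: every chunk pays an extra PCIe round trip before the GPU can even start working on it, which is exactly the trip a CPU renderer reading straight from system RAM never makes.

```cpp
// Rough sketch of chunked host->device "paging" with the CUDA runtime API.
// The kernel and sizes are placeholders; the point is the extra PCIe copies.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

__global__ void process(float* data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // stand-in for real shading/ray work
}

int main() {
    const size_t total = 1 << 26;   // "scene data" living in system RAM
    const size_t chunk = 1 << 22;   // the slice that fits our pretend VRAM budget
    std::vector<float> host(total, 1.0f);

    float* dev = nullptr;
    cudaMalloc(&dev, chunk * sizeof(float));

    for (size_t off = 0; off < total; off += chunk) {
        size_t n = std::min(chunk, total - off);
        // Page in: system RAM -> GPU memory over PCIe (the expensive part).
        cudaMemcpy(dev, host.data() + off, n * sizeof(float), cudaMemcpyHostToDevice);
        process<<<(unsigned)((n + 255) / 256), 256>>>(dev, n);
        // Page out: bring results back so the next chunk can reuse the buffer.
        cudaMemcpy(host.data() + off, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaDeviceSynchronize();
    cudaFree(dev);
    std::printf("first value after round trip: %f\n", host[0]);
    return 0;
}
```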

Unified memory is one of the HSA targets for 2013, but with current DDR3 RAM, system memory access would slow down rendering, I guess. DDR4 in 2014 could help with this problem.
HSA could really bring some major changes in computing. Let’s just hope all those cool presentations lead to results.

I think games do memory paging if the VRAM is full; the result is heavy frame-rate drops. There is a reason why VRAM is high-clocked memory on a wide bus.

True, DDR3 would slow things a lot compared to GDDR5, but I was merely pointing out that there are ways to work around the GPU RAM limitation. One is map tiling, as done in 3Delight, which comes with a nifty tool that tiles image maps to lower memory usage, so only the parts that are actually rendered are loaded into memory (see the sketch below). Another is better optimization of tile rendering, so only what falls inside the tile is loaded into memory instead of the whole scene, as some renderers do.
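To illustrate the map-tiling idea (this is not how 3Delight or Cycles actually implement it; the names, the tile size and the disk-read stub are invented): the renderer keeps a small cache keyed by texture and tile, and a tile is only read from disk the first time a shading point actually touches it, so memory use grows with the tiles used rather than with the full resolution of every map.

```cpp
// Illustrative demand-loaded tile cache; not taken from any real renderer.
#include <map>
#include <tuple>
#include <vector>

constexpr int TILE = 64;   // invented tile size in pixels

struct TileKey {
    int texture_id, tile_x, tile_y;
    bool operator<(const TileKey& o) const {
        return std::tie(texture_id, tile_x, tile_y) <
               std::tie(o.texture_id, o.tile_x, o.tile_y);
    }
};

// Stub standing in for a real tiled-file read (e.g. a tiled EXR or TIFF).
std::vector<float> load_tile_from_disk(const TileKey&) {
    return std::vector<float>(TILE * TILE, 0.5f);
}

class TileCache {
public:
    // Look up one texel; only the tile containing it is ever loaded.
    float texel(int texture_id, int px, int py) {
        TileKey key{texture_id, px / TILE, py / TILE};
        auto it = cache_.find(key);
        if (it == cache_.end())   // first touch: pull the tile from disk
            it = cache_.emplace(key, load_tile_from_disk(key)).first;
        const std::vector<float>& tile = it->second;
        return tile[(py % TILE) * TILE + (px % TILE)];
    }
private:
    std::map<TileKey, std::vector<float>> cache_;   // grows only with tiles used
};
```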

As for GPU limitations, they are pushed further and further as new GPUs come out, and I would say that for character animation projects, toolkits like CUDA 4.2 do the job if you can work around some limitations.

As mentioned, the Xeon Phi, on the other hand, would probably remove these limitations!

All I am saying is that there is a war coming in the processor-board arena, and that usually means good news for consumers!

Gilles

The Xeon Phi still has a memory limitation: 8 GB is fine but not enough for everything.
The problem with tiling is that you still need the textures in memory even if they are not on the tile. With tiles you only save the memory for the rendered image (as far as I know).

Quad-channel DDR3 RAM: up to 50 GB/s (quad channel is not available on consumer boards)
AMD HD 7970 GHz Edition: 288 GB/s
GeForce GTX 580: 193 GB/s

Consumer boards will stay dual-channel with DDR4, so our normal home render machines won’t even reach those 50 GB/s.
I think memory is a huge bottleneck here. Even if software supported memory paging perfectly, the hardware is just too slow.
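As a back-of-the-envelope illustration of that gap, here is a tiny calculation using roughly the numbers above and an arbitrary, made-up working set of 4 GB of scene data per frame (time just to move the data once, ignoring latency and everything else):

```cpp
// Back-of-the-envelope transfer times for 4 GB of data (figures illustrative only).
#include <cstdio>

int main() {
    const double gigabytes = 4.0;                         // assumed working set per frame
    const double bandwidth_gbs[] = {50.0, 193.0, 288.0};  // DDR3 quad, GTX 580, HD 7970 GHz
    const char*  label[] = {"quad-channel DDR3", "GeForce GTX 580", "HD 7970 GHz Edition"};

    for (int i = 0; i < 3; ++i)
        std::printf("%-22s ~%.0f ms just to move the data once\n",
                    label[i], gigabytes / bandwidth_gbs[i] * 1000.0);
    return 0;
}
```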

Yeah, tile rendering in Blender so far is good because it only keeps part of the rendered image in GPU memory, but in most tiled renderers it is used to load only part of the scene into RAM, thus saving more RAM, though it also adds overhead to the whole rendering process. As for tiled image maps, V-Ray uses them as well, as explained in this page…

http://www.spot3d.com/vray/help/maya/150R1/tiled_exr.htm
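For the curious, reading a single tile out of such a tiled EXR with the OpenEXR C++ library looks roughly like this (untested sketch; the file name and tile indices are made up, error handling is omitted):

```cpp
// Rough sketch: read one tile from a tiled EXR via the OpenEXR RGBA interface.
#include <ImfTiledRgbaFile.h>
#include <ImfArray.h>
#include <ImathBox.h>
#include <iostream>

int main() {
    Imf::TiledRgbaInputFile in("diffuse.tiled.exr");   // made-up file name

    const int tw = in.tileXSize();
    const int th = in.tileYSize();

    // Buffer for exactly one tile; the rest of the image stays on disk.
    Imf::Array2D<Imf::Rgba> pixels(th, tw);

    // Tile (3, 5) at the highest-resolution level, chosen arbitrarily.
    const int dx = 3, dy = 5;
    Imath::Box2i range = in.dataWindowForTile(dx, dy);

    // Map the tile's data window into our tile-sized buffer.
    in.setFrameBuffer(&pixels[0][0] - range.min.y * tw - range.min.x, 1, tw);
    in.readTile(dx, dy);

    std::cout << "loaded a " << tw << "x" << th << " tile\n";
    return 0;
}
```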

Gilles

You can’t just load parts of textures or geometry in a path tracer. There would be no way to accurately calculate secondary bounces if you did: a secondary ray can be scattered toward any object in the scene, so you never know in advance which parts are safe to leave out of memory.

Makes sense!

[EDIT] Brecht answered that one in another thread!

Gilles

Anyone know when the Xeon Phi is due to be released?

Supposed to go into full production by the end of the year, according to this article!

http://www.anandtech.com/show/6017/intel-announces-xeon-phi-family-of-coprocessors-mic-goes-retail

Gilles

Sadly, 90 GFLOPS is not that much, but it is meant to be very power-efficient.

I’ve read up on it a little, and it appears each core has 512 KB of local RAM, with the 1 GB of RAM being separate across all cores.

I am a designer and take GPU rendering very, very seriously, and I would almost say that there are more people like me than people in big anti-GPU studios.

OK, all three of the above products are VERY interesting to me, and I may actually entertain the idea of buying one, but a few issues:

  1. AMD can’t produce a compiler that can handle Cycles’ code.
  2. Intel’s option looks interesting, but it would require some sort of compiling/refactoring of the code that may never happen.
  3. NVIDIA seems like a nice bet since it is the only hardware that works with Cycles besides a CPU, but the cards they gave the Tears of Steel team were slower than the GTX 5xx cards, and to top it all off, GTX 6xx boards are becoming less and less capable for Cycles purposes. That capability may shrink further with future releases, as Cycles is not NVIDIA’s target market long term.

It seems like, for the near future, until the war is over and the dust settles, money is better spent in the CPU realm, where we know the future pretty well. For 3000 USD or more I can buy a Xeon E5 with lots of threads.

You forgot the free-software Clover project, by Tom Stellard and others. It can already execute small kernels on R600 hardware. I see only one drawback: it is heavily based on LLVM, which has a very tight relationship with Apple, and Apple could use moral and other rights to close off anything related to LLVM if something bad happened. I know it will not happen in the near future, and it sounds paranoid, but remember all that rounded-corner and screen-aspect madness from Apple. GCC is more mature, supported, free, proven, and robust. If I had skills in the compiler area, I would write a GCC back-end based on the published AMD intermediate language. I hope all the loud HSA fuss will end in something similar: a working GCC back-end that can produce code for any HSA hardware.

Agree with NRK, and it’s what I’m currently considering; but since adding 12 GB of extra RAM, I’m also considering waiting until the next Intel processor release.

I don’t have much faith in NVIDIA, and even less in AMD, so I’m waiting to see what happens with the Phi.

I am also very interested in the Phi, especially since several developers have already announced compiler support for it!

Gilles