Xeon Phis are getting cheaper… when can we use them with Cycles? :D

I would drop money on one right now if I knew it worked in Cycles… the 300 dollar one, that is. You think I’m made of money or something?

News for the future:

Buy one now:
http://www.amazon.com/Intel-Xeon-5120D-Hexaconta-core-Coprocessor/dp/B00FF5AOYC/ref=pd_sim_sbs_147_4?ie=UTF8&dpID=4103nB7FmSL&dpSrc=sims&preST=AC_UL160_SR160%2C160&refRID=04P6A9XC0JQ6TGDRZM2G

There’s no chance of these being usable with Blender until the developers get them in their machines (you need to write code that works with them before they can be used).

Also, these are so bleeding edge in terms of computing that I’m not sure if they’re worth supporting right now (I would wait until they become more commonplace in the area of personal computing).

That’s what people said three years ago, lol. The news is that they’re coming to workstations now, not just supercomputers.

Intel’s initial audience for these chips consists of researchers and scientists, not the ordinary CG artist.

I never said that they will not find their way into computers being used in a home or small studio environment, but it’s not going to happen overnight and will probably not be widespread until there is ample software support.

Couldn’t they be detected by OpenCL the same way their CPUs and GPUs are? They probably use the same OpenCL compilers, right?

(IDK what I’m talking about, just taking shots in the dark lol)

I can’t see why not. It’s a card with many x86 CPUs.

I wonder if a virtual machine would work with that setup. I’d pay the virtual machine tax for a hundred cores to push a Blender render on.

Well, if you use OpenCL anyway where’s the advantage of bringing in a big x86 card?

It appears that the Xeon Phi processors are quite weak compared to GPUs in various kinds of computations.
Of course the instructions are different, so there are some things which are fast on an x86 (the Phi) and require lots of cycles on a GPU.

But it looks like GPUs win for most Monte Carlo algorithms.
Cycles is a Monte Carlo algorithm, so it would likely perform better on a GPU than on the Phi.

The only advantage it surely has is that, since the Phi is an x86 core, you can use any programming language you could use on an x86 CPU, so developers who don’t like OpenCL can pick a language they are more comfortable with or that is better suited to the job. If you just use OpenCL, you lose this advantage.
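To give a rough idea of what that means in practice: the code in question would just be ordinary C++ (here with OpenMP). My assumption - untested, since I don’t own a Phi - is that you’d build it natively for the card with Intel’s compiler (icpc -mmic) and run it there like any other x86 binary:

```cpp
// Minimal sketch: ordinary C++/OpenMP code that could in principle run
// natively on a Xeon Phi. Building it with Intel's compiler and the -mmic
// flag and running the binary on the card itself is an assumption on my
// part - I haven't tried it on real hardware. The point is only that no
// GPU-style kernel language is required.
#include <cstdio>
#include <omp.h>

int main() {
    double sum = 0.0;

    // Plain OpenMP parallel loop, same source code as on any x86 host.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000000; ++i)
        sum += 1.0 / (static_cast<double>(i) * i);  // converges to pi^2 / 6

    std::printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```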

From what I’ve read, they still don’t use the specific instruction sets that newer (modern) CPUs use. They are on the order of x87. So at present they are limited, but it’s easy to see where it’s heading. Looks promising… can’t wait!

“Monte-Carlo algorithm” says nearly nothing about the code complexity. If it’s the good old “estimate the area of the circle” MC example, the code will be quite small and great for GPUs, yes.
But Cycles (still) uses the megakernel concept, which means that you have a ton of code inside one GPU kernel (I think that Cycles has one of the biggest single kernels you’ll find in any GPGPU software). This is actually the worst case for GPUs…
Another problem is code divergence - GPUs are so fast because their architecture uses larger cores that perform the same instruction on multiple datasets at the same time. That’s great for stuff like, say, image blurring, since you do exactly the same thing for every pixel. But in Cycles, different pixels will take different code paths (depending on shader etc.). Another implication is that as long as one thread is still running, all the others on that core have to idle. That’s why GPUs quickly lose their advantage in complex scenes.
Therefore these Xeon Phis might come in handy - afaics, their architecture doesn’t rely so much on instruction- and memory-level parallelism, which would be rather good for Cycles.
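For reference, the “estimate the area of the circle” example really is just a handful of lines, and every sample runs exactly the same instructions - which is why it maps so well onto a GPU, unlike a megakernel. A plain C++ toy sketch, nothing Cycles-specific:

```cpp
// Toy Monte Carlo estimator for pi ("area of the circle"): sample random
// points in the unit square and count how many land inside the quarter
// circle. Every sample executes exactly the same instructions, which is
// the ideal case for a wide-SIMD machine like a GPU.
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    const long samples = 10000000;
    long inside = 0;
    for (long i = 0; i < samples; ++i) {
        double x = dist(rng);
        double y = dist(rng);
        if (x * x + y * y <= 1.0)
            ++inside;
    }
    std::printf("pi is roughly %f\n", 4.0 * inside / samples);
    return 0;
}
```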

Are you sure about that? From https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=KNC, it actually looks like they have some serious SIMD capabilities…
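For example, here is a rough sketch of what a 16-wide add with those intrinsics would look like (the intrinsic names are taken from the KNC section of that guide; compiling it natively for the Phi with icpc -mmic is an assumption on my part, I haven’t tried it on real hardware):

```cpp
// Rough sketch of a 16-wide single-precision add with 512-bit intrinsics.
// _mm512_load_ps / _mm512_add_ps / _mm512_store_ps are listed in the KNC
// section of Intel's intrinsics guide; compiling this natively for the Phi
// (e.g. icpc -mmic) is an assumption, it is untested on the hardware.
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(64) float a[16], b[16], c[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * i;
    }

    __m512 va = _mm512_load_ps(a);      // load 16 floats at once
    __m512 vb = _mm512_load_ps(b);
    __m512 vc = _mm512_add_ps(va, vb);  // one instruction, 16 lanes
    _mm512_store_ps(c, vc);

    std::printf("c[15] = %f\n", c[15]);  // expect 45.0
    return 0;
}
```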

They are $3000 now. Seems to have been a misprint.

http://www.amazon.com/Intel-Xeon-5110P-1-05-Coprocessor/dp/B00FF5AP1O/ref=pd_sim_sbs_147_1?ie=UTF8&dpID=4103nB7FmSL&dpSrc=sims&preST=AC_UL160_SR160%2C160&refRID=1D1B8PJ2KKJZJWAV5670 on special

The problem with these cheap surplus Xeon Phis is that they’re passively cooled, designed to fit into a server rack case that has active cooling built-in. In a normal PC case, they would just overheat. I have yet to find a Xeon Phi that is actively cooled and that doesn’t also cost thousands of dollars.

I have no experience with them, but at least in theory they should be recognized and “just work” as OpenCL devices (assuming that the program doesn’t filter out CL_DEVICE_TYPE_ACCELERATOR).
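For the curious, the detection question really just comes down to whether the host program asks for accelerator devices at all. A minimal sketch with the standard OpenCL C API - untested against a real Phi, and it assumes Intel’s OpenCL runtime for the card is installed:

```cpp
// Minimal OpenCL device enumeration: list all devices that report themselves
// as CL_DEVICE_TYPE_ACCELERATOR. A Xeon Phi with Intel's OpenCL runtime
// installed should show up here - that it actually does is an assumption,
// I don't have the hardware to verify it. Uses only the OpenCL 1.x C API.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id platform : platforms) {
        cl_uint num_devices = 0;
        // Ask specifically for accelerator-class devices (not CPU, not GPU).
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 0, nullptr,
                           &num_devices) != CL_SUCCESS || num_devices == 0)
            continue;

        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, num_devices,
                       devices.data(), nullptr);

        for (cl_device_id device : devices) {
            char name[256] = {0};
            clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, nullptr);
            std::printf("Accelerator device: %s\n", name);
        }
    }
    return 0;
}
```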

As for the x86 support: From what I understand, these are simple in-order x86 cores with very wide SIMD units, quite different from any modern x86 CPU. You couldn’t just take an ordinary program and expect it to run fast on a Xeon Phi; you’d still need to write dedicated code to take advantage of it.

Comparing what I read in two posts from two posters:

  • lukasstockner97 wrote that GPUs are fast because they can execute the same instructions on multiple datasets at the same time, but they lose their advantage in complex scenes because such scenes aren’t an ideal case for such operations
  • BeerBaron wrote that Phi cores use “very wide SIMD units”, which sounds like the same thing as GPUs: same instructions on multiple datasets

Wouldn’t this mean that the Phi would suffer the same fate as GPUs regarding the non-optimal performance in complex scene situations?

I’m sorry if I misinterpreted anything, I just want to understand.

I don’t know where this is coming from… GPUs can handle complex scenes fine, in my opinion. We constantly deal with millions of unique polygons, and hundreds of millions with instancing, and the GPUs are always significantly faster than CPUs.

I believe the more accurate term to use here in terms of GPU performance taking major hits would be scenes with highly complex shading situations (such as making use of advanced shading components or making use of very large node trees).

I don’t know if the Phi would be a better option then, because there’s not yet any real benchmarks regarding consumer applications.

The divergence tradeoff for SIMD exists on the Xeon Phi too, it’s just going to be less pronounced. For illustration, a Xeon Phi could have 61 cores with two 16-wide SIMD units each (Xeon Phi 7120A), whereas the GTX 980 comes with 22 SMX units that are 4x32-wide SIMD.
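Doing the lane math with those numbers: 61 cores x 2 units x 16 lanes = 1952 SIMD lanes on the Phi, versus 22 x 4 x 32 = 2816 lanes on the GPU, so the raw widths are in the same ballpark. If I understand the architectures right, the difference is that divergence on the Phi only costs you within a 16-wide vector, while on the GPU it serializes a whole 32-wide warp.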

Yes. This seems apparent. For simple scenes, the CPU (24 cores at 2.6 GHz) is MUCH slower than the GTX 750s at school. However, with interior-lit scenes and large, complex shaders, that margin quickly diminishes. While I am sure that a more powerful card than the one I’m using could probably maintain that margin longer, it does appear that the more complex the lighting and shading scenario, the less efficient the GPU renders relative to the CPU.

Furthermore, there seems to be a new breed of CPU renderers out there like Corona and Clarisse which are nearly as fast as Cycles GPU is currently. So I think there is room for optimization.

OTOH, with NVIDIA promising consumer GPUs with 32 GB of VRAM, there might not really be much motivation, and I’ve heard rumors that both Maxwell and PRMan are going to implement rendering on the GPU - and anything to speed up Maxwell is welcome news in my book. That engine is like torture, because you KNOW what it’s capable of is amazing, if it just didn’t take so long getting there.

I don’t mean “complex” like “many polygons” - GPUs don’t care about that, because in the end, all threads are running the same intersection code.
The issue starts to appear as soon as different pixels do different stuff - imagine a fur-covered model, for example. Some rays will hit the fur, some the object itself. Now, the GPU core has to run both the fur and (for example) the SSS shader code, but it can only do one of them at a time, so half the threads are idle while shading. That could be reduced with some redesign (first trace all paths, then sort by shader, then execute the shaders, …), but code divergence will always be a problem with wide-SIMD architectures.
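To make the fur/SSS example a bit more concrete, here is a toy sketch (not actual Cycles code) of why a divergent branch hurts:

```cpp
// Toy illustration of code divergence on a wide-SIMD machine (not actual
// Cycles code). On a GPU, the threads of a warp execute in lockstep: if the
// rays in one warp hit different materials, the hardware runs BOTH branches
// with the inactive lanes masked off, so the warp pays roughly the cost of
// the fur shader plus the SSS shader combined.
#include <cstdio>

// Placeholder "shaders" standing in for real closure evaluation.
float shade_fur(float u) {      // cheap path
    return 0.2f * u;
}

float shade_sss(float u) {      // expensive path: many more instructions
    float s = u;
    for (int i = 0; i < 64; ++i)
        s = s * 0.99f + 0.01f;
    return s;
}

float shade(bool hit_fur, float u) {
    // Divergent branch: within one SIMD group, some lanes take the first
    // path and some the second, but the group as a whole executes both.
    return hit_fur ? shade_fur(u) : shade_sss(u);
}

int main() {
    // Pretend this loop is one 32-wide warp: half the "threads" hit fur,
    // half hit the subsurface-scattering object underneath it.
    float sum = 0.0f;
    for (int lane = 0; lane < 32; ++lane)
        sum += shade(lane % 2 == 0, lane / 32.0f);
    std::printf("sum = %f\n", sum);
    return 0;
}
```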

I think I will keep to used server hardware for heavy rendering projects.