Can Cycles FPGA?

Now that FPGAs with OpenCL drivers are slowly making their way into the “with that money I could buy a car” category, I’m curious what the prognosis would be for content creation. I’ve long suspected that the GPU as we know it today isn’t really a sustainable solution; they’re extremely inefficient in terms of performance per watt, and as a result do not scale well. This is why many production renderers continue to focus instead on CPU rendering, as deploying thousands of GPUs is not very practical.

Clearly, with Intel and Microsoft investing in FPGAs, and Google in whatever technology is driving their proprietary accelerators rather than traditional GPUs, I would guess that GPGPU is not going to be a long-term many-core solution for professional use. I also just don’t see us using CUDA indefinitely; an open standard will prevail, if not OpenCL, then something else - and my prediction is that non-GPU based accelerators will eventually drive this.

So, I guess this has turned into two topics: First, what is the future of high-performance computing for content creation, and second, what is the prospect of using FPGA-based accelerators that are currently available? What is the likelihood here that they will perform as well as, or better than, existing GPU options? And finally, should I buy a used Toyota, or an Intel Deep Learning Inference Accelerator?

It is not correct to say that GPUs are inefficient in terms of performance per watt. On the contrary, they are very efficient in peak arithmetic throughput per watt if the problem is massively parallel and compute bound. This is why a lot of Top500 systems include GPUs.

But not every problem has efficient algorithms that map nicely to GPU architectures. And the more complex your algorithm gets, the more likely it is that it can’t run efficiently in parallel on SIMD/SIMT architectures.
Code paths diverge, memory access patterns get more scattered, etc.
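
To make that concrete, here is a toy CUDA kernel (purely illustrative - the names and workload are made up, nothing from Cycles or any real renderer) where one data-dependent branch and one data-dependent load already show both problems:

```
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel: one divergent branch plus one scattered read.
__global__ void shade(const int *material, const float *table, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Divergence: threads of the same warp take different branches, so the
    // hardware executes both paths one after the other.
    float v;
    if (material[i] == 0) {
        v = 1.0f;                          // cheap path
    } else {
        v = 0.0f;
        for (int k = 0; k < 64; ++k)       // expensive path stalls the whole warp
            v += 0.01f * k;
    }

    // Scattered access: the load address depends on per-thread data, so the
    // reads cannot be coalesced and memory bandwidth is wasted.
    out[i] = v * table[(material[i] * 37) % n];
}

int main()
{
    const int n = 1 << 20;
    int *material;
    float *table, *out;
    cudaMallocManaged(&material, n * sizeof(int));
    cudaMallocManaged(&table,    n * sizeof(float));
    cudaMallocManaged(&out,      n * sizeof(float));
    for (int i = 0; i < n; ++i) { material[i] = i % 7; table[i] = 1.0f; }

    shade<<<(n + 255) / 256, 256>>>(material, table, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);

    cudaFree(material); cudaFree(table); cudaFree(out);
    return 0;
}
```

On a CPU the same branch is just a possible branch misprediction; on a SIMT machine the whole warp pays for both sides.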

And about development: implementing a non-trivial renderer efficiently on a GPU is 5-10 times more complex than just writing it for the CPU.
And that is the case even if you restrict yourself to one particular GPU brand and technology (e.g. Nvidia/CUDA, the first and most mature GPGPU environment). When going to OpenCL the problems just multiply, because then you have to fight not just one compiler and architecture platform but several.

This is why a lot of GPU renderers first concentrated on CUDA and later struggled hard to get good performance and a comparable feature set with OpenCL on AMD and Intel GPUs (and even on CPUs for efficient usage of AVX etc.).

But now to the main topic: I don’t see FPGAs replacing or enhancing CPUs and GPUs for raytracing any time soon. They simply don’t have the user base that would justify the development effort. I also can’t imagine that the OpenCL support is good enough to run non-trivial code on FPGAs.

And the compile times will surely be even worse than for GPUs. It takes a long time to “compile” (mostly place and route) even simple Verilog/VHDL designs for FPGAs.
FPGAs are great at doing a lot of bit-twiddling work in parallel. But for more complex operations it is usual to use “soft-core CPUs”, which are simple in-order RISC architectures defined in an HDL.

They are far behind “real” CPUs in feature set and performance, and are mostly useful in embedded designs to drive an SoC (system on a chip) with various special function units that handle specific problems efficiently.

Some people have designed ray traversal/intersection accelerators on FPGAs. But often they could not even use floating-point math, as most FPGAs don’t provide dedicated hardware for it. They might have 18-bit integer multiplier blocks you can use for fixed-point, DSP-like work.
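
To give an idea what that means in practice, here is a rough sketch of fixed-point arithmetic in C-style code (the Q16.16 format and the numbers are just illustrative, not from any shipping FPGA design):

```
#include <cstdint>
#include <cstdio>

// Values stored as Q16.16: a 32-bit integer with 16 fractional bits.
typedef int32_t q16_16;

static inline q16_16 to_fixed(float x)  { return (q16_16)(x * 65536.0f); }
static inline float  to_float(q16_16 x) { return (float)x / 65536.0f; }

// A Q16.16 multiply needs a 32x32 -> 64-bit product and a shift.  On an FPGA
// that wide multiply is itself stitched together from several 18x18 DSP
// blocks -- and this is still the easy part compared to a floating-point unit,
// which additionally needs exponent alignment, normalization and rounding.
static inline q16_16 mul_fixed(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}

int main()
{
    q16_16 a = to_fixed(1.5f), b = to_fixed(-2.25f);
    printf("1.5 * -2.25 = %f\n", to_float(mul_fixed(a, b)));  // prints -3.375000
    return 0;
}
```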

But as soon as you need even a simple soft-core FPU, the resource usage of your soft CPU gets much higher.
And we are not even talking about the number-crunching FPUs in GPUs, which have efficient special function units for many of the higher-level math functions needed in graphics shading and HPC. We are just talking about basic operations like add/sub/mul/div.
Even a sqrt with high throughput takes a lot of logic.
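
As an illustration of why, here is the classic bit-by-bit integer square root (just a sketch, not an actual FPGA design): one loop iteration resolves one result bit, so a fully pipelined hardware version needs one stage of adders, muxes and registers per bit.

```
#include <cstdint>
#include <cstdio>

// Bit-by-bit integer square root: 16 iterations for a 32-bit input, i.e.
// 16 pipeline stages if you want one result per clock.
static uint32_t isqrt32(uint32_t x)
{
    uint32_t res = 0;
    uint32_t bit = 1u << 30;            // highest power of four <= 2^31

    while (bit > x) bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            x   -= res + bit;
            res  = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

int main()
{
    printf("isqrt(1000000) = %u\n", isqrt32(1000000u));  // prints 1000
    return 0;
}
```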

Surely the high-end FPGAs may include more floating-point hardware over time. But I still doubt that this will be easier to program than GPUs. And as I said: as long as a high-end FPGA is out of reach for the mass market, GPUs have a big advantage.

It’s the economies of scale: the high development costs of GPUs are paid by millions of gamers.
Even Intel gave up on Xeon Phi because targeting only the HPC market is not enough to pay for its development cost.

As far as general-purpose computing goes, GPUs are very power efficient, even at raytracing.

This is why many production renderers continue to focus instead on CPU rendering, as deploying thousands of GPUs is not very practical.

This has everything to do with development cost and limitations, and with the fact that porting an existing production renderer to the GPU means a rewrite from scratch. A GPU-based render farm is just as feasible as a CPU-based render farm.

Clearly, with Intel and Microsoft investing in FPGAs, and Google in whatever technology is driving their proprietary accelerators rather than traditional GPUs, I would guess that GPGPU is not going to be a long-term many-core solution for professional use. I also just don’t see us using CUDA indefinitely; an open standard will prevail, if not OpenCL, then something else - and my prediction is that non-GPU based accelerators will eventually drive this.

I doubt it; NVIDIA is having huge success right now in machine learning, with practically no competition on the horizon. All the major machine learning software runs on CUDA, but not necessarily on OpenCL. CUDA also has support for (NVIDIA-exclusive) tensor cores. Google were doing their thing first, but I wouldn’t be surprised if they gave it up now - economies of scale are in NVIDIA’s favor.

So, I guess this has turned into two topics: First, what is the future of high-performance computing for content creation, and second, what is the prospect of using FPGA-based accelerators that are currently available?

Both GPUs and high-end FPGAs are likely to be bottlenecked by memory bandwidth, so you probably won’t be getting the speedup you hope for. Plus, if you have a raytracer running on an FPGA you will probably want to turn it into an ASIC for even better results. Such a product already existed (and failed) in the market, even though it was much more power-efficient than GPUs.
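
To put that bandwidth argument in rough roofline terms (all numbers below are placeholders, not measurements of any particular board): attainable throughput is capped by whichever runs out first, peak compute or bandwidth times arithmetic intensity, and a workload that does few FLOPs per byte moved ends up bandwidth-bound no matter what the compute units are made of.

```
#include <cstdio>

// Back-of-envelope roofline check with illustrative placeholder numbers.
int main()
{
    const double peak_flops     = 10e12;   // assumed peak compute, FLOP/s
    const double peak_bandwidth = 500e9;   // assumed memory bandwidth, B/s
    const double intensity      = 2.0;     // assumed FLOPs per byte moved

    // Attainable throughput is capped by whichever resource runs out first.
    double attainable = peak_bandwidth * intensity;
    if (attainable > peak_flops) attainable = peak_flops;

    printf("attainable: %.1f GFLOP/s (%s-bound)\n",
           attainable / 1e9,
           attainable < peak_flops ? "bandwidth" : "compute");
    return 0;
}
```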

What is the likelihood here that they will perform as well as, or better than, existing GPU options? And finally, should I buy a used Toyota, or an Intel Deep Learning Inference Accelerator?

You should probably invest in a Dogecoin mining rig.

While FPGAs are versatile in their applications
(from TOF camera logic, to controlling hundreds of I/Os in a chemical plant, to filtering radio waves for radar-like applications such as anti-stealth),
an FPGA’s bottleneck might be how slow CPUs/GPUs are…

Most problematic has always been programming them; if I remember correctly, MS has plans to make them easier to code.
But that might take another 5 years or so.