GPU rendering for Durian. Renderfarm in a box?

Anyone remember this article:

I was thinking a lot about how Durian is going to be using its own render farm this time around, and I was wondering what that render farm is going to look like. Is it going to be a collection of Sun machines sitting in an air-conditioned room, rendering out frames on massive clusters? Or is the team looking at a more economical solution, like linking 4 high-end graphics cards together and rendering the movie on the GPU?

This is on my mind because if Blender could render on the GPU, most likely only on Nvidia cards, it could offer a very cheap solution for studios and hobbyists to build their own renderfarm. Essentially killing two birds with one stone.

It is theoretically possible, seeing as Nvidia already offers CUDA, an environment for writing programs that run on their GPUs. However, it could also be very risky, seeing as the team would need to make Blender’s render engine run on the GPUs of graphics cards.

Where is everyone on this issue?

If they were doing something like that, the best option would be a Tesla unit…

It’s not even remotely feasible to port the render engine to the GPU for Durian and still achieve decent quality, we’d need an order of magnitude more time/developers.

Nvidia has CUDA, ATI has Stream, their version of CUDA, but… why program for something which is gonna be obsolete soon…
OpenCL is a way better method of coding for GPUs, as it also covers CPUs at the same time. I have read somewhere that OpenCL will be a part of Blender, but whether it’s gonna be in the renderer we’ll have to see.
IMO coding for CUDA atm is not a smart option.

For applications to take advantage of GPUs they have to be structured (data and algorithms) to take advantage of the unique pros and cons. One of the biggest challenges for current GPU apps is getting data to and from the card. Getting data from the card is really slow because that isn’t a priority on a device meant to send all its data to the monitor.

If you could store your entire scene data (including textures and all temporary data structures and frame buffers) on the card’s memory so that you didn’t need to send or retrieve data from the card until the final image was complete, you could probably make a really fast renderer. But that would be a severe limitation on scene size.

If you did a direct port of the Blender internal renderer to CUDA or OpenCL it would probably perform worse than on a CPU. It would take a whole new approach to take advantage of GPUs.

It seems like there could be potential to speed up isolated parts of the render pipeline (like if you need to cast a bunch of rays at a small collection of polygons), but you would want to wait for something like OpenCL to become very stable.

I don’t mean for this to sound negative… I think it would be very productive to rethink the data structures in Blender to take advantage of GPU rendering. Similar solutions are going to be necessary to take advantage of the many-core CPUs that will be coming soon. I just don’t think this will be an easy or quick thing to do.

From what I’ve been reading in this forum, the devs are waiting for ATI/NVIDIA to release final (read “not beta”) OpenCL drivers.

CUDA and Stream are going to be made obsolete soon by OpenCL, so it is safe to believe that the efforts are going to focus on OpenCL when the final specs and support are ready… probably this will be post-Durian.



The renderfarm will probably be 10-20 quad-core PCs running some Linux distro, Debian or Ubuntu Server (64-bit) I expect.
No screens, keyboards/mice or hard disks. All connected to 1 RAID array.

Still need to figure out what software to run; last I tried DrQueue, the management app kept crashing for both stable and dev builds. So I might try a general solution or just run the jobs over ssh and manage them with a py script.
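For what it's worth, a rough sketch of what such a py script could look like: split the frame range evenly across the nodes and start one background Blender job per node over ssh. The host names and the .blend path are made up for illustration, and it assumes passwordless ssh plus a `blender` binary on each node's PATH; `-b`, `-s`, `-e` and `-a` are Blender's usual background/start/end/animation flags. No retries or load balancing, so treat it as a starting point only.

```python
import shlex
import subprocess


def split_frames(start, end, parts):
    """Split the inclusive frame range [start, end] into contiguous chunks."""
    total = end - start + 1
    parts = max(1, min(parts, total))  # never more chunks than frames
    base, extra = divmod(total, parts)
    chunks = []
    first = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        chunks.append((first, first + size - 1))
        first += size
    return chunks


def build_command(host, blend_file, first, last):
    """Build the ssh command that renders frames first..last on one node."""
    remote = f"blender -b {shlex.quote(blend_file)} -s {first} -e {last} -a"
    return ["ssh", host, remote]


def render_on_farm(hosts, blend_file, start, end, dry_run=True):
    """Issue one job per host; with dry_run=True only return the commands."""
    chunks = split_frames(start, end, len(hosts))
    commands = [build_command(h, blend_file, a, b)
                for h, (a, b) in zip(hosts, chunks)]
    if dry_run:
        return commands
    procs = [subprocess.Popen(cmd) for cmd in commands]
    failed = [cmd for cmd, proc in zip(commands, procs) if proc.wait() != 0]
    if failed:
        raise RuntimeError(f"render jobs failed: {failed}")
    return commands


if __name__ == "__main__":
    # Hypothetical node names and shared path on the RAID array.
    for cmd in render_on_farm(["node01", "node02", "node03"],
                              "/mnt/raid/shot.blend", 1, 250):
        print(" ".join(cmd))
```

Running it as-is only prints the commands; pass `dry_run=False` to actually launch them.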

Well, OpenCL or CUDA, it’s still a cool idea to have a fast render farm in a full tower case.

I don’t think porting would be needed; you’d just accelerate very small parts, like parts of the shader calculations, although some parts would make calls to much larger internal functions. Even just the most intensive parts of the code, like what would be done with assembler or SSE optimization.

One thing that would be quite nice is to see some background calculation. Artists work on scenes for hours on animation etc and the CPU usage is very low. Surely some of that CPU could be used to precompute some lighting and then just cache it so that test renders are much faster and also remember data after a render like how Modo does its interactive GI.

It just seems like such a waste to use maybe 10-20% CPU when working on a scene for 10 hours and then blast the CPU at 100% for 30 minutes. Assuming no changes, the data could have been precomputed over 10 hours at 5-10% CPU usage in the background. Obviously the scene does change during that time but there could be per-object AO lighting caches updated as you work and change objects.
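To make the per-object cache idea a bit more concrete, here is a minimal sketch: each object's baked result is stored next to a hash of the object's data, and it is only re-baked when that hash changes. Everything here (`ObjectLightCache`, the `bake` callback, using `repr` as the fingerprint source) is invented for illustration and is not how Blender stores anything. It also ignores the fact that real AO depends on neighbouring objects, so a real cache would need to invalidate more than one entry.

```python
import hashlib


class ObjectLightCache:
    """Keeps one baked result per object, re-baking only when the data changes."""

    def __init__(self, bake):
        self._bake = bake      # callable(name, data) -> result; the expensive step
        self._entries = {}     # name -> (fingerprint, result)
        self.bake_count = 0    # how many times the expensive step actually ran

    @staticmethod
    def _fingerprint(data):
        # Assumption: repr(data) changes whenever the object changes.
        return hashlib.sha1(repr(data).encode("utf-8")).hexdigest()

    def get(self, name, data):
        fingerprint = self._fingerprint(data)
        entry = self._entries.get(name)
        if entry is not None and entry[0] == fingerprint:
            return entry[1]    # unchanged since the last bake: reuse it
        result = self._bake(name, data)
        self.bake_count += 1
        self._entries[name] = (fingerprint, result)
        return result


if __name__ == "__main__":
    # Stand-in for a real AO bake: just sums the "vertex data".
    cache = ObjectLightCache(lambda name, data: sum(data))
    cache.get("cube", (1, 2, 3))   # baked
    cache.get("cube", (1, 2, 3))   # served from the cache
    cache.get("cube", (1, 2, 4))   # object edited, baked again
    print(cache.bake_count)
```

A background thread could call `get` for each object while the artist works, so the test render only pays for whatever changed last.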

On the subject of CUDA vs OpenCL, although CUDA is supposed to be not too difficult to port to OpenCL, it is probably best to leave it until a cross-platform standard is in place. Few people will have compatible GPUs anyway so CPU and workflow optimization would be more useful.

I know you got burned with Sun’s remote grid last time, but have you looked at Rising Sun’s mods to Grid? Another studio (Houdini-based) is also using Grid for their farm, but they haven’t released their mods publicly as far as I know. Anyhoo: (down for me atm) (binaries of their build from a few years ago)

No, that does not work in practice at all. You’ll have to come up with a better argument than “only the compute-intensive parts need to be done on the GPU, and then you get a big speedup”. For render engines that part is nearly all of the code.

Further CUDA was released 2.5 years ago, and no commercial render engine has since started using it, even for small parts. If it is possible to significantly accelerate code just like that, why has no one done it yet? In Durian, we have maybe 3 months before a working render engine is needed.

4 high-end graphics cards couldn’t compete with 10-20 quad-core machines. The manufacturers may throw numbers around like 215 GFLOPS performance and twice the power of a quad-core CPU, but in reality they are far from the powerhouses that they are marketed as. Give a 1GB GPU a complex 4/5/6GB+ scene file and see how quickly it chokes on the data.

Not every part of the rendering process is equally intensive though, so what I mean is you could focus optimizations on a small subset of code vs doing a full port. Not everything will be faster on the GPU anyway, as it’s not task parallel (yet). The Open Shading Language link posted in another thread:

“an interpreter that executes OSL shaders on collections of points in a SIMD manner… the source code can be easily customized to allow for renderer-specific extensions or alterations, or custom back-ends to translate to GPUs”

That’s a GPU rendering engine and not a CUDA accelerated CPU engine - it uses ATI’s equivalent - but you can see the quality it’s capable of in real-time. Consider how many CPUs it would take to do that. Even the bundled CUDA demos like the volumetric smoke or the GLSL demos in Blender run much faster than on a CPU equivalent.

As for no commercial rendering engine using GPU-accelerated code, Nvidia only bought Mental Ray a year and a half ago. That’s not long to fully rework a commercial production-quality engine. As you say, 3 months is nowhere near enough time for Blender, which is why I don’t think GPU acceleration is needed right now. No matter how little code you can get away with optimizing, it’s still difficult code to develop. It’s good to keep it in mind though. With Intel going the x86 GPU route, it might not even be necessary.

Background render caching though. The 2.5 render engine can already do background rendering, doing parts of it while working would surely help the workflow. If the artists are using quad core machines, even 2 cores can probably run at maximum as they work away.

You mean 4 high end GPUs with 256 processors and 1GB VRam each? 80 CPU cores is fast but I think 1000 Stream processors would be faster when it comes to specific code. People often get into the mindset of CPU vs GPU when it’s really CPU vs CPU + GPU. The latter has to be faster when it’s done right.

It depends on the algorithms you use. As demonstrated by the Transformers trailers rendered on the GPU and LivePlace, photorealism is possible on a single GPU with the right algorithms. Major studios didn’t have renderfarms with 8GB Ram per node years ago but they still pushed out photoreal scenes because of how they built and optimized them.

Like I say though, it’s not CPU vs GPU so a GPU doesn’t have to process the whole scene. In Maya for example, some particle effects are rendered on the GPU and composited later.

EDIT: from the Mental Ray spec docs:

"Hardware Rendering
Now that modern GPUs can execute complex programs to compute the color of each shaded pixel, mental
ray can utilize this power for accelerating its high quality rendering. Many elements can be rendered
entirely in hardware, and are automatically combined with elements which still need to be rendered in
software. mental ray currently exploits the advanced Cg shading language by NVIDIA [Cg] and uses OpenGL to
exchange data with the graphics card. Cg versions of a large number of shaders including OEM shader
libraries are available. mental mill™ [mental mill] supports the mapping of MetaSL™ shaders [MetaSL] to
both C++ and Cg, which removes the need to provide and maintain separate hardware versions of shaders.

The unique implementation of hardware rendering in mental ray is the only one to provide full support
for all of the following state of the art features:
• Phenomena™
• high quality anti-aliasing
• order independent transparency
• motion blur
• soft shadows
• tiling for large image resolutions and high quality anti-aliasing"

That’s trivial compared to rewriting a renderer to use the GPU; I could do that with the flex/bison dynamic duo and I’m not really that good at coding.

They didn’t render out on the GPU though. They used the CPU machines for rendering and the GPU for updating the lighting in real time, so they passed a small amount of specific code to the GPU to update. I can’t remember exactly what setup they had, but I know the workflow had static and dynamic parts; the dynamic parts were small adjustments made to the shader, which were run on the GPU.

I’m not saying GPUs aren’t powerful. Of course a GPU + a CPU will be faster than just a CPU, but the problem is getting the code optimized to make use of that, and then there are the scene files. If you have a large scene, a lot of data may need to be passed back and forth, and you’ll quickly swamp the PCIe bus going to the graphics card, as you’ll need to swap data between the memory of the GPU and the main RAM in order not to crash the software by running out of GPU RAM.
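A back-of-the-envelope way to put numbers on that, as a sketch only: count how many card-sized chunks a scene has to be cut into and how long it takes just to push the data across the bus. The 6GB scene and 1GB card are the figures from earlier in the thread; the 4 GB/s bus throughput and the number of passes are placeholder assumptions, not measurements of any real card.

```python
import math

GB = 1024 ** 3


def swap_estimate(scene_bytes, gpu_bytes, bus_bytes_per_sec, passes=1):
    """Return (chunks, seconds): chunks needed and pure transfer time.

    `passes` is how many times the whole scene has to cross the bus;
    incoherent access patterns could push it well above 1.
    """
    chunks = math.ceil(scene_bytes / gpu_bytes)
    seconds = passes * scene_bytes / bus_bytes_per_sec
    return chunks, seconds


if __name__ == "__main__":
    # Assumed bus throughput of 4 GB/s; substitute a measured figure.
    chunks, seconds = swap_estimate(6 * GB, 1 * GB, 4 * GB)
    print(chunks, seconds)  # 6 chunks, 1.5 s of transfer for a single pass
```

A second and a half per pass sounds harmless, but it is time in which the GPU computes nothing, and it scales linearly with the number of passes and frames.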

I’ll be honest, I’m no expert on this, but I think for final, full-quality renders (at the moment) you are better off with a program that is optimized for a CPU. In the pipeline, though, a mixture of CPU/GPU calculations could work wonders; the lighting or the physics calculations for baking (if not too memory intensive), for example, would be a nice use of both. But I don’t think that will happen in Blender until they either get OpenCL or decide that they want to fork off into two branches, one for ATI and one for Nvidia (and then you have the whole driver compatibility and OS differences issue).

I have been imagining the day when we can use the GPU to render 3D artwork all along.

Such a shame that so much of that awesome computing power can only be used in games and PS (in my case). I know this will eventually become reality one day, but I just hope that day can come a little earlier.

Some points to consider…

“Optimize the bottlenecks” - brecht and I both profiled rendering speed for Big Buck Bunny and there were no clear bottlenecks or obvious areas where a speedup would greatly improve overall render time.
The time was fairly evenly distributed over many areas of the code.

  • Obviously this depends on the test scenes
  • Yes, rewriting parts of the render code can improve performance and take better advantage of modern HW but this is also a big task.

Out of interest I looked at the mental ray renderer to see if they had GPU acceleration. They do, but if you read the details it seems only a few areas can be GPU accelerated (I guess they picked the ones that made the most sense). From V-Ray’s feature page, they have no GPU acceleration.

My point is, even if you get a commercial renderer that has some GPU acceleration, it’s not like you instantly get the full advantage of the GPUs in a box if only a few operations run on the GPU.
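This is basically Amdahl's law. A small sketch to show the effect; the 20% share and the 10x GPU speedup below are illustrative numbers only, not profiling results from Big Buck Bunny or any renderer.

```python
def overall_speedup(accelerated_fraction, gpu_speedup):
    """Amdahl's law: whole-job speedup when only part of the job gets faster.

    accelerated_fraction: share of total render time that moves to the GPU (0..1)
    gpu_speedup: how many times faster that share runs on the GPU
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / gpu_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Illustrative: 20% of render time accelerated 10x on the GPU.
    print(round(overall_speedup(0.2, 10.0), 2))    # about 1.22x overall
    # Even an absurdly fast GPU cannot beat 1 / (1 - 0.2) = 1.25x here.
    print(round(overall_speedup(0.2, 1e9), 2))
```

With render time spread evenly over many areas of the code, each individual area is a small fraction, so accelerating any one of them buys very little.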

“But wouldn’t it be cool” - err, sure, just like having 10 people work full time for a year to improve rendering would be cool. I just don’t think it’s realistic atm.