Xeon Render farms? Why when GPU is faster?

I’ve tried researching this and I can’t seem to find a definitive answer.

I know CPU is faster at physics stuff than GPU but generally GPU is faster for most things.

Then why are a lot of render farms using large 32 thread Xeon based systems rather than large GPU based systems?
Is a 32 thread 16 core system faster than a top of the range GPU?

I have found you can buy two Xeon 2660 processors for roughly £300, which is about the cost of a mid-range GPU. I’m trying to work out which would be better.

Hi, the main advantage of CPU systems is that they use system RAM. It is easy to get 16 GB or more of RAM.
The scenes in the last Blender movie used more than 16 GB to render, which is not possible on a GPU.
GPUs are slow on volumetric materials and SSS.
The main disadvantages of Xeons are expensive ECC RAM and expensive mainboards.
If you can get a suitable board for two 2660s I would go for it.

Cheers, mib

Render farms and personal render boxes are different animals. The considerations between them are different. For starters render farms are rarely built with desktop boxes, but instead are rack-mounted systems.

It’s cheaper at the low end for CPU-based rendering; upgrading from a quad-core to a hex- or octa-core is pretty cheap compared to purchasing a multi-thousand-dollar GPU. It used to be that you would get a multi-socket desktop motherboard with only a single CPU and defer the cost of a second/third/fourth until later. These days it’s just as cost effective to start that way. After a couple of physical CPUs, or around 48 cores or so, it starts to become more cost effective to get a higher-end graphics card and switch to GPU rendering. As noted above, supplying that many cores with sufficient ECC RAM for each thread can get expensive really fast.

It’s also important to consider online render farms. You can build a cheaper system that is optimized for creating and limited test rendering, then pass off the heavy lifting to someone else. It’s not only about initial cost but also system management, updates, and eventual replacement - all of that gets offloaded to the render farm operator.

I know CPU is faster at physics stuff than GPU but generally GPU is faster for most things.

GPUs are faster at data-parallel problems, which is not most things.

Then why are a lot of render farms using large 32 thread Xeon based systems rather than large GPU based systems?

There are many reasons:

Software:
Almost all software is written to run on CPUs, only a few specialized programs are written for GPUs. GPU programmers are harder to find, all the tools are worse and therefore those already expensive programmers will be less productive. Writing software for GPUs is a pain in the ass. Writing complex software for GPUs is a nightmare. Case in point: It’s been almost ten years since GPUs became general-purpose accelerators, but not a single existing rendering engine has been ported over to the GPU with all features intact.

Reliability:
GPUs usually run on proprietary drivers, which are black boxes. The drivers by AMD especially have a poor reputation. If the drivers fail to run your code, you’re out of luck as a programmer. The tools to inspect errors are worse, so you’ll run into situations where you don’t know if the driver fucked up or if the code is wrong. GPU code is much easier to get wrong, too. You have to test your code on multiple GPUs by multiple vendors if you want to be cross-platform (in practice, often only NVIDIA GPUs are supported).

Scalability:
Dealing with memory on the GPU is much more constrained; many programs (like Cycles) will just fail outright if physical GPU memory is exceeded. The GPUs with the most RAM (24 GB as of today) are prohibitively expensive, yet fall short of what you can have for CPUs.

Is a 32 thread 16 core system faster than a top of the range GPU?

It depends. There are scenes that render much worse on GPUs than on CPUs. See the BI benchmarks for a broader comparison. Generally, scenes that are less complex (especially in terms of shaders) can have a significant speedup on the GPU. The best use case is probably tweaking materials in isolation from a bigger scene.

If you want to do serious work, go with the CPUs. They’re much less likely to fail you when it matters (assuming they’re intact). Besides budget, there’s no reason to not have a good GPU (or two) in your workstation, though.

DDR3 ECC RAM is actually cheaper than non-ECC RAM right now, but even DDR4 ECC is not that much more expensive.

Why would anybody assign memory per thread? It may make sense to assign memory per CPU due to NUMA, but I’m not aware of any renderer doing that, either. Cycles certainly doesn’t. CPUs are always more cost-effective when it comes to RAM (amount), because DDR ram is cheaper than GDDR ram.

Thanks for the great responses.

As you all mentioned RAM a lot: I know it’s easy to get a cheap workstation with 32 GB of RAM. I’m assuming Blender will be able to make full use of as much RAM as I throw at it?

My budget isn’t that big and, looking at it, if I’m correct, I could build a dual-CPU workstation with lots of RAM which would be able to keep up with a relatively good GPU at the same price?
While at the same time having a motherboard which is able to hold multiple GPUs if I were ever to expand.

I feel like a system with those CPUs and 64 GB of RAM would be an amazing system and allow for a very high level of expandability in the future… is that correct?

64 GB of RAM will allow you to render some truly massive and complex scenes (which will likely not be possible on the GPU until the technologies that allow it to draw from system memory mature).

The DDR4 systems at least can even allow you to stuff 128GB of memory into one desktop (depending on the motherboard that comes with your machine).

  1. Let’s say we have two processors with 6 cores at 2 GHz versus one processor with 12 cores at 2 GHz. The scene is rendered at the same speed, so both take the same amount of time, yes or no?

  2. Let’s say we have two processors with 6 cores at 2 GHz versus one processor with 6 cores at 4 GHz (notice the speed of this one is higher now, but it has fewer cores). The scene is rendered at different speeds, but in one case we have more cores than in the other.
    Which one is faster for rendering?

  3. Are 1. and 2. actually the same render time?

  4. If you have 2000 - 2500 € to spend for your computer which path,A or B, is better ?
    A. 2 low speed six core CPUs and nvidia 1070 GPU using cycles render with CPU
    versus
    B. one high speed six core cpu and nvidia 1080 GPU? Using cycles render with GPU

As you read my post you probably noticed that I am confused about this topic, and I am in the process of buying a new computer, so I need to know this before I choose components for my future PC. Of course I am waiting for Ryzen and Vega, but I am not sure if Vega will be a good option for Blender.

When rendering truly massive scenes there is going to be a load time as all of the objects are loaded into memory, and that time may be shorter on the system with the higher clock rate. After that you will have one tile per core being rendered, so in instance number one, where both setups have 12 cores at the same clock, the two systems should finish in roughly the same time (with some fudge time for tile size and load time to be considered).

In the second example the faster machine should finish before the slower one, but the thing to keep in mind is that rendering is a full system stress test. I have rendered test scenes on the same processor in different motherboards with different RAM and I’ve seen some not inconsiderable differences in render time, so the more you push your machine to the limit the more likely you are to notice those differences.
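
As a rough way to frame questions 1 to 3, here is a back-of-the-envelope sketch in Python. It assumes (and this is only an assumption, not something measured) that tile rendering scales perfectly with cores × clock while scene loading runs on a single core. The function name and the work figures are made up for illustration.

```python
def estimate_render_seconds(cores, ghz, load_work=60.0, tile_work=14400.0):
    """Very rough render-time model (illustrative numbers, not measurements).

    load_work: single-threaded scene-loading work, in GHz-seconds (assumed)
    tile_work: perfectly parallel tile-rendering work, in GHz-seconds (assumed)
    """
    load_time = load_work / ghz            # only one core does the loading
    tile_time = tile_work / (cores * ghz)  # all cores share the tiles
    return load_time + tile_time


configs = {
    "2 x 6 cores @ 2 GHz": (12, 2.0),
    "1 x 12 cores @ 2 GHz": (12, 2.0),
    "1 x 6 cores @ 4 GHz": (6, 4.0),
}

for name, (cores, ghz) in configs.items():
    print(f"{name}: {estimate_render_seconds(cores, ghz):.0f} s")
```

In this simplified model the two 12-core setups tie at 630 s and the 6-core 4 GHz chip comes in slightly ahead at 615 s, only because its load phase is shorter. Real machines will deviate from this because of NUMA, memory speed, turbo clocks and so on.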

Personally I’ve gone to using used server hardware (rack mount) systems for my rendering needs, and those tend to be fairly affordable if somewhat tedious to find good deals on.

Those are my thoughts too. Plus, when you have more cores attacking the render, and then the RAM, which I’m under the impression can be much more than 64 GB of DDR3 on a server-style system, you’ll render pretty fast.

The rack-mounted system is the style which I have been looking at too.

In general, there are always parts of a process that are not parallelized, and the ideal speedup is limited by this factor (Amdahl’s law). In particular, if you run Blender with a GUI, image preview updates take up quite a bit of time. Multi-processor systems also have NUMA, which may slow things down.
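
For anyone who wants to play with the numbers, Amdahl’s law is easy to put into a few lines of Python. This is just a sketch: the function name is my own, and the 95% parallel fraction is an assumed example value, not a measurement of Blender or Cycles.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    (between 0 and 1) of the single-core runtime can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# 0.95 is an assumed example value, not a measured figure.
for cores in (6, 12, 32):
    print(cores, "cores ->", round(amdahl_speedup(0.95, cores), 2), "x")
```

With 5% of the work stuck on one core, 32 cores give only about a 12.5x speedup rather than 32x, which is why the serial parts (and the clock speed they run at) still matter on a many-core box.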

  1. If you have 2000 - 2500 € to spend for your computer which path,A or B, is better ?
    A. 2 low speed six core CPUs and nvidia 1070 GPU using cycles render with CPU
    versus
    B. one high speed six core cpu and nvidia 1080 GPU? Using cycles render with GPU

On a workstation, I would go for higher clocked CPUs, as it is an upper limit for everything. For the GPUs, that’s up to your preference. Two 1070s are faster than one 1080 and have better price/performance, but also take up more space and energy.

Blender/Cycles will use as much RAM as it needs. I’m not aware of it using any spare RAM. However, your OS will still use unused RAM for disk caching.

My budget isn’t so much and looking at it, if I’m correct, I could make a dual CPU workstation with lots of ram which would be able to keep up with a relatively good and same priced GPU?

As I said, it depends. Look at the benchmark scenes I posted. If your scenes are simple, the GPU may well be faster. If they’re complex/large, the GPU may simply fail.

If this is not just a hobby for you, you should value lower risk of failure highly.

Would you be able to explain this?

I’m doing more and more with Blender and hope to do it professionally in 6 months, with potential work coming in once I learn it, so a more reliable CPU-based system may be of more benefit. I could always get a 1070 for those certain applications, but once I have the base of a dual-CPU system I can start to think about a GPU, rather than the other way around.

Explain what exactly? Amdahl’s law? Look at Wikipedia for examples.

The image update example is just one that comes to mind, because when you’re transferring data out of the GPU, you can’t simultaneously modify that data (and maintain a consistent result). The graphics driver handling all this is also a very serial program, benefiting from higher frequency but not more cores.

Another thing to keep in mind is that even when in theory you can parallelize tasks, doing so often makes things much more complex. That makes it less likely for a (part of a) program to be written to take advantage of multiple threads/cores. This is true for Blender as well - many parts are single-threaded.

I’m doing more and more with Blender and hope to do it professionally in 6 months, with potential work coming in once I learn it, so a more reliable CPU-based system may be of more benefit. I could always get a 1070 for those certain applications, but once I have the base of a dual-CPU system I can start to think about a GPU, rather than the other way around.

The new AMD Zen processors are looking to shake things up quite a bit, providing great multi-core performance at a much lower price than similar offerings by Intel. For a workstation, I believe that will be a better choice than getting a multi-CPU system with lower clocks.

As for RAM, you can just add RAM as needed. If it turns out you never need more than 16 or 32GB, that should be sufficient.

If I have a motherboard for 4 CPUs, can I put in different Xeon CPUs
or do they have to be the same clock speed and version etc… ?

I didn’t mean physically assigning memory; I meant it for estimating general system specs. If there are 24 cores, or 48 threads with hyperthreading, and only 8 GB of RAM is available on the system, it will quickly be starved. Free cores but no free memory does no good, and neither does lots of free memory with no free cores. 1 - 2 GB per thread is a good rule of thumb to follow.

Why would that be a good rule of thumb? What programs use such a large amount of RAM per thread? I can’t think of any, off the top of my head. The memory required for Cycles per thread is most likely less than a megabyte, depending mainly on tile size (on the CPU smaller is better, up to a point).