Blender Internal Render : Slow render speed with many threads

I recently purchased a new PC with an AMD Threadripper 1950X, which has 16 cores and 32 threads.
I used to use an 8-core machine, so I expected the new PC to render much faster.
But it seems that Blender Internal cannot use many threads efficiently.
Rendering with 32 threads is so slow that it’s almost unusable.
So I tested various thread counts and found that Blender Internal starts to slow down with more than
8 - 10 threads (depending on the scene).
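
For anyone who wants to repeat this kind of thread-count test, here is a minimal sketch of how it could be scripted. It assumes `blender` is on the PATH and uses Blender's command-line options `-b` (background), `-t` (thread count) and `-f` (render one frame). The file name `scene.blend`, the helper names and the list of thread counts are placeholders for illustration, not something from my actual setup. Note that the measured time includes Blender's startup and file loading, so it is only a rough comparison.

```python
import subprocess
import time


def sweep_commands(blend_file, thread_counts, frame=1, blender="blender"):
    """Build one single-frame background render command per thread count."""
    # "-t" must come before "-f" because Blender processes arguments in order.
    return {
        n: [blender, "-b", blend_file, "-t", str(n), "-f", str(frame)]
        for n in thread_counts
    }


def time_command(cmd):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    started = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - started


def fastest(timings):
    """Return the thread count with the smallest measured time."""
    return min(timings, key=timings.get)


def run_sweep(blend_file, thread_counts=(4, 8, 12, 16, 24, 32)):
    """Render the same frame once per thread count and print each timing."""
    timings = {}
    for n, cmd in sweep_commands(blend_file, thread_counts).items():
        timings[n] = time_command(cmd)
        print(f"{n:2d} threads: {timings[n]:.1f} s")
    return timings
```

Calling `fastest(run_sweep("scene.blend"))` would then report which thread count rendered the test frame quickest on a given machine and scene.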

Is this a bug, or is it just how Blender Internal works?
Cycles works fine: the more threads, the faster the render.

The only way I have found to take advantage of all 32 threads is to limit the thread count to 8 in the project file,
run 4 instances of Blender, and hit render in all of them.
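
For an animation, this workaround could be automated roughly as follows. This is only a sketch under some assumptions: `blender` is on the PATH, the job is a frame range that can be split into contiguous chunks, and the output path set in the file gives each frame its own file name. The helper names and `scene.blend` are made up for illustration. It relies on Blender's command-line options `-b`, `-t`, `-s`, `-e` and `-a`.

```python
import subprocess


def split_frames(start, end, parts):
    """Split the inclusive frame range [start, end] into contiguous chunks."""
    total = end - start + 1
    base, extra = divmod(total, parts)
    chunks = []
    first = start
    for i in range(parts):
        # The first `extra` chunks get one additional frame each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more instances than frames: skip empty chunks
        chunks.append((first, first + size - 1))
        first += size
    return chunks


def build_commands(blend_file, start, end, instances=4, threads=8,
                   blender="blender"):
    """Build one background render command per instance."""
    commands = []
    for first, last in split_frames(start, end, instances):
        # "-s" and "-e" must come before "-a" to take effect.
        commands.append([
            blender, "-b", blend_file,
            "-t", str(threads),
            "-s", str(first), "-e", str(last),
            "-a",
        ])
    return commands


def launch(commands):
    """Start all instances at once and wait for every one to finish."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]
```

For example, `launch(build_commands("scene.blend", 1, 100))` would start 4 instances with 8 threads each, rendering frames 1-25, 26-50, 51-75 and 76-100. Each instance loads the scene separately, so memory use goes up with the number of instances.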

I know I should move to Cycles, but I mainly make non-photorealistic renders, and Blender Internal is much
easier in such cases.

If anyone knows how to fix the problem or knows of another workaround, please let me know.

Thank you.

Interesting. Have you tried whether there is a difference between:

  • limiting it to 8 threads in Blender
  • using game mode (might give lower memory latencies) (no thread limit, but you’d only have 8 cores…)
  • just disabling SMT (16 threads without limit).

On the Cycles side of things, I guess you have seen things like:

> tobbew

Thanks for the reply.
I’ve tested with game mode, which limits the number of cores as well as disabling SMT.
The result is the same.
Cycles and Cinebench work fine, so I think it’s Blender Internal’s problem.

I think I should move to Cycles.
Thanks for the links too.

Usually, you must experiment a little with the memory usage, unless you have a machine that’s capacious enough that it simply does not matter. You want to be sure that Blender is able to obtain all of the memory that it needs, without any “paging” or “swapping” being done by the operating system.

Like all software, Blender runs in a “virtual memory” environment, in which its actual memory requirements might be met by a combination of “actual RAM” and data that is being swapped (by the operating system) between “actual RAM” and the hard drive on a “least recently used (LRU)” basis. Well, due to its CPU-intensive nature, Blender cannot tolerate this treatment. You must, therefore, adjust the Blender parameters so that, on your machine, Blender does not ask for more than it can actually receive.

Also note that this is very much a project-specific, even scene-specific, consideration … and that it is by no means peculiar to BI versus Cycles. It is intrinsic to the fundamental nature of Blender as a so-called “CPU-bound” application that also sometimes demands great amounts of memory. You must ensure that it can actually have everything that it asks for, without delay, with this project (scene … particular render … what have you …), on this machine.

It just goes with the territory: “if you’re doing CPU-bound and memory-hungry work on a digital computer, you have to account for this.”

Most, if not all, of the pre-built Threadripper systems I’ve seen come with 32 gigabytes of RAM in the default configuration (the chips can make use of up to a terabyte, but the initial crop of motherboards max out at 128 gigs), so I doubt memory is the problem here. I think it’s more likely due to Blender Internal not being nearly as efficient with multicore processing as Cycles (the code at the core of the engine dates back to when multi-core processors were uncommon).

Threadripper is also based on the Ryzen family of processors, which for the most part is a new platform (BI is unlikely to have anything that takes advantage of the newer platforms or the newer instruction sets).

That said, I’m not sure if it’s worth investing a lot of time learning BI, as the engine will be history in 2.8 (replaced by Eevee).