More than 64 threads

Hey devs.
Can you please increase the number of threads for Cycles?
64 is out of date.

Or can I easily change that number in the code and compile it?

Sergey has you covered (on allowing more than 64 threads)
https://developer.blender.org/D2049

Implementing it is not nearly as easy as changing a number.

Thanks for the info.
But Windows sucks, I use Linux.

Hi, I know the Blender Foundation already uses a 64-core, 128-thread system on Linux and it works fine.
If you need more than that, maybe you can tell us what kind of system you use.

Cheers, mib

I use an E5 V4 system, which can have up to 88 (official) or 96 (unofficial engineering samples) threads. But there are people with E7 v3 systems.

E7 v3 big business: 8 cpus / max. 288 threads
E7 v3 small business: 4 cpus / max. 144 threads
E7 v4 big business: 8 cpus / max. 384 threads

Coming 2017:
E5 V5: 2 cpus / max. 104 threads
E7 V5 big business: 8 cpus / max. 448 threads

Hi, I saw your post in the benchmark thread.
The developers are working on a patch to remove this limit at the moment.
My first post was wrong; that was a network render.

Cheers, mib

Hehe, I use cmake-gui to configure cmake; there is also ccmake for DOS GUI fans.
Cmake-gui is more or less self-explanatory: you have to select the source and build directories to get the basic view.
In the basic view you only have on/off switches, WITH_CYCLES=ON for example.
If you hit the Advanced button you will get all the library paths and also the compiler flags.
You can add flags or create new entries.
It should be easy for you since you have already gone through the compiling process.

Cheers, mib
EDIT: Does Cycles now really use your 72 threads?

I have read about the cmake-gui, but they say I have to create a build directory first.
I think it’s more important whether those flags I mentioned are usable at all; Blender may crash because of how aggressive they are.
Sorry, I cannot confirm the 72-thread usage right now; it’s really hot in my room and running a 72-thread machine is no big deal. Furthermore, it can be difficult to find out from the system monitor, unless you know how to reveal process and thread dependencies.
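
If the system monitor does not show it clearly, on Linux the per-thread state can be read from /proc. This is only a rough sketch: the function name is made up for illustration, it takes a single snapshot, and a thread that exits while the loop is running would raise an error.

```python
from pathlib import Path

def thread_states(pid, proc_root="/proc"):
    """Return {tid: state_letter} for each thread of the given process.

    Reads the "State:" line of /proc/<pid>/task/<tid>/status.
    "R" means running or runnable, "S" means sleeping.
    """
    states = {}
    for task in Path(proc_root, str(pid), "task").iterdir():
        for line in (task / "status").read_text().splitlines():
            if line.startswith("State:"):
                states[int(task.name)] = line.split()[1]
                break
    return states

if __name__ == "__main__":
    import os
    # Hypothetical usage: replace os.getpid() with the PID of the Blender process.
    states = thread_states(os.getpid())
    running = sum(1 for s in states.values() if s == "R")
    print(f"{len(states)} threads, {running} in state R at this moment")
```

Sampling this a few times during a render gives a better picture than one snapshot.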

EDIT: improved context

-march=native is usable; I checked it and never had problems.
You can use the build directory you already have.
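
If you prefer the command line to cmake-gui, the same settings can be passed as -D options. A rough sketch that only assembles the command and does not run it; the directory names are placeholders, and putting the flag into CMAKE_C_FLAGS and CMAKE_CXX_FLAGS is my assumption about where it belongs, not something checked against Blender's build scripts.

```python
import shlex

def cmake_configure_command(source_dir, build_dir, extra_flags="-march=native"):
    """Assemble a cmake configure command line (nothing is executed here)."""
    return [
        "cmake",
        "-S", source_dir,          # source directory
        "-B", build_dir,           # existing or new build directory
        "-DWITH_CYCLES=ON",        # the on/off switch mentioned above
        f"-DCMAKE_C_FLAGS={extra_flags}",
        f"-DCMAKE_CXX_FLAGS={extra_flags}",
    ]

if __name__ == "__main__":
    # "blender" and "build_linux" are placeholder paths.
    print(shlex.join(cmake_configure_command("blender", "build_linux")))
```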

Cheers, mib

@cpurender
Have you set the cpu frequency governor to performance mode?
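
A quick way to check, assuming your kernel exposes the usual cpufreq files under /sys (the function name here is just for illustration):

```python
from pathlib import Path

def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name   # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors

if __name__ == "__main__":
    govs = read_governors()
    not_perf = [cpu for cpu, g in govs.items() if g != "performance"]
    print(f"{len(govs)} CPUs report a governor, "
          f"{len(not_perf)} are not set to 'performance'")
```

If the dictionary comes back empty, the machine does not expose cpufreq at that path and this check does not apply.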

No. But I’ll try it at a later time, when my room is cooled down to an acceptable temperature.

@cpurender
Do you have ES cpus?

Yes. E5 ES V4 18 cores / 36 threads each.

@cpurender
Ok, I’m guessing your L3 bandwidth is getting exhausted. Your uncore runs at max 1600MHz (vs stock 2800MHz)?

I don’t know about Cycles, but take Cinebench: it shares the threads evenly and uses them efficiently, as I tested by disabling some cores in the BIOS.

@cpurender
I tried lowering the uncore frequency multiplier to 16 and the result was 3 seconds lost. Anyway, you said something about the CPUs running hot? I have lowered the CPU VCCIN to 1.5V (disable SVID) and set the core & ring bus voltage offset to -10mV. It should reduce the power consumption a lot; I totally recommend you try it. Have you checked the core frequencies while rendering? Are the CPUs throttling?

I think I cannot change CPU related stuff on my motherboard.

I have just tested again and can confirm that all 72 threads are used, but normally 1 is always taken by Ubuntu's compiz, so let's say 71.

I also tried -Ofast combined with -march=native; it seems to be slightly faster, but not enough to be worth trying.

The CPUs are cooled well enough and the max temperature is below 65 °C under full load.

Ok. Now I see different results by testing my own build with different thread numbers.
Maybe I made some mistakes earlier while comparing.
BMW27.blend
60 threads: 0 min 44.63 sec
63 threads: 0 min 43.82 sec
65 threads: 0 min 42.72 sec
71 threads: 0 min 39.20 sec
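
To put those four timings in perspective, here is a small sketch that compares each run against the 60-thread one. "Efficiency" is just my shorthand for measured speedup divided by the speedup perfect linear scaling would give; it is one possible way to read the numbers, nothing more.

```python
# Timings for BMW27.blend copied from the list above: threads -> seconds.
timings = {60: 44.63, 63: 43.82, 65: 42.72, 71: 39.20}

def scaling_table(timings):
    """Compare each run against the run with the fewest threads."""
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        ideal = threads / base_threads      # perfect linear scaling
        rows.append((threads, speedup, speedup / ideal))
    return rows

if __name__ == "__main__":
    for threads, speedup, efficiency in scaling_table(timings):
        print(f"{threads} threads: speedup {speedup:.3f}, "
              f"efficiency {efficiency:.1%}")
```

On these numbers the 71-thread run still keeps roughly 96% of linear scaling relative to 60 threads, so the extra threads are not wasted on this machine.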

The build from blender.org uses 64 (?) threads but finished in 40.xx sec.

A critical thing to remember about “threads” is that they do not(!) multiply the CPU resource: they divide it.

It’s perfectly fine to run 128 threads … if(!) you have 64 cores! That’s only two threads per core.

Furthermore, with Cycles you have one “fixed, unchanging, limiting resource”: the GPU chip(s).

Multithreading does not speed up a so-called “CPU-bound” operation such as graphics rendering, except to the extent that it allows multiple CPU cores to be effectively employed … and then, only to the extent that “those cores, on this motherboard,” can be effectively employed. Each thread will “consume a full time-slice,” almost every time. Like it or not, there are only 1,000 milliseconds of CPU-time available in each second.

A thread-setting much larger than, say, “2x the number of cores in this machine” is, IMHO, basically wasted. Counter-productive.
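
As a toy illustration of that argument, assume a purely CPU-bound job that splits evenly and a machine where at most `cores` threads can run at once. This deliberately ignores hyper-threading, scheduler overhead and memory bandwidth, so it is a sketch of the reasoning and not a model of Cycles.

```python
def ideal_wall_time(cpu_seconds, threads, cores):
    """Wall time if work splits evenly and at most `cores` threads run at once."""
    if threads < 1 or cores < 1:
        raise ValueError("threads and cores must be >= 1")
    # Threads beyond the core count only take turns; they add no capacity.
    return cpu_seconds / min(threads, cores)

if __name__ == "__main__":
    # Hypothetical job needing 720 CPU-seconds on a 72-core machine.
    for threads in (36, 72, 144):
        t = ideal_wall_time(720, threads, cores=72)
        print(f"{threads} threads -> {t:.1f} s")
```

In this model the time stops improving once the thread count reaches the core count; anything beyond that just divides the same CPU time into smaller slices.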