Work and Render: decrease your threads a bit

Hi, I am currently working on my thesis on 3D render farms and doing a lot of research in every direction. This came up during my research into threads and their impact on speedup.

When it comes to threads, Blender follows Amdahl's Law pretty accurately. The first 35% of threads give you something like an 80-85% boost; the last 30% of threads only speed up your render by about 8-10%.

So if you have a CPU which natively supports 32 threads, the last 15 threads don't add much to your render speedup.
That means, for everyone who works and renders on the same machine, decreasing the thread count by 1-2 won't hurt your render and will still let you work fluently through your day (depending on your logical processors).
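The diminishing returns described here can be sketched with Amdahl's Law. A minimal Python sketch; the 5% serial fraction is an illustrative assumption, not a value measured from Blender:

```python
def amdahl_speedup(threads, serial_fraction):
    """Amdahl's Law: the speedup with n threads when a fixed
    fraction of the work cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)

# Assume 5% of the render is serial (illustrative only).
for n in (1, 8, 16, 24, 32):
    print(f"{n:2d} threads -> {amdahl_speedup(n, 0.05):.2f}x")
```

Even with a small serial fraction, the per-thread gain shrinks as the thread count grows, which is the shape of the curve described above.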

Interesting. Have you verified that it isn’t simply a matter of memory bandwidth?

Ah yes, I ran a series of tests on different machines: an HP Compaq 8300 Elite Ultra-slim Desktop, an HP Z420 Workstation and an HP Z840 Workstation, with an i5, an i7 and an E5 respectively.
I let them crunch through a bunch of scenes and settings. The individual results were different, but the graphs looked the same.
The Z420 of course showed a lot better resolution, because of its 32 native possible threads.

I’m fairly sure your observation has very little to do with Amdahl’s Law and almost everything with the fact that hardware threads don’t necessarily map to actual hardware resources 1:1.

Many Intel CPUs employ Hyperthreading to pull in maybe 10% in extra performance by having some parts of the CPU pipeline duplicated. AMD has its Bulldozer “modules” share one FPU per two “cores”. In either case, the actual floating-point throughput remains the same, whether you use the extra threads or not.

You will find that for something like path-tracing, the serial part of the program is vanishingly small and it scales extremely well to multiple “actual” CPU cores, with memory pressure being the next bottleneck in line.

More generally: Hyperthreading et al. add little to your render speedup. For maximum performance, you should use at least as many threads as you have “physical” or “real” cores; how many that is depends on the CPU.
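A quick way to see the counts on your own machine. A minimal sketch: Python's standard library only reports logical processors, so the physical-core estimate below assumes a typical Hyper-Threading CPU with two threads per core:

```python
import os

# Logical processors, i.e. the hardware threads the OS sees.
logical = os.cpu_count()
print("logical processors:", logical)

# On a typical Intel CPU with Hyper-Threading enabled there are two
# logical processors per physical core, so a rough estimate is:
physical_estimate = logical // 2 if logical else None
print("physical cores (rough HT estimate):", physical_estimate)
```

Third-party tools (e.g. `psutil.cpu_count(logical=False)`) can report the real physical count directly.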

Heh, yep, if you don’t have actual physical cores for all your threads, then of course they aren’t going to help much.

Duh.

For a moment there, I thought Intel started selling 32-core CPUs. I guess that would have been a little optimistic. So hard to keep up these days! :stuck_out_tongue:

4 of Intel's hyperthreads are enough to beat a quad-core AMD CPU.

There are a limited number of single-threaded operations… mainly image loading, file saving, BVH loading and compositing steps… usually taking only a percent or so of the actual render time.

As many others have pointed out, it is hyperthreading which does not help much with raytracing (last time I looked into it, it gave about a 5% speed boost). Raytracing in general, as an algorithm, is extremely good at multithreading.

If you want to test this… limit Blender to only use 16 threads per instance… load two up and render an animation… record how long it takes (number of hours)… then set Blender to use 32 threads per instance, render, and record how long that takes… and then render off just one Blender instance with 16 threads. They should all take approximately the same time, maybe a difference of 5-10%, but nothing drastic.
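A sketch of scripting that comparison. `scene.blend` is a placeholder, and the snippet assumes a `blender` binary on your PATH; `-b`, `-t` and `-f` are Blender's background, thread-count and render-frame flags:

```python
import os
import shutil
import subprocess
import sys
import time

def timed_run(args):
    """Run a command and return its wall-clock time in seconds."""
    t0 = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - t0

# Only attempt the render if Blender and the scene are actually present.
if shutil.which("blender") and os.path.exists("scene.blend"):
    t16 = timed_run(["blender", "-b", "scene.blend", "-t", "16", "-f", "1"])
    t32 = timed_run(["blender", "-b", "scene.blend", "-t", "32", "-f", "1"])
    print(f"16 threads: {t16:.1f}s, 32 threads: {t32:.1f}s")
```

For a full animation you would swap `-f 1` for `-a` and launch two instances in parallel for the 2x16 case.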

If you are doing a thesis on this, you may want to do more research into hyperthreading and how it impacts different applications before making assumptions.

What about something like a Parallella?

Those are all actual cores right?

You can’t use CUDA cores to render Blender Internal, right? (Wouldn’t that be really, really fast?)

They have a dual-core ARM chip as well as a RISC multicore chip. The ARM chips are like what’s in your phones/tablets… they are not x86, but still have decent performance & functionality… the RISC cores are used for very specialised calculations… not sure how much that will improve performance… or whether it would be worthwhile porting Cycles over to it.

You can’t use CUDA cores to render Blender Internal, right? (Wouldn’t that be really, really fast?)

Nope… BI is not written in CUDA… It is not a matter of just running it through an interpreter… it would need a rewrite to make that possible… and then you would end up with people getting frustrated with the code spaghetti that BI is, saying “let’s just start over with a clean architecture and a modern code base, expandable for the future”… and then you end up with something like Cycles from 2 years ago.

Reality check:

My current scene renders on CPU with hyperthreading in 6:51.09 and with hyperthreading disabled in the BIOS in 9:13.91. I calculate that as: no hyperthreading is 35% slower than with hyperthreading, or, reversing the calculation, hyperthreading is 26% faster than no hyperthreading.
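Those percentages check out, reading the two times as minutes:seconds.centiseconds (an assumption about the notation):

```python
# Times from the post above, converted to seconds.
ht = 6 * 60 + 51.09      # 6:51.09 with hyperthreading
no_ht = 9 * 60 + 13.91   # 9:13.91 without hyperthreading

slower = (no_ht / ht - 1) * 100   # how much slower without HT
faster = (1 - ht / no_ht) * 100   # how much faster with HT

print(f"no-HT is {slower:.0f}% slower, HT is {faster:.0f}% faster")
```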

The memory bandwidth is only an issue when both of the paired hyperthreads request access to memory at the same time. Since each tile takes a different amount of time, the collisions are reduced.

While 8 actual cores would definitely be faster than 4 hyperthreaded cores, there is still a significant boost from using hyperthreading.

Cheers

I stand corrected. I was misled by the familiar pattern and too hasty with my post. :o

I can back up tomtuko here,

With hyperthreading on a 6-core i7-5930, running 12 threads gets me a render time of 1:38.45.
Running 6 threads to match the native cores resulted in a render time of 2:20.57.

Hyperthreading increased performance by 42%. Older versions of hyperthreading didn’t provide much of a boost, but Intel has heavily optimized it over the past couple of rounds of architecture updates.
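The same quick check works on these two times (again assuming minutes:seconds.centiseconds):

```python
# Times from the post above, converted to seconds.
six_threads = 2 * 60 + 20.57     # 2:20.57 with 6 threads (native cores)
twelve_threads = 1 * 60 + 38.45  # 1:38.45 with 12 threads (hyperthreading)

gain = (six_threads / twelve_threads - 1) * 100
print(f"hyperthreading gain: {gain:.1f}%")
```

That works out to roughly 42-43%, matching the quoted figure.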

Here is a graph of rendering the same scene on my PC with 1 to 8 threads, at 100 and 500 samples. Each test ran 10 times and the results were averaged. The CPU is an i7 4790k.

The graph shows how much faster each set of threads is vs just 1 thread.


Funnily enough, the performance gain at 8 threads is 42.9% at 500 samples and 40.9% at 100 samples.
This is quite close to SterlingRoth’s 42% as well.