Quad core renders as Dual Core. Why ?

Hi all,

Recently I upgraded to a 4-core CPU and found that rendering became much faster.

But today, when I played with ray tracing etc., I found that Blender uses only 2 cores
when rendering some parts of the image. I have Threads set to 4 in the Render menu, so no problem there.

It looks like the Blender renderer can’t use more than 2 cores for some algorithms.

Attaching my blend file, the render, and the render settings for the material that slows Blender down.

How about Yafray ?

Thanks,
ygs

Attachments

4cores_light.blend (540 KB)

Hmm, I don’t think there’s much to do… But try clicking on the little car so it will configure automatically. And have you tried updating to a newer version of Blender?

The small car didn’t help.
The problem is that all 4 cores ARE USED when rendering materials with
the Ray Transp button DISABLED. When the render starts to calculate a material
with Ray Transp ENABLED, it stops using all 4 cores. Only 2!

I use Linux and the official Blender 2.47 release, with a quad-core processor and 4 GB RAM.

Attaching the rendered picture. When Blender renders the lantern’s material it switches to 2 cores only :frowning:

Any ideas?

Is this a bug that I should report to the developers?

Attach your example blend. I have a quad core under Windows and I am curious about this limitation.

Atom,

The blend file is attached. Please take a look at the bottom of the first post.

How about adding more X-Parts and Y-Parts?
Make it 8/8 or 16/16; I am sure all 4 cores will then work at 100% until the picture is finished. :slight_smile:
At least that is how it works on my machine.

arexma,

Adding more X-Parts and Y-Parts (I set 32/32) helped.
Now all 4 cores work together all the time and Blender renders much faster.

For the above scene (640x480 resolution):
2 min 49 sec - 4 Xparts, 4 Yparts
1 min 42 sec - 32 Xparts, 32 Yparts
(1.7 times faster)

For the above scene (1024x768 resolution):
13 min 14 sec - 4 Xparts, 4 Yparts
4 min 06 sec - 32 Xparts, 32 Yparts
(3.2 times faster)

For the above scene (1600x1200 resolution):
15 min 20 sec - 4 Xparts, 4 Yparts
9 min 55 sec - 32 Xparts, 32 Yparts
(1.5 times faster) - WHY?
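
As a sanity check, the speedup factors quoted above can be recomputed from the timings. This is a minimal sketch in plain Python (it does not touch Blender); the names `to_seconds`, `timings` and `speedups` are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min sec' render time to seconds."""
    return minutes * 60 + seconds

# (time with 4/4 parts, time with 32/32 parts), copied from the figures above
timings = {
    "640x480": (to_seconds(2, 49), to_seconds(1, 42)),
    "1024x768": (to_seconds(13, 14), to_seconds(4, 6)),
    "1600x1200": (to_seconds(15, 20), to_seconds(9, 55)),
}

# Speedup = old time / new time, rounded to one decimal place
speedups = {res: round(slow / fast, 1) for res, (slow, fast) in timings.items()}

for res, factor in speedups.items():
    print(f"{res}: {factor} times faster")
```

The rounded factors agree with the 1.7, 3.2 and 1.5 given in the post.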

Interestingly, when rendering at 1024x768,
I’m getting the biggest overall gain from the 32 Xparts, 32 Yparts settings.

To the developers:
This should be the default setting in the next Blender releases.

Thanks a lot,
ygs

Those are considered “Settings based on the technical expertise and personal needs of the User”
IMO they should be set to 1/1 by default :smiley:
Also, setting more X/Y Parts != always faster. It depends on the scene.

Actually there was never a switch to 2 cores. 2 of the 4 cores were simply finished with their work ^^ Explaining all that would go into multithreading and task distribution in an algorithmic state machine ^^ There is no real need to understand it :wink:

You’re right,
and it looks like the overall performance depends on the frame resolution too.
Although I don’t understand how that’s possible :slight_smile:

See the updated data above. 1024x768 gains more than 1600x1200
when Xparts/Yparts are 32/32.

I have a single core machine that renders on 2 (Automatically set).
Not sure how it happened, but not complaining.

This actually slows your system down.
There are two major kinds of multithreading:
software multithreading and hardware multithreading.

SW MT:
If you set your render threads to 2, Blender will start 2 render threads. Because your CPU has no hardware multithreading support, it has to switch between the two threads, and that thread-switching overhead slows the system down. Bottom line: changing to 1 thread will render faster for you.
Basically you have the Blender process, which starts 2 render threads. Now the CPU gets work from both threads and has to switch between them, causing this overhead. If you run just one thread, a single core is faster.

HW MT:
Dual-, quad- or many-core CPUs can handle multithreading in hardware: each core runs its own thread, so there is no switching overhead between the render threads, and as a matter of fact spreading the threads over the cores makes the whole system faster…

Overall, the simultaneity or concurrency of tasks within an operating system is an illusion. Well, more or less. It’s complicated… and threads, tasks and processes are all different things…
This is getting way too complicated… this topic fills courses and studies at universities… so either ask jesterking, who I am sure can give some more insight (I am not familiar with the Blender source at all), or google/wiki topics like:
multithreading, pipeline hazard, translation lookaside buffer, and follow those.

Sorry, I missed the attachment initially.

It looks like the bottleneck is in the Ray Transp Glossiness setting. If you set glossiness to 1.0, all 4 cores are used, but drop it below 1.0 and you only get 2 cores for Ray Transp processing.

This does look like a bug in the code. Nice find!

Atom,

this might be a bug, but the workaround is simple: increase Xparts/Yparts.

Do you think I need to report this to Blender’s bug tracker?

neonstarlight,

the only solution is hardware multithreading. And it works great.
Quads do a great job at rendering. Now I want my CPU to have 16 cores :slight_smile:

This might not be a bug but a limitation of the algorithm. Not everything can be multithreaded.
YGS, higher resolutions take longer because the renderer has to make more calculations. Read up on how rendering works. Each pixel of the final resolution has to be calculated, so higher resolution means more calculations and more render time.
The X and Y parts basically split your screen up into subparts. Each subpart is calculated by itself, by one of your cores, so the more parts you have, the more the four cores can do in parallel. This has a basic limitation: the scene needs to be prepared, and the more subparts there are, the longer this preparation takes, so there is a break-even point beyond which more parts do not speed up the render.

@arexma: Although there is a difference between SW and HW threads, keep in mind that there are technologies like HT, so 2 threads on one core could be faster than one thread. It probably depends on the thread scheduler. I haven’t tested it thoroughly in Blender, but I know that in Luxrender I get better results when I turn the threads up to 3, although I only have a 2-core system. 3 threads use the 2 cores better than 2 threads. So neonstarlight, do a test and see which setting works better for you.
Oh, and keep in mind that some operations do not seem to be multithreaded yet. AAO in the official 2.47 Windows release, for example.

musk,

higher resolutions take longer because the renderer has to make more calculations. Read up on how rendering works. Each pixel of the final resolution has to be calculated, so higher resolution means more calculations and more render time.

No, no. I just found that when I render a 1024x768 frame,
the configuration with Xparts/Yparts=32 works 3.2 times faster than
Xparts/Yparts=4. BUT when I render 1600x1200 I get only
a 1.5 times gain.

ygs

Yes, because 1600x1200 = 1920000 and 1024x768 = 786432.
786432/1920000 = 0.4096, so 1024x768 has only about 0.41 times as many pixels as 1600x1200 (put the other way, 1600x1200 has about 2.44 times as many). Taking your 3.2 and multiplying it by 0.4096 gives me 1.31072.
So your 1.5 times is actually better than the factor I calculated.
To clear things up: at 1600x1200 the computer has to calculate 1133568 more pixels than at 1024x768. Makes perfect sense.
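
The pixel counts in this post can be checked with a few lines of plain Python. This sketch only verifies the arithmetic; the variable names are made up for illustration, and the assumption that render time scales directly with pixel count is the poster's, not something the script demonstrates.

```python
low_res = 1024 * 768      # pixels in a 1024x768 frame
high_res = 1600 * 1200    # pixels in a 1600x1200 frame

ratio = low_res / high_res          # fraction of the larger frame's pixels
extra_pixels = high_res - low_res   # additional pixels at the higher resolution
scaled_gain = 3.2 * ratio           # the 3.2x gain scaled by the pixel ratio

print(low_res, high_res, extra_pixels)
print(ratio, high_res / low_res)
print(round(scaled_gain, 5))
```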

Blender renders an image in little squares. Each square is assigned to a single core/cpu. If you wanna utilise multiple cores better, make sure the image is subdivided into more xparts/yparts.

For example: I think blender subdivides the image into 4 squares on the default setting. If each of these 4 squares gets assigned to each of the 4 cores, then if some squares are already done, the finished cores will just sit around idly until the last square is finished. This really sucks if for example the last square contains parts of the image which are complex to render.

I think there is a slight overhead when you subdivide the image into too many x and y parts. Not sure though. Either way, it helps to fiddle around a bit with a scene to see what the best settings are.
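
The idle-core effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Blender's actual scheduler: the 64x64 "image", the `pixel_cost` function (one quarter of the image is 20 times more expensive per pixel), and the rule that a free worker simply grabs the next tile are all hypothetical.

```python
import heapq

def tile_costs(width, height, xparts, yparts, cost):
    """Sum a per-pixel cost function over each tile of an xparts-by-yparts grid."""
    costs = []
    for ty in range(yparts):
        y0, y1 = ty * height // yparts, (ty + 1) * height // yparts
        for tx in range(xparts):
            x0, x1 = tx * width // xparts, (tx + 1) * width // xparts
            costs.append(sum(cost(x, y)
                             for y in range(y0, y1)
                             for x in range(x0, x1)))
    return costs

def makespan(costs, workers):
    """Hand each tile to whichever worker is free first; return the total time."""
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for c in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + c)
    return max(finish_times)

def pixel_cost(x, y):
    # Hypothetical scene: the lower-right quarter is 20x more expensive per pixel
    return 20 if (x >= 32 and y >= 32) else 1

# Total render "time" on 4 workers for 2x2 and 8x8 parts
spans = {parts: makespan(tile_costs(64, 64, parts, parts, pixel_cost), 4)
         for parts in (2, 8)}

for parts, span in spans.items():
    print(f"{parts}x{parts} parts: finished after {span} cost units")
```

With 2x2 parts, three workers finish their cheap tiles early and wait for the one expensive tile; with 8x8 parts the expensive region is shared out among all four workers. The sketch ignores the per-part overhead mentioned above.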

Hi,
A few days ago I posted a problem that is close to this topic, but it has had no response.
I have encountered a problem when rendering by parts.
1-Some image blocks are not rendered correctly with raytracing enabled and many particles with alpha.
2-The number of parts displayed in the render window is wrong. For example, for 12 x 12 parts, Blender displays rendering part N / 120 instead of 144, and for higher values, the max seems to be 130 !
Full explanation on this topic :
http://blenderartists.org/forum/showthread.php?t=135676
Thank you for reading it.
Philippe.

There isn’t a particular maximum number of parts, but there is a minimum size for a base part (the remainder part doesn’t count). If the height of a base part would be less than 64 pixels, the number of parts is automatically recalculated as
yparts = floor(1 + yres/64)
and if the width is less than 64,
xparts = floor(1 + xres/64)

E.g., if you try to render at 800x600 with 12x12 parts,
800/12 = 66.7 which is greater than 64, so it’s left alone, however
600/12 = 50 which is less than 64, so yparts = floor(1 + 600/64) = 10
12x10 = 120 parts in total

E.g. 2, if you increase the number of parts for 800x600,
800/xparts where xparts > 12 is less than 64, so xparts is recalculated as floor(1 + 800/64) = 13
13x10 = 130 parts in total
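
The recalculation rule above can be written as a small function. This is a sketch of the rule as described in this post, not Blender's actual source; the name `effective_parts` is made up, and it assumes the part size is compared using ordinary (non-integer) division.

```python
from math import floor

MIN_PART_SIZE = 64  # minimum base-part size in pixels, as stated above

def effective_parts(xres, yres, xparts, yparts, min_size=MIN_PART_SIZE):
    """Return the (xparts, yparts) actually used after the minimum-size check."""
    if yres / yparts < min_size:
        yparts = floor(1 + yres / min_size)
    if xres / xparts < min_size:
        xparts = floor(1 + xres / min_size)
    return xparts, yparts

for requested in (12, 32):
    xp, yp = effective_parts(800, 600, requested, requested)
    print(f"{requested}x{requested} requested -> {xp}x{yp} = {xp * yp} parts")
```

For 800x600 this reproduces the 120 parts seen with 12x12 and the cap of 130 parts for larger settings.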