using multiple cores in blender rendering

hello!

i hope this is the right section for this!

i have a question about the usage of cpu in blender rendering.
i have an i5 750 cpu, so 4 cores, 4 threads.

when i render an image i can see all 4 cores fully used, but at the end, when it starts rendering the last part of the image, it uses only one of them. i noticed the problem with an image that has a really heavy-to-compute last part: rendering just that part takes 10 minutes, about half of the total render time, and for all those 10 minutes blender uses only 1 of the 4 cpu cores.

i have this with both 2.49b and 2.5x.
is it normal? i have the threads set to 4 (and it's the same if i leave the option on auto).

i had a similar issue with physics simulations, where cpu usage stays at around 40% the whole time.

does anyone know how to solve this? or is it just the way it is, and nothing can be done?

thanks for any help!

The render is split into smaller tiles (you can set the X and Y numbers in the render settings). Each tile is only rendered by one core (the total render is multithreaded but the individual tiles are not). If you have one section that takes a long time to render you’ll be waiting for it to finish. Try increasing the number of total tiles so it evens out the render. Ideally you want all your cores to be working 100% of the time. You should hit an optimum number that will give you the best render time. This will vary for each image.
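If it helps to see why, here is a toy Python simulation I put together (my own illustration, not Blender code; the image size, the "hot" region and the per-pixel costs are all invented): each free core grabs the next whole tile, and a tile is never shared between cores.

```python
import heapq

WIDTH = HEIGHT = 64          # toy image size, arbitrary units

def pixel_cost(x, y):
    # pretend the bottom-right quarter of the image is 20x harder to render
    return 20.0 if (x >= WIDTH * 3 // 4 and y >= HEIGHT * 3 // 4) else 1.0

def tile_costs(parts_x, parts_y):
    """Cost of each tile = sum of the costs of the pixels inside it."""
    tw, th = WIDTH // parts_x, HEIGHT // parts_y
    return [sum(pixel_cost(x, y)
                for y in range(ty * th, (ty + 1) * th)
                for x in range(tx * tw, (tx + 1) * tw))
            for ty in range(parts_y) for tx in range(parts_x)]

def wall_clock(costs, cores=4):
    """Each free core grabs the next tile; a tile is never shared between cores."""
    finish = [0.0] * cores                      # per-core finish times (min-heap)
    heapq.heapify(finish)
    for c in costs:
        heapq.heappush(finish, heapq.heappop(finish) + c)
    return max(finish)

for parts in (2, 4, 8, 16):
    print(f"{parts}x{parts} parts -> wall clock {wall_clock(tile_costs(parts, parts)):.0f}")
```

On this made-up scene it prints 5888 for both 2x2 and 4x4 (the expensive corner is still a single tile, and it happens to be scheduled last), 2368 for 8x8, and 2240 for 16x16, which is the ideal total cost divided by 4 cores. Splitting finer only pays off once the heavy region spans several tiles.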

Parts Rendering wiki entry: http://wiki.blender.org/index.php/Doc:Manual/Render/Options#Parts_Rendering

AFAIK, only the actual render engine (BI) is multi-thread capable. What you are seeing with the drop to one core is when blender’s compositor (or whatever) is compiling the final image output (putting layers, or whatever, back together.) Likewise with physics calculations, not multi-threaded (yet?) :slight_smile:

thank you for the answer… i already tried that: going from 4x4 to 8x8 saved a couple of minutes (out of 20), but when i tried to increase it further the situation didn't seem to get any better.

so a single part can never be computed by more than one thread… mmmm, that's not good… it looks like this one part is harder than the rest of the scene! the problem is worse in this particular render, of course, because of the difference in complexity between parts of the scene, but still, wouldn't it be good to be able to compute every pixel of the image with all the power available?

anyway, just out of curiosity, is there a limit to how far the image can be subdivided?

sorry mzungu i saw your reply after posting.

the problem is that i see the drop to single-core usage while the image is still rendering (one part is black and slowly appearing). so i think in this case it has nothing to do with the compositor or layers or things like that…

You are really asking for fine-grained parallelism instead of the coarse-grained parallelism in general use. This situation is not likely to change anytime soon in the industry.

though one could make a (relatively) “quick” hack of further dividing that last tile and distributing the pieces to the idle cores…

I think all renderers use only the tile technique for multicore rendering.

namekuseijin, i didn't quite understand what the “quick” hack is… do you mean just increasing the subdivision of the image, for instance from 4x4 to 8x8 or 16x16? or something else?

anyway, the maximum i could reach was 128x60, but then during the rendering it said there were 130 parts… so something was off. the last part is still really long… but if that's the way it works, what can you do :slight_smile:

an idea for the developers: subdivide that last part

how to do that?..

a “quick” hack for a programmer, that is. :stuck_out_tongue:

in the case of a user, more tiles is still the best option.
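roughly something like this, just to illustrate (a made-up sketch of mine, not anything from Blender's source; the tile coordinates are invented): when one big tile is left and the other cores are idle, cut its remaining area into strips and queue those instead.

```python
# rough sketch of the idea (not Blender code): cut one leftover tile into
# horizontal strips so the idle cores can each take a piece.
def split_tile(tile, pieces):
    """Split an (x, y, w, h) tile into up to `pieces` horizontal strips."""
    x, y, w, h = tile
    strip = max(1, -(-h // pieces))          # ceil(h / pieces)
    return [(x, top, w, min(strip, y + h - top))
            for top in range(y, y + h, strip)]

# e.g. the one slow 160x120 tile left at the end, with 4 cores available:
print(split_tile((480, 360, 160, 120), 4))
# [(480, 360, 160, 30), (480, 390, 160, 30), (480, 420, 160, 30), (480, 450, 160, 30)]
```

the real work would be teaching the render loop to hand those strips to threads that have already finished, hence a hack for a programmer rather than a user setting.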

ok, so that's what i did. in 2.53 i could go up to 512x512 (even if the render window says 130 parts… how's that?). anyway, in 2.53 the problem with this render is not so big, since the image that took 22 minutes in 2.49b takes about 3:30 here :smiley: so the time lost to the single-core part of the render is really small.

since i'm here i'd like to ask another question, which is the reason all this started. a friend on a forum said he rendered this image in 19 minutes with an AMD PHENOM II X4 965 and 4gb of ram, while i, with an i5 750 and 4gb, render it in 22 minutes. is this difference normal? from the benchmarks i found online the cpus look more or less the same, with a slight advantage for intel.

i know the gpu doesn't influence rendering, but he has an ATI SAPPHIRE RADEON HD 5770 1GB GDDR5, while i have an ENGTX 250 1gb. could that explain the 3 minutes of difference?
i'm not a benchmark addict, i just want to understand whether my system has some problem! :slight_smile:

thanks!

I read somewhere that the more you subdivide into parts, the more latency you add somewhere in the process, until it becomes inefficient in terms of time saved. Can anyone confirm this?
I like the way the internal Cinema4D renderer approaches this issue: it scans the area horizontally, subdividing it by the number of cores used to render; whenever a core reaches the end of its portion, one of the remaining areas gets subdivided and the free core starts on the new job, recursively until the render is done.
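Out of curiosity I sketched that strategy as a toy Python script (my own rough illustration of the idea, not Cinema4D or Blender code; the scanline cost function and all the numbers are invented): each thread starts with one horizontal band, and whenever another thread is waiting for work, a busy thread donates half of its remaining scanlines back to a shared queue.

```python
import threading, queue, time

WIDTH, HEIGHT, CORES = 64, 64, 4

def render_scanline(x0, x1, y):
    # stand-in for real shading work: the bottom rows are much heavier
    time.sleep(0.02 if y > HEIGHT * 3 // 4 else 0.002)

tasks = queue.Queue()      # regions waiting for a core: (x0, x1, y0, y1)
remaining = HEIGHT         # scanlines not yet rendered
idle = 0                   # threads currently waiting for a region
lock = threading.Lock()

def worker():
    global remaining, idle
    while True:
        with lock:
            idle += 1
        region = tasks.get()
        with lock:
            idle -= 1
        if region is None:                        # sentinel: everything is rendered
            return
        x0, x1, y0, y1 = region
        y = y0
        while y < y1:
            # another core is starving and we still have rows to spare: donate half
            if idle > 0 and y1 - y > 2:
                mid = (y + y1) // 2
                tasks.put((x0, x1, mid, y1))
                y1 = mid
            render_scanline(x0, x1, y)
            y += 1
            with lock:
                remaining -= 1
                if remaining == 0:                # last scanline: wake everyone up
                    for _ in range(CORES):
                        tasks.put(None)

band = HEIGHT // CORES
for i in range(CORES):                            # start with one horizontal band per core
    tasks.put((0, WIDTH, i * band, HEIGHT if i == CORES - 1 else (i + 1) * band))

threads = [threading.Thread(target=worker) for _ in range(CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("render finished")
```

Since the heavy rows sit at the bottom of this fake image, the bottom band keeps getting re-split, so no core stays idle for long, which is more or less the behaviour described above.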

A lot of this difference is due to the render engine improvements of the 2.5x version of blender. To push this further, visit graphicall.org and download an “optimized” build that supports your system. For some other pretty sweet rendering goodness (of the GI sort), look there for a “render branch” compilation (and search here for more info on its BxDF beautiousness. :smiley: )

Same exact scene? Same exact version of blender? Same OS? Don't know why they should be that far off. It's definitely not the video card; any other differences in system hardware would have a negligible effect. Strange.

I found that there is significant render speed variability in the builds on graphicall. The recent BF 2.5.3 (2.5 Beta) build renders one of my scenes about 20 percent faster than some graphicall builds.

I don't believe that is true. In LightWave the multithreading setting is separate from sectional rendering. You can render one section multithreaded, AFAIK.

I’m pretty sure none of the BxDF beauty is in even the render branch yet… :confused:
But yeah it’s pretty awesome anyways. :smiley:

yes, the settings, version of blender and OS should be the same (at least from what he told me), but i can't understand this difference either. about the graphicall versions, i knew about those, thanks :slight_smile: but this was a comparison of raw system power. for sure i will use some optimized graphicall build for the “serious” stuff! :smiley:

about the rendering technique, i remember that terragen, for example, looks like it renders pixel by pixel… could that be?

anyway, a system to further subdivide the last part of an image, so the full cpu power gets used, could be really useful i think.
in the image i'm talking about it could save 30-40% of the total rendering time.

… OpenCL the renderer, and if you have an ati you'll see 1600 tiles :stuck_out_tongue_winking_eye: and if you have an nvidia, 480. swooooosh.

oooh the future!

Some of the engines I've used can also subdivide a tile when other threads are idle, to speed up the rendering. I'm not sure if I've seen that happen in Blender.

Often I just set the tiles to a smaller size, so the chance is lower that one big tile ends up working on a slow part of the image alone.