using multiple cores in blender rendering

hello!

i hope this is the right section for this!

i have a question about the usage of cpu in blender rendering.
i have an i5 750 cpu, so 4 cores, 4 threads.

when i render an image i can see all 4 cores fully used, but at the end, when it starts rendering the last part of the image, it uses only one of them. i noticed the problem with an image that has a really heavy-to-compute last part: rendering just that part takes 10 minutes, about half of the total render time, and for all those 10 minutes blender uses only 1 of the 4 cpu cores.

i have this with both 2.49b and 2.5x.
is it normal? i have the threads set to 4 (and it's the same if i leave the option on auto).

i had a similar issue with physics simulations, where cpu usage stays at around 40% the whole time.

does anyone know how to solve this? or is it just the way it is, and nothing can be done?

thanks for any help!

The render is split into smaller tiles (you can set the X and Y numbers in the render settings). Each tile is only rendered by one core (the total render is multithreaded but the individual tiles are not). If you have one section that takes a long time to render you’ll be waiting for it to finish. Try increasing the number of total tiles so it evens out the render. Ideally you want all your cores to be working 100% of the time. You should hit an optimum number that will give you the best render time. This will vary for each image.
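If it helps to see why, here is a toy Python simulation I put together (my own illustration, not Blender code; the image size, the "hot" region and the per-pixel costs are all invented): each free core grabs the next whole tile, and a tile is never shared between cores.

```python
import heapq

WIDTH = HEIGHT = 64          # toy image size, arbitrary units

def pixel_cost(x, y):
    # pretend the bottom-right quarter of the image is 20x harder to render
    return 20.0 if (x >= WIDTH * 3 // 4 and y >= HEIGHT * 3 // 4) else 1.0

def tile_costs(parts_x, parts_y):
    """Cost of each tile = sum of the costs of the pixels inside it."""
    tw, th = WIDTH // parts_x, HEIGHT // parts_y
    return [sum(pixel_cost(x, y)
                for y in range(ty * th, (ty + 1) * th)
                for x in range(tx * tw, (tx + 1) * tw))
            for ty in range(parts_y) for tx in range(parts_x)]

def wall_clock(costs, cores=4):
    """Each free core grabs the next tile; a tile is never shared between cores."""
    finish = [0.0] * cores                      # per-core finish times (min-heap)
    heapq.heapify(finish)
    for c in costs:
        heapq.heappush(finish, heapq.heappop(finish) + c)
    return max(finish)

for parts in (2, 4, 8, 16):
    print(f"{parts}x{parts} parts -> wall clock {wall_clock(tile_costs(parts, parts)):.0f}")
```

On this made-up scene it prints 5888 for both 2x2 and 4x4 (the expensive corner is still a single tile, and it happens to be scheduled last), 2368 for 8x8, and 2240 for 16x16, which is the ideal total cost divided by 4 cores. Splitting finer only pays off once the heavy region spans several tiles.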

Parts Rendering wiki entry: http://wiki.blender.org/index.php/Doc:Manual/Render/Options#Parts_Rendering

AFAIK, only the actual render engine (BI) is multi-thread capable. What you are seeing with the drop to one core is when blender’s compositor (or whatever) is compiling the final image output (putting layers, or whatever, back together.) Likewise with physics calculations, not multi-threaded (yet?) :slight_smile:

thank you for the answer… i already tried that: going from 4x4 to 8x8 saved a couple of minutes (out of 20), but when i tried to increase it further the situation didn't seem to get any better.

so a single part can never be computed by more than one thread… mmmm, that's not good… it looks like this one part is harder than the rest of the scene! the problem is worse in this particular render, of course, because of the difference in complexity between parts of the scene, but still, wouldn't it be good to be able to compute every pixel of the image with all the power available?

anyway, just out of curiosity, is there a limit to how far the image can be subdivided?

sorry mzungu i saw your reply after posting.

the problem is that i see the drop to single-core usage while the image is still rendering (one part is black and slowly appearing). so i think in this case it has nothing to do with the compositor or layers or things like that…

You are really asking for fine-grained parallelism instead of the coarse-grained parallelism in general use. This situation is not likely to change anytime soon in the industry.

though one could make a (relatively) “quick” hack of further dividing that last tile and distributing the pieces to the idle cores…

I think all renderers use only the tile technique for multicore rendering.

namekuseijin, i didn't quite understand what the “quick” hack is… do you mean just increasing the subdivision of the image, for instance from 4x4 to 8x8 or 16x16? or something else?

anyway, the maximum i could reach was 128x60, but then during the rendering it said there were 130 parts… so something was off. the last part is still really long… but if that's the way it works, what can you do :slight_smile:

an idea for the developers: subdivide that last part

how to do that?..

a “quick” hack for a programmer, that is. :stuck_out_tongue:

in the case of a user, more tiles is still the best option.
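roughly something like this, just to illustrate (a made-up sketch of mine, not anything from Blender's source; the tile coordinates are invented): when one big tile is left and the other cores are idle, cut its remaining area into strips and queue those instead.

```python
# rough sketch of the idea (not Blender code): cut one leftover tile into
# horizontal strips so the idle cores can each take a piece.
def split_tile(tile, pieces):
    """Split an (x, y, w, h) tile into up to `pieces` horizontal strips."""
    x, y, w, h = tile
    strip = max(1, -(-h // pieces))          # ceil(h / pieces)
    return [(x, top, w, min(strip, y + h - top))
            for top in range(y, y + h, strip)]

# e.g. the one slow 160x120 tile left at the end, with 4 cores available:
print(split_tile((480, 360, 160, 120), 4))
# [(480, 360, 160, 30), (480, 390, 160, 30), (480, 420, 160, 30), (480, 450, 160, 30)]
```

the real work would be teaching the render loop to hand those strips to threads that have already finished, hence a hack for a programmer rather than a user setting.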

ok, so that's what i did. in 2.53 i could go up to 512x512 (even if the render window says 130 parts… how's that?). anyway, in 2.53 the problem with this render is not so big, since the image that took 22 minutes in 2.49b takes about 3:30 here :smiley: so the time lost to the single-core part of the render is really small.

since i'm here i'd like to ask another question, which is the reason all this started. a friend on a forum said he rendered this image in 19 minutes with an AMD PHENOM II X4 965 and 4gb of ram, while i, with an i5 750 and 4gb, render it in 22 minutes. is this difference normal? from the benchmarks i found online the cpus look more or less the same, with a slight advantage for intel.

i know the gpu doesn't influence rendering, but he has an ATI SAPPHIRE RADEON HD 5770 1GB GDDR5, while i have an ENGTX 250 1gb. could that explain the 3 minutes of difference?
i'm not a benchmark addict, i just want to understand whether my system has some problem! :slight_smile:

thanks!

I read somewhere that the more you subdivide into parts, the more latency you add somewhere in the process, until it becomes inefficient in terms of time saved. Can anyone confirm this?
I like the way the internal Cinema4D renderer approaches this issue: it scans the area horizontally, subdividing it by the number of cores used to render; whenever a core reaches the end of its portion, one of the remaining areas gets subdivided and the free core starts on the new job, recursively until the render is done.
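Out of curiosity I sketched that strategy as a toy Python script (my own rough illustration of the idea, not Cinema4D or Blender code; the scanline cost function and all the numbers are invented): each thread starts with one horizontal band, and whenever another thread is waiting for work, a busy thread donates half of its remaining scanlines back to a shared queue.

```python
import threading, queue, time

WIDTH, HEIGHT, CORES = 64, 64, 4

def render_scanline(x0, x1, y):
    # stand-in for real shading work: the bottom rows are much heavier
    time.sleep(0.02 if y > HEIGHT * 3 // 4 else 0.002)

tasks = queue.Queue()      # regions waiting for a core: (x0, x1, y0, y1)
remaining = HEIGHT         # scanlines not yet rendered
idle = 0                   # threads currently waiting for a region
lock = threading.Lock()

def worker():
    global remaining, idle
    while True:
        with lock:
            idle += 1
        region = tasks.get()
        with lock:
            idle -= 1
        if region is None:                        # sentinel: everything is rendered
            return
        x0, x1, y0, y1 = region
        y = y0
        while y < y1:
            # another core is starving and we still have rows to spare: donate half
            if idle > 0 and y1 - y > 2:
                mid = (y + y1) // 2
                tasks.put((x0, x1, mid, y1))
                y1 = mid
            render_scanline(x0, x1, y)
            y += 1
            with lock:
                remaining -= 1
                if remaining == 0:                # last scanline: wake everyone up
                    for _ in range(CORES):
                        tasks.put(None)

band = HEIGHT // CORES
for i in range(CORES):                            # start with one horizontal band per core
    tasks.put((0, WIDTH, i * band, HEIGHT if i == CORES - 1 else (i + 1) * band))

threads = [threading.Thread(target=worker) for _ in range(CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("render finished")
```

Since the heavy rows sit at the bottom of this fake image, the bottom band keeps getting re-split, so no core stays idle for long, which is more or less the behaviour described above.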

A lot of this difference is due to the render engine improvements of the 2.5x version of blender. To push this further, visit graphicall.org and download an “optimized” build that supports your system. For some other pretty sweet rendering goodness (of the GI sort), look there for a “render branch” compilation (and search here for more info on its BxDF beautiousness. :smiley: )

Same exact scene? Same exact version of blender? Same OS? Don't know why they should be that far off. It's definitely not the video card; any other differences in system hardware would have a negligible effect. Strange.

I found that there is significant render speed variability in the builds on graphicall. The recent BF 2.5.3 (2.5 Beta) build renders one of my scenes about 20 percent faster than some graphicall builds.

I don't believe that is true. In LightWave the multithreading setting is separate from sectional rendering. You can render one section multithreaded, AFAIK.

I’m pretty sure none of the BxDF beauty is in even the render branch yet… :confused:
But yeah it’s pretty awesome anyways. :smiley:

yes, the settings, version of blender and OS should be the same (at least from what he told me), but i can't understand this difference either. about the graphicall versions, i knew about those, thanks :slight_smile: but this was a comparison of raw system power. for sure i will use some optimized graphicall build for the “serious” stuff! :smiley:

about the rendering technique, i remember that terragen, for example, looks like it renders pixel by pixel… could that be?

anyway, a system to further subdivide the last part of an image, so the full cpu power gets used, could be really useful i think.
in the image i'm talking about it could save 30-40% of the total rendering time.

… OpenCL the renderer, and if you have an ati you'll see 1600 tiles :stuck_out_tongue_winking_eye: and if you have an nvidia, 480. swooooosh.

oooh the future!

Some of the engines I've used can also subdivide a tile when other threads are idle, to speed up the rendering. I'm not sure if I've seen that happen in Blender.

Often I just set the tiles to a smaller size, so the chance is lower that one big tile ends up working on a slow part of the image alone.