Optimization: Is there a way to see which nodes are multithreaded?

I do a lot in geometry nodes, and I’m always looking for ways to optimize more. I recently started to mimic workflows very similar to those used by major game dev studios like Ubisoft, DICE or Rockstar to create open worlds. It’s not Houdini, but you can definitely achieve a lot in Blender too.

What has me rubbernecking at Houdini these days is multithreading support. To my understanding, you can leverage multicore CPUs much more in Houdini than in Blender for non-rendering tasks. I often find myself looking at Task Manager and seeing my 7950X running at 10% for a noticeable part of the total processing time, which is a bit of a letdown.

Now, I know this is a bit simplistic, but is there a way to see or understand which nodes tend to use only a small portion of the CPU and which ones use all of it? I quite often find myself choosing between various ways to process a mesh and weighing trade-offs, one of which is how fast it’s going to run.

If anyone can add to this discussion, I’ll appreciate it. I’d love to learn more!

None of them are multi-threaded

Hmm, I thought it was the opposite. I also think that GN tries its best to run computations in parallel.
I would be super grateful if someone were willing to shed some light on how GN works in that regard!

Here is what I know: optimization is a key point in GN development, and a lot of decisions and improvements are made to favor performance even when processing a lot of geometry.
The only thing that can slow things down is the repeat zone, which in fact can’t run its computation in parallel…
But it’s true that we don’t have control over how things are processed…

If optimization is really a concern, you should definitely run some tests when building your projects; it’s hard to come up with universal rules… Basically, you should see what is computation-heavy in your trees and replace those parts with faster algorithms… Some nodes are heavy by default, like raycast or booleans, but it also really comes down to how you’re using them…

Also, it might be worth splitting your scene into several objects so each object can be evaluated in parallel. I suspect that, say, 10 objects with their own modifiers would be evaluated faster than one giant node tree containing all 10 elements… But I never took the time to investigate that!

Lastly, I would also separate what is evaluated once from what is moving.
If you don’t have a Scene Time node or evaluate something animated, then the modifier will be evaluated only once. The execution time of the tree then isn’t that much of a concern, since it will be built when you open the blend file or change a parameter, but the scene will stay smooth otherwise.

Everything that is time-dependent should be put on its own object and highly optimized if viewport playback is a concern…

Keep in mind that we also have the bake node now that can help to avoid extra re-computations or to bake initial parts of the tree…

Now that EEVEE supports displacement, some stuff can be moved to the shader instead of geometry nodes: it’s very likely that foliage or grass can be animated in the shader rather than in geometry nodes, so you can keep a great FPS and evaluate that geometry only once…

Finally, keep in mind that multithreading isn’t always the solution. Sometimes, when the computations are very fast, it’s faster to run them all in one go rather than splitting them up and waiting for every thread to finish…
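
A rough way to see this for yourself outside Blender is the Python sketch below. It says nothing about how Blender schedules work internally; it only illustrates that handing many tiny tasks to a thread pool adds bookkeeping that a plain loop doesn’t have. The function names and the workload are made up for illustration, and since CPython threads share one interpreter lock, this shows the scheduling overhead only, not a fair parallel speed test.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def tiny_task(x):
    # Deliberately trivial work: the per-task scheduling cost dominates.
    return x * x + 1

def run_serial(values):
    # One thread, one plain loop.
    return [tiny_task(v) for v in values]

def run_threaded(values, workers=8):
    # Same work, but every item is dispatched through a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tiny_task, values))

def timed(fn, *args):
    # Returns the function result together with the elapsed wall time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

values = list(range(20_000))
serial_result, serial_s = timed(run_serial, values)
threaded_result, threaded_s = timed(run_threaded, values)
print(f"serial:   {serial_s:.4f} s")
print(f"threaded: {threaded_s:.4f} s")
```

Both versions return the same list; the timings will vary from machine to machine, so compare them on your own system rather than trusting any fixed number.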

To conclude, here are a few stats from my procedural castle scene…
The whole scene takes ~19s to build (looking at the evaluation time of the modifiers). Keep in mind that everything is generated from GN; there isn’t any geometry except the base meshes, which are basically 4 cubes.

You can see the polycount in the screen capture:

I also added a few closer views so it’s easier to see the level of detail and how much geometry is processed:

This is what my system monitor looks like during the generation:

At this stage, I’m not that annoyed by the time it takes to generate the mesh, which I find quite reasonable, especially since I didn’t try to make the scene super optimized, but rather aimed for a good balance so I can work. It’s actually the viewport that is getting pretty slow, probably because of the 500,000 instances lying around…

But that’s just me :slight_smile: To work around that issue, I have a preview mode with simpler geometry that I can use to lay out the scene, and I switch to final for renders…

:heart_eyes:
For the images alone…

This commit has more information on it. Geometry nodes are indeed multithreaded, but some operations have to be executed in consecutive order.
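
The usual back-of-the-envelope model for “some operations have to be executed in consecutive order” is Amdahl’s law: if a fraction p of the work can be spread over n threads and the rest stays serial, the best possible speedup is 1 / ((1 - p) + p / n). Here is a small sketch of it; the fractions are made up to show the trend and are not measurements from Blender.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)

# Hypothetical fractions, chosen only to show the trend on a 32-thread CPU.
for p in (0.5, 0.9, 0.99):
    print(f"{p:.0%} parallel -> at most {amdahl_speedup(p, 32):.1f}x faster")
```

Under this model, even a tree that is 90% parallel tops out below 8x on 32 threads, which is one reason a CPU monitor can look mostly idle while a node tree is still evaluating.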

Sorry, but that’s just plain false. Picking any set of release notes at random and reading them would have told you that.

To me, the biggest slowdowns in my scenes come from the dependency graph evaluations. You need to make sure that your node graphs and objects are not referencing each other frequently.

I think that some of the Blender nodes are actually faster than Houdini’s, although I have not used the recent versions of Houdini.

Yes, I often use the Timing option for various processes. It’s super helpful. Also, when working with complex node trees, it comes in handy to put many nodes together in a frame (Ctrl+J) to get the total processing time for all of them at once, rather than looking for that “weakest link” one by one.

Using my own preview node tree is also something I do very often, to represent the basic shapes and sometimes even simplified UV maps where needed.

I do use the repeat zone extensively, I have to say. It’s so helpful for the things I do! This is one of those cases where I gladly trade longer processing time for precision. For example, when creating 3D decals for instanced trees, rocks etc.: evaluating each instance against the nearest terrain mesh and assigning a material to that decal based on the nearest ground material (grass to grass, mud to mud etc.), all while taking into consideration the various artefacts that can appear and fixing them for a flawless result on all instances in the scene. I don’t think I would be able to do this without the repeat zone. Maybe I’ll get smarter in the future, who knows. But just because something works doesn’t mean I’ll stop looking for ways to make it even better/faster :slight_smile:

Some nodes are heavy by default, like raycast or booleans, but it also really comes down to how you’re using them…

Oh, yes. I use booleans often too, but I try to give them as little as possible to crunch when I do. Cut the unnecessary stuff out first.

Keep in mind that we also have the bake node now that can help to avoid extra re-computations or to bake initial parts of the tree…

If there’s something I love even more than the repeat zone, it’s the Bake node! This is an absolute game changer for me. I have node trees that can take up to 20 minutes to recalculate in total, so having that workload split into 5-7 smaller ones that can be baked sequentially is a sanity saver.

P.S. I love your castle :sunglasses:

Cool, thanks!

Indeed, optimization makes complex things even trickier… It’s already a great achievement to get something working, even if it takes long to compute, since the road to making it faster can be much more involved!

20 minutes of computation seems like a lot, but if that doesn’t prevent you from being creative and you can manage to do what you want in the end, then it’s probably fine.

I guess it’s probably better to look at very simple problems to start thinking in terms of parallel processes and get rid of repeat zones. I do use them too once in a while, but when you have a lot of iterations you might start to feel the slowdowns.
Many times it’s possible to avoid repeat zones, but it’s true that it makes things more difficult to picture mentally, as we are generally not used to thinking in parallel processes…
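
A toy illustration of that shift in thinking, in plain Python rather than nodes: the first function builds positions step by step the way a repeat zone would, each iteration depending on the previous one. The second computes every position directly from its index, which is the kind of problem that can be handled per element, independently. The names and the spacing value are invented for the example.

```python
def positions_iterative(count, spacing):
    # Repeat-zone style: each step needs the result of the previous one.
    result = []
    current = 0.0
    for _ in range(count):
        result.append(current)
        current += spacing
    return result

def positions_by_index(count, spacing):
    # Field style: element i only needs its own index, so every
    # element could be computed independently of the others.
    return [i * spacing for i in range(count)]

print(positions_iterative(5, 2.0))
print(positions_by_index(5, 2.0))
```

Not every loop collapses this neatly. Anything where step N genuinely needs the output of step N-1 stays sequential, but it’s worth checking whether an index-based expression or an Accumulate Field node could replace the iteration.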

Anyway, practice makes perfect; if you keep working you’ll eventually figure out better ways!

Have fun!

If an algorithm can be parallel, it’s parallel; if there is no way to do that, the node is single-threaded.