GPU-accelerated Geometry Nodes and physics?

Hi!

I don’t have much insight into any of this; I’ve been toying around with Blender for a few years now and my programming knowledge is very limited.

I recently upgraded my PC specifically so that modelling in Blender would move faster. And while the performance gains are very significant for my workflow, I was wondering if it’s possible for Blender in the future to get its geometry nodes at least partially accelerated by the GPU instead of CPU only. I think the potential for some kind of implementation is there. I’m just wondering what the biggest obstacle to doing so is: are some prerequisites needed first, is it purely a matter of time/funding, or are there simply different priorities instead?

In the same vein, I would like to spark a discussion on possible GPU-accelerated physics. Apart from rendering, it seems Blender can already utilize CUDA and similar GPU-oriented languages to some extent, although for now only via third-party add-ons. Terrain Nodes, for example, uses CUDA and also works great on my AMD Radeon 7900 XTX thanks to ZLUDA. And Michal here is using CUDA for his GPU-accelerated Fire and Smoke Simulator.

I’d like to learn more about Blender development in this area. If anyone has some general knowledge and wants to share it with me/us, please go ahead!

It’s been mentioned in passing once or twice, but I have never seen any concrete plans for a geonodes GPU backend. It’s safe to assume it is not coming anytime soon. For more info I guess you could pop on blender.chat and ask the developers whether or not that’s on the horizon.

As an interesting piece of anecdata: the developers of Manifold, a high-performance boolean library, originally had CUDA as an implementation strategy, but they don’t anymore, as they weren’t getting any speedup compared to using TBB on modern CPUs with many cores. There’s a lot of complexity, and data-transfer time, involved in getting these kinds of things working on a GPU. Not to say it might not be worth it sometimes, but it’s less of a slam dunk than you might think.

9 Likes

Interesting. I by no means want to complain; I’m excited by what I already get. This thought just came to me while I was watching my 16-core 7950X go to 100%, then drop to 10%, just so it could go to 100% again a few seconds later. So I thought: “hey, maybe there’s room for the GPU to handle something on the side, if not take over the majority of tasks it potentially could”.

Having some kind of hybrid system where the GPU could carry the CPU where it lacks, and vice versa, sounds amazing to me. Especially in consumer-grade PCs like mine, or even slower ones (your average gaming rig for occasional Blending).

Maybe some day. Still, Blender rules! :raised_hands:

Mixing CPU and GPU implementations like that doesn’t work well, because uploading data to the GPU and fetching it back after the computation takes a long time.
So long, in fact, that just running on the CPU can be faster.
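
To make the cost concrete, here is a minimal CUDA sketch (my own toy illustration, not Blender code) that times the host-to-device upload, a trivial kernel, and the download separately. For simple per-element work like this, the two copies usually dwarf the kernel time:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Trivial per-element kernel: far too little work to amortize the copies.
    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 24;  // ~16M floats, ~64 MB
        float *host = new float[n];
        for (int i = 0; i < n; ++i) host[i] = 1.0f;
        float *dev;
        cudaMalloc(&dev, n * sizeof(float));

        cudaEvent_t t0, t1, t2, t3;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventCreate(&t2); cudaEventCreate(&t3);

        cudaEventRecord(t0);
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaEventRecord(t1);
        scale<<<(n + 255) / 256, 256>>>(dev, n);
        cudaEventRecord(t2);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaEventRecord(t3);
        cudaEventSynchronize(t3);

        float up, kern, down;
        cudaEventElapsedTime(&up, t0, t1);
        cudaEventElapsedTime(&kern, t1, t2);
        cudaEventElapsedTime(&down, t2, t3);
        printf("upload %.2f ms, kernel %.2f ms, download %.2f ms\n", up, kern, down);

        cudaFree(dev);
        delete[] host;
        return 0;
    }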

The other issue is that GPUs are specifically designed for tasks that can be implemented in an extremely parallel way, like matrix multiplication. Any computation that is more general, needing branching, random memory access, etc., is usually just as fast or even faster to compute on the CPU.
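
As a toy illustration (a sketch of the general pattern, not anything from Blender’s code), here are two CUDA kernels showing exactly what GPUs dislike: a data-dependent branch that diverges a warp, and a gather through a random index array that defeats memory coalescing. A CPU, with branch prediction and large caches, handles both patterns comparatively well:

    // Threads in a warp execute in lockstep; when neighbouring threads take
    // different branches, the hardware runs both paths serially (divergence).
    __global__ void divergent(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (in[i] > 0.0f)
            out[i] = sqrtf(in[i]);  // data-dependent: neighbours may disagree
        else
            out[i] = 0.0f;
    }

    // Scattered reads cannot be coalesced into wide memory transactions,
    // so a random gather wastes most of the GPU's bandwidth.
    __global__ void gather(const float *in, const int *index, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[index[i]];  // uncoalesced if index[] is random
    }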

So to summarize: utilizing the GPU is beneficial if 1) you can ensure that the data stays on the GPU for the entire computation, and 2) the task can be implemented in a GPU-friendly manner.

Looking at how many geometry nodes there are and how complicated fields can get, I don’t think it’s worth looking into. Maybe if there were a good way to avoid writing both a CPU and a GPU implementation for everything…

6 Likes

There was an attempt (?) to use the GPU more (also for GN)

1 Like

This includes Cycles, EEVEE, Workbench, Hydra and the Compositor.

The performance tests for geometry nodes run on the CPU.

1 Like

The GPU is incredibly fast at certain tasks like rendering and game graphics, but the reason is that the GPU, as a piece of hardware, is more specialized. Core types that are specifically geared toward something will always be fast at some tasks at the cost of being unsuitable for others.

The CPU, meanwhile, is a general-purpose computing component; it can do anything you set your mind to, but the tradeoff is that it is not ideal for certain tasks (for instance, you do not see it as the driver for real-time rendering).

Now, you mention owning a 16-core/32-thread processor. That is great for Cycles rendering, with the amount of RAM it can access, but that kind of processor is still a bit of a niche, since the vast majority of tasks cannot use that many cores effectively (even in cases where it looks like it is at full load, because the cores interfere with each other and slow everything down). If you depend enough on rendering performance to justify the extra cost, then that is fine, but for others, sticking with 8 cores and 16 threads may be ideal, especially since performance improvements these days have become heavily dependent on hardware designs that can handle increased wattage and heat output.

To note, a good example of fewer cores being better is Geometry Nodes: there was recently an addition to Blender’s own BLI multithreading library that allows capping how many cores can work on a certain task (which, believe it or not, means faster evaluation of a tree than simply throwing all the cores at it).

1 Like

For general Geometry nodes I suspect it’s not really an option, for the reasons stated.

Physics, on the other hand, and by extension simulation, may be a different story.

As you already pointed out, particle physics systems for things like fire/smoke and liquids can absolutely be GPU-accelerated (to the point of real time), not just in the case of Blender, but in tools like EmberGen, etc.

I’m less sure about physics simulation for things like cloth or hair. I mean, fully realistic, whole-character, real-time cloth and hair simulation would be amazing, but since at this stage I don’t think Blender even has a plan/design for an updated cloth/hair system, and no one else seems to want to make an add-on for one (GPU-accelerated or otherwise), I have doubts we will ever find out.

I think this makes the most sense in the context of baking, say, a particle simulation, where you’d only need a single round trip to and from the GPU?
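
Something like this toy CUDA sketch, where the kernel and the physics are made up purely for illustration: the particle state stays resident on the GPU for every substep, and only the baked result is copied back once at the end:

    #include <vector>
    #include <cuda_runtime.h>

    // Toy integration step; stands in for a real solver substep.
    __global__ void step_kernel(float3 *pos, float3 *vel, int n, float dt) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        vel[i].z -= 9.81f * dt;  // toy gravity
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
        pos[i].z += vel[i].z * dt;
    }

    void bake(std::vector<float3> &frame_out, int n, int steps, float dt) {
        float3 *pos, *vel;
        cudaMalloc(&pos, n * sizeof(float3));
        cudaMalloc(&vel, n * sizeof(float3));
        cudaMemset(pos, 0, n * sizeof(float3));
        cudaMemset(vel, 0, n * sizeof(float3));

        // No CPU round trip inside the loop: the state stays on the GPU.
        for (int s = 0; s < steps; ++s)
            step_kernel<<<(n + 255) / 256, 256>>>(pos, vel, n, dt);

        // Single device->host copy at the end of the bake.
        frame_out.resize(n);
        cudaMemcpy(frame_out.data(), pos, n * sizeof(float3),
                   cudaMemcpyDeviceToHost);
        cudaFree(pos);
        cudaFree(vel);
    }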

I think it’s worth considering Houdini and how they faced the same issue. This post is from 2012, but not much has changed since then. Specifically:

Fastest GPU speeds are achieved by avoiding any copying to and from the CPU memory. Well, you can’t have zero copying, or you have nothing useful, but you want to minimize the transfer. This is why the attached file:

  1. Turns off DOPs caching. Caching requires copying from GPU to CPU all the fields every frame. Very useful if you want to scrub and inspect random fields, not so useful if you want maximum speed.
  2. Only imports density to SOPs. Only one field needs to be pulled from the GPU to CPU each frame.
  3. Saves to disk in background. This gives you the best throughput. Displaying in the viewport requires a GPU → CPU → GPU round trip. (Yes, ugly, but likely required in general to support simming on a card other than your display card)
  4. Uses a plain smokesolver. All the code paths being used after the first frame sourcing are GPU enabled. If you add a microsolver that isn’t GPU enabled, there is no error. Instead Houdini just silently does the GPU → CPU transfer for you.

And:

While we regret that the “Use OpenCL” toggle isn’t a turnkey “Make things blazingly fast” toggle, we do stand by the significant improvements you can realize if you optimize your scene around the GPU.

More solvers have the option of using the GPU now, but the points raised there still stand.

Fundamentally:

  • transferring between CPU and GPU is the bottleneck
  • you can rely exclusively on the GPU (EmberGen), but you have to build your whole architecture around it, and you might run into hardware limitations with complex simulations, given how memory-limited GPUs are. If those limitations are not a problem for you, it can be potentially fantastically fast (EmberGen is magic); see the sketch after this list.
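
For instance, a GPU-resident solver has to decide up front whether its grids fit in VRAM at all. A minimal sketch, using a hypothetical helper (not from Blender or EmberGen):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Check free VRAM before committing to a fully GPU-resident simulation;
    // fall back to the CPU (or a tiled scheme) if the grids won't fit.
    bool fits_on_gpu(size_t bytes_needed) {
        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        printf("VRAM: %zu MB free of %zu MB\n",
               free_bytes >> 20, total_bytes >> 20);
        return bytes_needed < free_bytes;  // leave headroom in real code
    }
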
1 Like

I’ve been using Ansys Discovery in engineering for a while, and it uses the GPU for fluid simulation, so it is definitely possible, but I’m sure it’s not easy, as they’ve been working on it for years.

It is, however, absolutely astounding and one of my very favourite things to use. It sure would be nice to see fluid and particle sims GPU-accelerated in Blender; that’s one of the best applications. I’m sure it takes a massive effort, but hopefully one day…

4 Likes