How well does CPU+GPU scale?

So I have access to some pretty outrageous instances where I work. I think the largest I could theoretically configure is 94 CPU cores with 8 A40 GPUs… which probably won’t ever happen because it costs a lot, but it got me thinking about scale and price-performance.

I know that CPU scales pretty linearly, but this CPU+GPU world is pretty new to me. From your experience, how well would you say CPU+GPU scales? I know that just doubling up resources probably won’t halve render times, but are these big instances worth looking into, or is there a point of diminishing returns?

As much as I’d love to play with a 94c/8 GPU machine, I can only justify even asking to turn one on if it makes sense.

From what I’ve heard, CPU+GPU has essentially no improvement over CPU-only rendering. The fastest option, at least for Blender, is GPU only.

Interesting. Seems like worst of both worlds then.

I’m not 100% sure, though. I ran a few tests just now and noticed that my render felt slow. I checked my prefs and found that, for whatever reason, my compute device had been left unselected; once I enabled it, I saw the speed I expected. For reference, I’m running 24c/A4000. IIRC my machine is configured for Xeon v4; not sure how fast.

I will try GPU-only this weekend.

CPU+GPU is slower than GPU alone but faster than CPU alone.
An A40 should be similar to an RTX 3090.

So in Cycles, select OptiX as the render device for the GPUs. You don’t need to engage the CPUs
unless you want to benchmark them and add the result to the Blender database :smile:
It would be awesome to see what kind of performance the 94 cores could achieve, but it will be way worse than the 8 GPUs.

Download the benchmark here; with your power it will probably finish in less than a minute.

Note that the database shows that 2x Epyc 64-core CPUs (so 128 cores) only achieve a score of 2562, which is less than my laptop RTX 3060’s score of 2615. The new RTX 4090 surpasses a score of 12000…

Edit: I see there is only a small number of A40 samples in the database results. They have a rather low score of 4500 compared to the RTX 3090; I wonder if that number is due to unoptimised drivers or some other issue.

I do not think our cloud provider has any Lovelace GPUs yet, but there are a few other options I can look into as well… But thanks for reminding me about the benchmarks! This will be very helpful.

The thing about CPU is that it’s really cheap. I can spin up a lot of cores for the price of a high-performance GPU, so there’s a price-performance consideration. If a single CPU instance costs 1/5th as much as a single GPU instance, but the GPU instance is only two times faster, it might be better to use five CPU instances instead. This whole cloud thing totally makes me rethink compute performance. The same would go for less beefy GPUs, especially considering a lot of that cost is going into VRAM.
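To make that comparison concrete, here is a rough sketch of the cost-per-frame arithmetic. The prices and render times are placeholders that mirror the "1/5th the price, half the speed" example, not real cloud quotes or measurements, and `cost_per_frame` is just an illustrative helper name.

```python
def cost_per_frame(price_per_hour, seconds_per_frame):
    """Cost of one frame = hourly instance price x hours spent rendering it."""
    return price_per_hour * seconds_per_frame / 3600.0

# Placeholder numbers only (assumptions, not real prices or benchmark results):
# the GPU instance costs 5x as much per hour but renders a frame twice as fast.
gpu = cost_per_frame(price_per_hour=5.0, seconds_per_frame=60.0)
cpu = cost_per_frame(price_per_hour=1.0, seconds_per_frame=120.0)

print(f"GPU: {gpu:.4f} per frame, CPU: {cpu:.4f} per frame")
print(f"CPU cost relative to GPU: {cpu / gpu:.2f}x")
```

With these made-up numbers the CPU instance comes out cheaper per frame despite being slower. The real answer depends entirely on measured render times and actual instance prices.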

Well, it’s far from an extensive test (I don’t have access to the sort of hardware you do, tho I wish I did), but only a couple days ago I posted some benchmarks on GPU rendering, comparing my old and new GPU.

I only did a CPU comparison for the BMW scene, because, well, it takes a fair bit of time. So not only does the CPU instance need to cost a hell of a lot less, it needs to be pretty power efficient as well.

Really, the only time CPU rendering starts to make sense is if what needs to be rendered consumes a LOT of RAM and the GPUs with enough VRAM to render it are just beyond silly in price.

As for the hybrid question: unless something has changed, using both CPU+GPU to render is generally a bad idea. The CPU slows the GPU down so much that it is faster to just let the GPU do it all.

The benchmark says that one 64-core Epyc has 1/10 the performance of an RTX 4090. From what I have checked, that CPU is more expensive* than a 4090. Plus, it will take 10 times longer to render, which means it would have to draw 1/10 of the power just to reach the same energy efficiency.
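The energy argument is just energy = power x time, which can be sketched as below. The wattage and render time are assumed placeholder values, not measurements; only the 10x slowdown comes from the benchmark comparison above.

```python
def energy_per_frame_wh(avg_watts, seconds_per_frame):
    """Energy used for one frame in watt-hours: average power x render time."""
    return avg_watts * seconds_per_frame / 3600.0

# Assumed placeholder values, for illustration only:
gpu_watts = 450.0     # assumed average GPU power draw while rendering
gpu_seconds = 60.0    # assumed GPU render time for one frame
slowdown = 10.0       # CPU takes about 10x longer, per the benchmark scores

cpu_seconds = gpu_seconds * slowdown
# To use the same energy per frame, the CPU may draw at most this much power:
break_even_cpu_watts = gpu_watts / slowdown

print(f"GPU energy per frame: {energy_per_frame_wh(gpu_watts, gpu_seconds):.2f} Wh")
print(f"CPU must average <= {break_even_cpu_watts:.0f} W to match it")
```

Any CPU that draws more than the break-even wattage while rendering uses more energy per frame than the GPU, before time is even considered.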

And I am not even putting a value on time.

Bear in mind we’re using a cloud vendor. The CPU instances are much, much cheaper, though it remains unclear what the price-performance actually looks like.

You’re going to have to be the pioneer with this one. We don’t have the same access as you do, so the best way for all of us to find out is for you to do it.

Start small, track your data and plug it into a spreadsheet. Also, know your workload. If you are rendering 10000-frame animations at 100 spp, that’s a much different workload than a 100-frame animation at 10000 spp.

Design a test that fits the work you are planning on doing and test it at a variety of scales.
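If a spreadsheet feels like overkill at first, the same bookkeeping can be sketched in a few lines. The timings below are hypothetical, purely to show the calculation, and `scaling_table` is an illustrative helper name: speedup is measured against the smallest configuration, and efficiency is the speedup divided by how many times more devices were used.

```python
def scaling_table(times_by_devices):
    """times_by_devices: {device_count: measured seconds for the same job}.

    Returns rows of (device_count, seconds, speedup, efficiency), where
    speedup is relative to the smallest configuration and efficiency is
    speedup divided by the increase in device count (1.0 = perfect scaling).
    """
    base_n = min(times_by_devices)
    base_t = times_by_devices[base_n]
    rows = []
    for n in sorted(times_by_devices):
        speedup = base_t / times_by_devices[n]
        efficiency = speedup / (n / base_n)
        rows.append((n, times_by_devices[n], speedup, efficiency))
    return rows

# Hypothetical timings (seconds) for one test job, NOT real measurements:
measured = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}
for n, t, s, e in scaling_table(measured):
    print(f"{n} GPU(s): {t:7.1f} s  speedup {s:5.2f}x  efficiency {e:6.1%}")
```

Once the efficiency column starts dropping sharply, you have found the point of diminishing returns for that particular workload.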


The GPU in this case is being used not as a graphics display device but as an “array coprocessor.” The on-chip hardware is “physically parallel,” in the sense that it is capable of performing many mathematical operations literally simultaneously, whereas a CPU, with certain exceptions, processes one value at a time, albeit in a “pipeline.” Therefore, the relative performance of these two approaches depends very much on exactly what hardware you are speaking of, which makes a “categorical answer” meaningless.