A question about the Blender benchmark data

I noticed the scores are getting smaller and smaller with every new Blender version, even though new rendering improvements land in almost every release. Looking at the 4090 benchmark results, it seems like either the benchmark methodology has changed or something is wrong somewhere (all results used OptiX as the backend):

  • 3.2: 9859.17 (from 41 benchmarks)
  • 3.3: 12168.3 (from 707 benchmarks)
  • 3.4: 12637.5 (from 1349 benchmarks)
  • 3.5: 13117.71 (from 1000 benchmarks)
  • 3.6: 13065.08 (from 1440 benchmarks)
  • 4.0: 11296.15 (from 1116 benchmarks)
  • 4.1: 11293.17 (from 468 benchmarks)
  • 4.2: 10884.93 (from 610 benchmarks)
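For scale, here is a quick sketch (just re-printing the numbers above) of each score relative to the 3.5 peak; by this measure, 4.2 sits roughly 17% below it:

    # Version -> 4090 score (OptiX), as quoted above.
    scores = {
        "3.2": 9859.17, "3.3": 12168.3, "3.4": 12637.5, "3.5": 13117.71,
        "3.6": 13065.08, "4.0": 11296.15, "4.1": 11293.17, "4.2": 10884.93,
    }

    peak = max(scores.values())  # the 3.5 value
    for version, score in scores.items():
        print(f"{version}: {score:9.2f}  ({(score / peak - 1) * 100:+5.1f}% vs 3.5 peak)")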

Anybody have an explanation?

The Blender benchmark score is a measure of how quickly Cycles can render path-tracing samples on one CPU or GPU device. The higher the number, the better. Specifically, it is the estimated number of samples per minute, summed over all benchmark scenes.
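
As a minimal sketch of that definition (the per-scene throughput numbers here are made up; the scene names are the current Blender Open Data benchmark scenes):

    # Hypothetical samples-per-minute throughput for one device on the
    # three Blender Open Data benchmark scenes.
    per_scene_spm = {
        "monster":   5200.0,
        "junkshop":  2600.0,
        "classroom": 3500.0,
    }

    # The published score is simply the per-scene estimates summed.
    score = sum(per_scene_spm.values())
    print(f"score: {score:.2f}")  # -> score: 11300.00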

But recent rendering improvements are not always about path-tracing sampling speed.
Generally, when one release's improvements focus on a particular OS, a vendor's hardware acceleration, or a denoiser, the following releases focus on other areas.
In 3.4, path guiding was introduced.
In 3.5/3.6, light sampling and adaptive sampling were improved.

Since 4.0, the focus has been on quality improvements: a revamped Principled BSDF, with Multiscatter GGX as the default and thin-film support, plus more accurate hair shaders.
In 4.1, the main performance improvement was GPU acceleration for the OpenImageDenoise denoiser.
In 4.2, gains from blue-noise sampling are limited to low-sample-count renders.
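
A toy comparison (all numbers invented) of why the score can drop while real renders barely slow down: the score counts only sampling throughput, while a finished render also benefits from faster denoising:

    # Invented numbers: a "3.6-like" device profile vs a "4.x-like" one,
    # where sampling got slower (costlier defaults) but denoising got faster.
    profiles = {
        "3.6-like": {"sampling_spm": 13000.0, "denoise_s": 4.0},
        "4.x-like": {"sampling_spm": 11000.0, "denoise_s": 1.5},
    }

    SAMPLES = 4096  # samples for one hypothetical final render

    for name, p in profiles.items():
        sampling_s = SAMPLES / p["sampling_spm"] * 60.0
        total_s = sampling_s + p["denoise_s"]
        # The benchmark "score" only reflects the sampling throughput.
        print(f"{name}: score={p['sampling_spm']:.0f}  render={total_s:.1f}s")

With these made-up numbers, the score drops about 15% while the wall-clock render time changes by only about 4%.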

So those numbers look consistent with the sampling performance gains made at the end of the 3.x series, followed in the 4.x series by a pursuit of costlier, higher-quality defaults, compensated by performance improvements in denoising.
4.2 is not far from 4.1 and 4.0, and it is still above 3.2. Maybe 4.5 will be above 3.5.
Maybe the benchmark will be rethought for Blender 5.