PSA: stop using Blenchmark (and adjust your tile sizes)

BeerBaron · July 28, 2017, 3:53am

I keep having to write this in various threads, so to raise public awareness, I’m going to create an extra thread for this:

All GPU results on blenchmark.com are useless

In order to achieve optimal performance, tile sizes for the GPU must be adjusted to 256x256 or more, otherwise all but the smallest GPUs will be underutilized. This is a known issue in Blenchmark which hasn’t been fixed for many months (and a known issue in Blender that has persisted for years).

Another issue is the use of spatial splits, which slightly improves render performance but increases startup time by several seconds. This part of the benchmark is entirely limited by single-threaded CPU performance.

The BMW benchmark in general does not reflect the performance characteristics of various kinds of scenes on GPUs and CPUs. The Blender Institute (BI) provides a suite of benchmark files and their own test results. For user benchmarks, refer to the comments section.

YAFU · July 28, 2017, 4:59am

Thank you. I will point to this thread instead of having to explain it every time when a user talks about Blenchmark.

I hope the guy in charge of Blenchmark will develop a new version correcting these errors.

Grzesiek · July 28, 2017, 6:10am

Agreed on the tile size, but results by Blender version are there, split by OS.

As for the BMW benchmark itself: at the time of its original release it was at least something. Now I agree that it does not reflect all aspects of Blender, but that is not possible. At least it gives a reference for comparing one component (GPU/CPU) to another. That is still a valid point.

Still, I agree that maybe they should simply use Blender's set of scenes (or at least one of the scenes) to show current possibilities.

anon12133251 · July 28, 2017, 7:15am

I still don’t understand why anyone should have to adjust any tile sizes at all, GPU/CPU.

In all the tutorials I watch, people are always switching tile sizes. Why doesn’t everyone just enable Auto Tile Size and let the built-in add-on do all that for you?

Does it work correctly?

BeerBaron · July 28, 2017, 8:05am

Fair point, it’s in the detailed view. I didn’t pay attention to that. Edited.

Auto Tile Sizes makes sure the frame is split evenly, but it can still result in tile sizes that are too small, depending on settings.

The point is to pay attention to tile sizes. I keep mentioning 256x256 but it’s not an optimal size either, it’s just about large enough and it’s a nice power of two. In fact, it’s probably too small for the GPUs at the highest end. What’s really optimal depends on the scene.

anon12133251 · July 28, 2017, 9:02am

Interesting, I tried rendering 5 completely different scenes with auto tile size enabled and disabled.

Every single time, turning off Auto Tile Size and changing the GPU tile size to 256x256 ended up being about 10 seconds faster than with Auto Tile Size.

Thanks for pointing this out, I just always assumed auto tile size was choosing what is best for my scene.

Indy_logic · July 28, 2017, 10:04am

Sorry, ignore me.

Thesonofhendrix · July 28, 2017, 10:42am

OK, so looking at Blenchmark, am I wrong in concluding that a 1080 Ti is only very slightly faster than what a 16-core Threadripper will be?

BeerBaron · July 28, 2017, 11:05am

In general, you are wrong to conclude anything from a false premise. The blenchmark results are wrong, they don’t represent the capabilities of the GPU.

Grzesiek · July 31, 2017, 7:06am

Agreed with BeerBaron.

I’d equate Blenchmark results to Blender as 3DMark results to actual games. There isn’t any direct correlation with the actual performance you’d get, but it does serve to give you a basic “overview” of performance at times.

Still, wouldn’t SPECviewperf, which now also includes Blender as part of its set and whose results cannot be altered, give a bit more value? Or am I wrong on that? But I also think it is just CPU?

With regard to tile sizes, the issue for me is how to best use the 2-3 GPUs I have: I have to set the tile size to 256x256 so that each GPU can render “close” to max. I know 512x512 would be better for a single GPU. Still, isn’t the upcoming 2.79 going to resolve the tile size issue?

shawn.kearney · July 31, 2017, 7:43am

OK. But if a small tile size hurts large cards disproportionately compared to more modest ones, would a larger tile size disproportionately favor larger GPUs?

Saying that we should change the tile size to better suit larger cards is like saying we should benchmark all CPUs as if we’re rendering on Threadripper, Epyc or x200

Seems to me that for the purpose of comparison between cards a middle-of-the-road standard would be best suited.

shawn.kearney · July 31, 2017, 7:50am

Or would a better solution be to render a standard scene using a variety of different tile sizes and average the results?

The median and/or mean of per-pixel render time would probably be the best solution of all.

BeerBaron · July 31, 2017, 9:48am

I’m not sure I understand what you’re saying here. I’ve tried Blender in SPECViewPerf, it’s testing the OpenGL viewport performance.

I don’t think it would. I suppose it might cause display driver timeouts, on really small GPUs.

“Saying that we should change the tile size to better suit larger cards is like saying we should benchmark all CPUs as if we’re rendering on Threadripper, Epyc or x200”

One size does not fit all. On CPUs, you want small tiles, for cache coherency. The number of CPU cores isn’t relevant to tile size.

“Seems to me that for the purpose of comparison between cards a middle-of-the-road standard would be best suited.”

If tile sizes are too small, the larger GPUs are simply starved for work. A “middle of the road” that is too small misrepresents the capabilities of those GPUs. Usually there is no reason for the user not to use larger tile sizes, so a representative benchmark should take that into account.
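
To put rough numbers on "starved for work", here is a back-of-the-envelope sketch. It assumes a simplified model in which the GPU works on one tile at a time with one thread per pixel; the function name is made up for illustration, and the 3584 figure is the CUDA core count of a GTX 1080 Ti. Real GPUs want many threads per core to hide memory latency, so ratios near or below 1 mean a lot of idle hardware.

```python
# Simplified model (an assumption, not how any renderer is documented to
# schedule work): one GPU thread per pixel of the tile being rendered.

CORES_1080_TI = 3584  # CUDA cores on a GTX 1080 Ti


def threads_per_core(tile_w, tile_h, cores):
    """Pixels in one tile divided by the number of GPU cores."""
    return (tile_w * tile_h) / cores


for size in (32, 64, 128, 256, 512):
    ratio = threads_per_core(size, size, CORES_1080_TI)
    print(f"{size}x{size}: {ratio:.2f} pixels per core")
```

Under this model a 32x32 tile gives less than one pixel per core, while 256x256 gives about 18.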

“Or would a better solution be to render a standard scene using a variety of different tile sizes and average the results?

“The median and/or mean of per-pixel render time would probably be the best solution of all.”

No, the best result should be picked, because it represents the achievable performance of the hardware.

Grzesiek · August 18, 2017, 5:38am

Thanks for the clarification on SPECViewPerf. I thought it was rendering, not just OpenGL.

As for the overall issue of using this Blenchmark, what would be the best approach to test?

I still don’t expect many blender users to know the tile issue.

In a way I wish that were just automatic: when rendering is set to CPU, tiles are automatically set to 16x16, and when GPU is selected, they are set to 256x256. I know that for GPUs especially, other settings will give you even more performance (like setting one tile to match the render resolution). But at least the above would give the majority of the performance to most users.
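
A minimal sketch of that "automatic default" idea, for anyone who wants to script it. The helper name and the lookup table are hypothetical; the attribute names (`scene.cycles.device`, `scene.render.tile_x`, `scene.render.tile_y`) are the ones Blender 2.7x exposes in its Python API. A stand-in object is used so the snippet runs outside Blender.

```python
from types import SimpleNamespace

# Starting points suggested in this post, not tuned optima.
DEFAULT_TILE = {"CPU": 16, "GPU": 256}


def apply_default_tiles(scene):
    """Set square tiles on a scene based on its Cycles render device."""
    size = DEFAULT_TILE[scene.cycles.device]
    scene.render.tile_x = size
    scene.render.tile_y = size
    return size


# Stand-in object so the sketch runs outside Blender; inside Blender 2.7x
# you would pass bpy.context.scene instead.
fake_scene = SimpleNamespace(
    cycles=SimpleNamespace(device="GPU"),
    render=SimpleNamespace(tile_x=64, tile_y=64),
)
apply_default_tiles(fake_scene)
print(fake_scene.render.tile_x, fake_scene.render.tile_y)  # 256 256
```

This only picks a sane starting point; as noted in the thread, the best size still depends on the scene and the card.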

But also on the tile part, didn’t the recent OpenCL patch change the approach so that tiles no longer make such a difference when rendering on the GPU?

Too bad that the official benchmark files site does not have the same level of usability as Blenchmark.

moony · August 18, 2017, 6:59am

Auto tile size addon doesn’t give me optimal tile size for my GPU.

The default GPU auto tile size is set to 256 - which on the BMW benchmark scene gives me tiles of 240x180 (12 tiles), but my 980Ti renders much faster with a tile size set to 512, which gives a tile size of 480x270 (4 tiles).

Render times are 1:06 and 0:57 respectively - so the 512 setting is around 15% faster.
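
The tile figures above can be reproduced with a simple "split evenly" rule. This is a sketch of one plausible rule, not the add-on's actual code, and it assumes the scene renders at 960x540, which is what the 240x180 and 480x270 tile sizes imply. The function name is made up for illustration.

```python
import math


def even_tiles(width, height, target):
    """Split the frame into equal tiles no larger than `target` per axis."""
    nx = math.ceil(width / target)   # tiles across
    ny = math.ceil(height / target)  # tiles down
    return math.ceil(width / nx), math.ceil(height / ny), nx * ny


# Assumed render resolution: 960x540 (inferred from the tile sizes quoted).
print(even_tiles(960, 540, 256))  # (240, 180, 12)
print(even_tiles(960, 540, 512))  # (480, 270, 4)

# Speedup from the quoted render times: 1:06 versus 0:57.
slow, fast = 66, 57
print(f"{(slow / fast - 1) * 100:.1f}% faster")     # 15.8% faster
print(f"{(1 - fast / slow) * 100:.1f}% less time")  # 13.6% less time
```

So a target of 256 ends up rendering 240x180 tiles, which is smaller than the 256x256 usually recommended for GPUs.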

3DLuver · August 18, 2017, 8:08am

Yeah, more useful for people would be a list of optimized tile sizes for each type of GPU and memory size. The devs did state a while back, after the OpenCL split kernel work, that tile size should no longer affect performance much and that 128x128 was generally optimal.

Yet my AMD FirePro W9100 (16 GB GDDR5 ECC) renders 4-5 times faster at full render resolution (i.e. 1 tile) than at 128x128 or 256x256.

The best way is to set samples low, like 128, do a few render tests at different tile sizes for a selected portion of the camera view, and use those results to find your sweet spot.
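
That sweep is easy to automate. Below is a minimal sketch: `render` is a hypothetical callable that renders the test region at a given tile size (for example a wrapper around your own Blender script), and the demo uses a simulated renderer and clock with made-up costs so it runs anywhere. It keeps the fastest result rather than an average, in line with the earlier point that the best result represents what the hardware can achieve.

```python
import time


def find_best_tile_size(render, sizes, clock=time.perf_counter):
    """Time render(size) for each tile size and return (fastest, timings)."""
    timings = {}
    for size in sizes:
        start = clock()
        render(size)
        timings[size] = clock() - start
    best = min(timings, key=timings.get)
    return best, timings


# Demo with a simulated renderer and clock (made-up costs, not measurements).
fake_cost = {64: 9.0, 128: 5.0, 256: 3.0, 512: 3.5}
now = [0.0]


def fake_render(size):
    now[0] += fake_cost[size]  # advance the simulated clock


best, timings = find_best_tile_size(fake_render, fake_cost, clock=lambda: now[0])
print(best, timings)
```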

Grzesiek · August 18, 2017, 12:46pm

Good point 3DLuver, I just hope for the general Blender community that this will be somehow integrated. Wasting that much performance, especially when rendering a long sequence, is painful to think of.

FYI, nice GPU

LordOdin · August 18, 2017, 3:35pm

Doesn’t the blenchmark scene still have post processing on it?

Let’s say you have an 8-year-old i7 which you think is just fine (its single-threaded performance is less than half that of current-generation i7s) and you thought it would be a good idea to get a 1080 Ti.

The older i7 will take many more seconds than any current-gen AMD or Intel CPU for the BVH build and synchronization of the scene, but your 1080 Ti render times will be roughly the same.

Example
Older i7 12 seconds to build BVH and synchronize
Modern i7 6 seconds to build BVH and synchronize

Older i7 6 seconds to do post processing
Modern i7 3 seconds to do post processing

1080 ti render time is 10 seconds
1070 render time is 20 seconds

old i7 + 1080 ti = 28 seconds
new i7 + 1080 ti = 19 seconds

old i7 + 1070 = 38 seconds
new i7 + 1070 = 29 seconds

You could basically get the same time with a 1070 on a new i7 as you can with a 1080 ti on an old i7 for fast but dense renders.
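
The totals in the example above are just CPU-side overhead (BVH build plus synchronization, and post-processing) added to the GPU render time. A small sketch that reproduces them, using only the illustrative numbers from this post (they are not measurements; the names are made up):

```python
# CPU-bound seconds per frame: (BVH + sync) + post-processing.
cpu_overhead = {"old i7": 12 + 6, "new i7": 6 + 3}
# GPU-bound seconds per frame.
gpu_render = {"1080 ti": 10, "1070": 20}


def total_time(cpu, gpu):
    """Total frame time when CPU and GPU stages run one after another."""
    return cpu_overhead[cpu] + gpu_render[gpu]


for gpu in gpu_render:
    for cpu in cpu_overhead:
        print(f"{cpu} + {gpu} = {total_time(cpu, gpu)} seconds")
```

The old i7 with a 1080 ti lands at 28 seconds and the new i7 with a 1070 at 29, which is the "basically the same time" case.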

People don’t really realize that HDD speed also plays a huge part in synchronization times when using many textures. I have scenes with 3500+ image textures: on an HDD it takes over 6 minutes for the render to start, while on an SSD we are pulling files fast enough to pin my 12-thread i7 5820K (at 4.5 GHz) at 100%, so storage is no longer the bottleneck and the render starts in less than 2 minutes.

doublebishop · August 18, 2017, 7:17pm

This is why persistent data / images is very important, especially for animations.

Ideally, benchmarks should do the following (this is how we test gpus and configurations)

1. Do a 1-sample, 4x4-pixel render of the image with persistent data enabled. This loads the textures and the blend file fully into memory, ready for the proper testing.
2. Render the 1-sample, 4x4-pixel image 3 times and average the times to get the BVH / post-processing time.
3. Render the image at full samples and full resolution, average the times, then subtract the previous time to get a true benchmark of the GPU.
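
The procedure above can be sketched as a small timing harness. Everything here is an assumption for illustration: `render(samples, resolution)` is a hypothetical callable wrapping your renderer with persistent data already enabled, and the full-quality render is averaged over the same 3 runs as the tiny one (the post does not say how many). The demo uses a simulated renderer and clock with made-up costs.

```python
import time
from statistics import mean


def gpu_only_time(render, full_samples, full_res, runs=3,
                  clock=time.perf_counter):
    """Estimate pure render time by subtracting fixed per-frame overhead."""
    def timed(samples, res):
        start = clock()
        render(samples, res)
        return clock() - start

    tiny = (4, 4)
    timed(1, tiny)  # step 1: warm-up render, result discarded
    overhead = mean(timed(1, tiny) for _ in range(runs))              # step 2
    full = mean(timed(full_samples, full_res) for _ in range(runs))   # step 3
    return full - overhead


# Demo: simulated renderer with 9 s fixed overhead and 10 s of GPU work.
now = [0.0]


def fake_render(samples, res):
    now[0] += 9.0 + (10.0 if samples > 1 else 0.0)


result = gpu_only_time(fake_render, 1000, (960, 540), clock=lambda: now[0])
print(result)  # 10.0
```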

For our internal benchmarks, we do it over a variety of production scenes, some of which are very texture heavy, some very mesh heavy, and finally some very particle / movement heavy.

LordOdin · August 18, 2017, 10:47pm

Our internal persistent data button makes everything in Cycles persistent, not just textures. It’s such an amazing feature, but it’s broken at the moment, haha. I really hope it’s fixed in the next update. AO passes for 12 GB of geometry are slow without it.