[Dev] Pre Z Optimization in the BGE (testers needed!)

I have implemented an optimization in the BGE known as Pre Z, or depth pre-pass, which is a method for reducing overdraw. In one test scene I got about a 60% increase in FPS. More complex scenes will benefit more from this, since they have a better chance of being slowed down by overdraw. However, I did not notice a decrease in performance with the default cube scene.
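
For anyone unfamiliar with the technique, here is a rough sketch of the idea as a toy one-dimensional software "rasterizer" in Python. It is not the patch itself: the `render` function, the `(start, end, z)` object layout, and the `layers` scene are all made up for illustration. It only counts how many times the (expensive) fragment shader would run with and without a depth pre-pass.

```python
def render(objects, width, use_pre_z):
    """Count fragment shader invocations for a row of `width` pixels.

    Each object is a hypothetical (start, end, z) span: it covers pixels
    start..end-1 at a constant depth z (smaller z is closer to the camera).
    """
    depth = [float("inf")] * width
    shaded = 0

    if use_pre_z:
        # Pass 1: write depth only, no shading at all.
        for start, end, z in objects:
            for x in range(start, end):
                if z < depth[x]:
                    depth[x] = z
        # Pass 2: shade only fragments that match the stored depth.
        for start, end, z in objects:
            for x in range(start, end):
                if z == depth[x]:
                    shaded += 1
    else:
        # Single pass: every fragment that passes the depth test is shaded,
        # even if something closer overwrites it later (overdraw).
        for start, end, z in objects:
            for x in range(start, end):
                if z < depth[x]:
                    depth[x] = z
                    shaded += 1
    return shaded


# Three full-width layers drawn back to front: the worst case for overdraw.
layers = [(0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
print(render(layers, 8, use_pre_z=False))  # 24 shader invocations
print(render(layers, 8, use_pre_z=True))   # 8 shader invocations
```

The pre-pass costs an extra trip over the geometry, so it only pays off when the shading it avoids is more expensive than that extra pass.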

Known limitations:

  • Only works when using GLSL materials. I need to figure out if I just want to disable the optimization for Multitexture and Singletexture, or if I want to try to get those modes working. They probably wouldn’t benefit much since they are not running fragment shaders.
  • All objects must have a material. Objects without materials will either have z-fighting issues or have weird depth issues. I am still looking into solving this issue.
  • Custom vertex shaders will not work with this, since they will result in geometry having different depth values than those used for the pre pass. I might just exclude objects with custom shaders from the pre pass so they won’t break in the regular pass.

What I’m looking for:
I need some feedback from others on how this optimization affects existing files. Are you noticing a speed up or slow down with your scenes? If so, how much? Are you running into issues other than the known issues? Any other feedback is also appreciated.

Patch:
https://dl.dropbox.com/u/3431679/bge_pre_z.patch

Builds:
Win32: https://dl.dropbox.com/u/3431679/blender_r49753_pre_z.7z
I’ll work on getting a Linux build up, but I don’t have a machine available to make OSX builds.

Cheers,
Moguri

I will build on Linux 64 and give it a try with some game files I mentioned to you I’m working with. I can tell you shortly how it works.

My voxel demo (running in GLSL mode) taxes the rasterizer to around 30% on the optimized build, while the normal build taxes the rasterizer to around 20%. I wouldn’t really say that’s indicative of much, since it’s all single blocks that have a single material and texture. It might be better to try a demo blend file. I’ll see if I can find something heavier to test it out on.

I have tested my charlie game in your optimized build and in an svn build. I don’t see much of a difference! Framerate is about the same. Running at about 150 fps when disabling synched framerate. The rasterizer is a bit higher in your build. ~50 to ~53

@SolarLune and @ndee:
If either of you is willing to share your files with me (I can promise not to share them with other people), I would be very interested in profiling them. Out of curiosity, what are your baselines? What are you comparing to, 2.63a or trunk?

@SolarLune
I get the feeling your voxel stuff makes use of a lot of objects and is thus making the scenegraph (or looping over buckets, or some other such per object code) rather unhappy. Also, is this multitexture or GLSL materials? I know you like to use multitexture for things.

You have a pm moguri! :slight_smile:

Is this improvement primarily for situations with multiple render passes? (shadows, shaders)

All of my tests were GLSL.

I tried files from two different projects I’ve been working on, ranging from relatively simple to relatively complex in terms of geometry. Neither uses shaders, but both use multiple dynamic lights and shadows. So far my testing showed nearly identical performance between a standard build (r49720) and the patched current build (r49753), both built by me.

I tried martinsh’s recent water shader in the patched build, and the rendering turned crazy (see attached). The water shader works normally in my r49720 build.

FYI, I use the svn libraries, and I build with CMake (because I like cmake-gui, and SCons gives me hassle sometimes).
My noteworthy build options:
BUILD_TYPE = None
CMAKE_CXX_FLAGS = -O2 -mtune=generic -mfpmath=sse -msse3 -DNDEBUG
CMAKE_C_FLAGS = -O2 -mtune=generic -mfpmath=sse -msse3 -DNDEBUG
CMAKE_EXE_LINKER_FLAGS = -s
WITH_SYSTEM_GLEW = True (this is default on Linux, I believe)

Edit: I forgot -

NVIDIA GeForce 9800 GT
NVIDIA proprietary drivers, v295.40

I just realized that custom vertex shaders won’t work with this optimization, which is probably why Martinsh’s stuff isn’t working right.

YoFrankie! is actually a nice package for testing purposes.

Well, that would make sense then. Is that a limitation of Pre Z in general? Any way to detect custom shaders and disable it? Maybe add Pre Z optimization to the render panel and have it off by default? If it really can improve speeds like you’re seeing, then it would at least be a nice option to have where it is useful.

I get an 8 fps increase on a test blend: from 52 fps to 60 fps.

Awesome work Moguri!
Ex.

I posted the results of both the optimized and a trunk build (r49406). I got 60 FPS in both tests, but with a higher rasterizer usage percentage in your optimized build.

The voxel stuff might be taxing the scenegraph - I was using GLSL mode and materials. You can find the demo in the resources section - it doesn’t use GLSL mode by default, but it only takes selecting the placed cube and giving the material the tileset texture to get it to display correctly.

Wow, I’ve got to try it on my map!

@moerdn:
I’ve found YoFrankie! hasn’t worked so well since 2.5 due to changes and no-one maintaining it to keep up with those changes.

@blendenzo:
I’m thinking I’ll just disable writing into the depth buffer during the pre z pass for objects that are using custom shaders. I’m hoping to avoid having to add a UI option. This way the optimization is automatic for everyone that can take advantage of it, and I don’t have to go playing with DNA/RNA code. I was waiting on feedback from this thread to determine whether or not a UI option was needed.

@Excalaberr:
I’m glad someone got an FPS increase. :slight_smile:

@SolarLune:
I’ve tested the scene with vsync off and Use Framerate disabled, and it was running just a smidgen faster in my build than r49753 (couple of fps and maybe 1ms difference on the rasterizer).

@BlendingBGE:
If it’s the map I’m thinking of, I get the feeling you’ll have similar results. :wink:


Configs:

  • ATI Radeon 5850
  • Display List
  • Use Framerate Off

My Scene Test:

  • 1150 Cubes with 1 Material

Builds Tested:

  • Trunk 49756
  • PreZ Build posted here

Results:

  • When Material Transparent (1 in image)
      • Trunk - 210fps
      • PreZ - 245fps

  • When Material Opaque (2 in image)
      • Trunk - 220fps
      • PreZ - 180fps

Note: PreZ is also about 10 fps slower in the “Default Cube” scene (Trunk - 1161fps / PreZ - 1150fps)

So it looks like it’s great in scenes with a large number of transparent objects, but slower for all the rest. Is there a way to enable it just for objects with transparency, or is it a system with a global effect?

The BGE keeps two lists of “buckets”: alpha and solid. When doing the pre z pass, I do not draw the alpha buckets, since it just causes weird issues. However, it seems that not everything with alpha makes it into the alpha buckets list, as is evident from your tests.
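
As a rough illustration of that bucket selection (a Python sketch, not the actual BGE C++ code; `Bucket`, `pre_z_pass`, and the material names are hypothetical), the pre-pass only looks at the solid list. The sketch also skips custom-shader buckets, which is the approach floated earlier in the thread rather than something the patch is confirmed to do.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    material: str
    uses_alpha: bool = False         # the material really blends
    has_custom_shader: bool = False  # a custom shader is attached


def pre_z_pass(solid_buckets, alpha_buckets):
    """Return the buckets that would write depth in the pre-pass.

    alpha_buckets is accepted only to make the point that it is ignored:
    nothing from the alpha list is drawn in the pre-pass.
    """
    return [b for b in solid_buckets if not b.has_custom_shader]


# "glass" blends but has (hypothetically) ended up in the solid list,
# so it still gets drawn in the pre-pass.
solid = [
    Bucket("rock"),
    Bucket("water", has_custom_shader=True),
    Bucket("glass", uses_alpha=True),
]
alpha = [Bucket("smoke", uses_alpha=True)]

drawn = [b.material for b in pre_z_pass(solid, alpha)]
print(drawn)  # ['rock', 'glass']
```

A material that blends but sits in the solid list slips through this filter, which is the kind of misclassification that could explain unexpected results with transparent materials.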

As for the difference for the “Default Cube” scene, the numbers you give show less than a 1% drop in performance, which is negligible. However, your other two tests do make a good case for a UI option.
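
For reference, the percentages behind those numbers can be checked with a few lines of Python. The `percent_change` helper is just an illustrative name; the FPS figures are the ones posted in the test results above.

```python
def percent_change(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


default_cube = percent_change(1161, 1150)  # Trunk -> PreZ, default cube
transparent = percent_change(210, 245)     # Trunk -> PreZ, transparent material
opaque = percent_change(220, 180)          # Trunk -> PreZ, opaque material

print(f"default cube: {default_cube:+.2f}%")  # -0.95%
print(f"transparent:  {transparent:+.2f}%")   # +16.67%
print(f"opaque:       {opaque:+.2f}%")        # -18.18%
```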

This is a cool idea. I tried it out (as discussed in IRC) - for my scene, I got a framerate drop of 5-25%. But my materials are quite minimal, so maybe I’m not seeing the full benefit. Maybe I should throw some shadows in there :slight_smile:

As I understand it, we can have vertex shaders that only change UV coordinates or normals, and then we might still want a Pre Z pass… I guess that is one argument for a manual setting or some really smart automatic setting.

[edited]
Just tried it on a very simple Scene, twice:

I) Performance went from 60-70 fps to 90-100 fps!
II) Performance went from ~95 fps to 140-150 fps!
(I’m not good at even basic Math, but if I’m right, this is more than 30% in at least the second Case!)

I haven’t yet tested it on anything else, so no Bugs found, sorry.
But it is indeed great to get such a good Performance Increase! :}

EDIT:
Ûh, I must mention: I compared between this Build and the 2.61 Harmony Build. Sorry if that adds to the Difference! D:

Hello Moguri. In your GSoC report you said that you’re using AMD PerfStudio. How exactly does it work? It looks like in some cases we get a big improvement in performance, but can you already say what types of scenes benefit most from Pre Z?