We need a faster BVH build time

I found this thread while searching for the cause of my pain: when I hit F12, I unexpectedly discovered a lot of time overhead just before actual rendering starts, in a somewhat heavy scene.

  • This time delay does not exist in viewport renders
  • This delay is independent of the render engine; it affects Cycles, Eevee, and Workbench final renders alike.

Looking around for the cause, I found people calling it build time, others calling it BVH, and I recall there is a thing called syncing time, all of which seem to be calculations done before the start of actual rendering.
Are these all the same? Am I in the right thread?

I also do ArchViz and render still images 99% of the time. I used to batch them up and run them from the command line; in that case the calculation time, called syncing, would happen only once, and all frames would then be rendered without additional such calculation. Can whatever happens in command-line rendering not be done inside Blender's interactive environment?
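(For reference, a typical background batch render looks something like this; the file name and frame range are placeholders:)

```
# Render frames 1-10 with Cycles in the background (no UI).
blender -b archviz.blend -E CYCLES -s 1 -e 10 -a
```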

The scene I’m working on now has about 800K faces and takes about 3 minutes to prepare (for any of the three built-in engines: Cycles, Eevee, Workbench). It’s a larger model than usual, because I am trying the “Tissue” addon for the first time, which generates lots of geometry parametrically. It is still easily manageable: it refreshes instantaneously in the viewport with Workbench (Solid shading), almost instantaneously with Eevee, and in a few seconds in Cycles with great OptiX denoising.

EDIT: My bad. I did testing with Workbench, Eevee, and Cycles. The test was rendering 10 frames from the command line in the background. The Workbench delays in the final render were due to the “5 passes anti-aliasing” setting; with 1 pass it was much faster. I was also wrong about syncing: it is done for every frame (each of my frames is bound to a different camera). Surprisingly, the fastest of the three engines was Cycles (OptiX with an RTX 2070, Blender 2.91.0, under Windows 10 x64):

  • Workbench with single-pass anti-aliasing
  • Eevee with 128 samples (maybe more than needed)
  • Cycles with 150 samples and OptiX denoising

I piped the progress messages to a file for each of the three engines:
1cy.txt (11.6 KB) 1ev.txt (59.0 KB) 1wb.txt (5.8 KB)
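(If anyone wants to reproduce the logs, the invocations looked roughly like this; the scene name and frame range are placeholders:)

```
# Capture each engine's progress messages to its own log file.
blender -b scene.blend -E CYCLES            -s 1 -e 10 -a > 1cy.txt 2>&1
blender -b scene.blend -E BLENDER_EEVEE     -s 1 -e 10 -a > 1ev.txt 2>&1
blender -b scene.blend -E BLENDER_WORKBENCH -s 1 -e 10 -a > 1wb.txt 2>&1
```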

@bliblubli Matthieu, I did not understand you: does E-Cycles deal with this delay?

If only your camera moves (and, to some extent, some render settings change), yes, E-Cycles’ persistent data will make your images start rendering instantly, as shown in this video:

I also wondered where “dynamic” disappeared to. Why spend many seconds calculating “the world’s most perfect data structure” when the subsequent render doesn’t take nearly as long? A less-perfect data structure would actually come out ahead. It’s only when the BVH calculations and the subsequent renders are both going to be time-consuming that there’s any real benefit to investing so much time in “perfect precalculation.”
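A back-of-the-envelope sketch of that tradeoff (all numbers are invented for illustration, assuming a dynamic BVH builds much faster but makes traversal ~20% slower):

```python
def frame_cost(build_s: float, render_s: float) -> float:
    """Cost of one frame when the BVH is rebuilt for every frame."""
    return build_s + render_s

# Invented numbers: static BVH builds in 180 s, dynamic in 5 s;
# the dynamic BVH's looser tree makes rendering ~20% slower.
for label, render_s in [("fast GPU render", 15.0), ("slow CPU render", 3600.0)]:
    static = frame_cost(180.0, render_s)
    dynamic = frame_cost(5.0, render_s * 1.2)
    print(f"{label}: static {static:.0f} s, dynamic {dynamic:.0f} s")

# fast GPU render: static 195 s, dynamic 23 s    -> the cheap build wins
# slow CPU render: static 3780 s, dynamic 4325 s -> the 'perfect' BVH wins
```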

1 Like

The reason the render is faster is that we computed “the world’s most perfect data structure” to begin with. Computing the intersections between BVH nodes and rays is at the heart of the engine and is one of the main bottlenecks (along with shader evaluation). There is at least one intersection test per pixel per sample; you then have to traverse the tree if the test is a hit, and the number of tests is proportional to the depth of the tree and directly correlated with the size of the nodes and their topology.
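For readers unfamiliar with the process, the traversal loop looks roughly like this (a toy Python sketch, not Cycles code; Cycles does this in C++ and GPU kernels):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    bbox: tuple                  # ((xmin, ymin, zmin), (xmax, ymax, zmax))
    children: list = field(default_factory=list)   # non-empty for inner nodes
    triangles: list = field(default_factory=list)  # non-empty for leaves

def ray_hits_box(origin, inv_dir, bbox):
    """Slab test: one ray/box intersection test per visited node."""
    tmin, tmax = 0.0, float("inf")
    for axis in range(3):
        t1 = (bbox[0][axis] - origin[axis]) * inv_dir[axis]
        t2 = (bbox[1][axis] - origin[axis]) * inv_dir[axis]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(node, origin, inv_dir, hit_triangle):
    """Depth-first traversal: a deep or loosely-fit tree means more
    box tests per ray, which is why build quality matters so much."""
    if not ray_hits_box(origin, inv_dir, node.bbox):
        return
    for tri in node.triangles:      # leaf: test the actual geometry
        hit_triangle(tri)
    for child in node.children:     # inner node: descend
        traverse(child, origin, inv_dir, hit_triangle)

# inv_dir is the per-axis reciprocal of the ray direction, precomputed
# once per ray (real code must handle zero direction components).
```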

Also, building the BVHs is not the only thing that happens when you see the “Building BVH” message in the UI. To know exactly what is going on and where the bottleneck is, we need to run the files through a profiler to see what can and cannot be optimized. But this depends on the file; every file is different and unique.

There is a concept known as “divide and conquer” which ditches the idea of building a full BVH before rendering and instead calculates it on the fly. The only issue is that many of the various approaches can’t quite get to where they are competitive in terms of speed.

It may not be as big of a deal if you’re using something as fast as OptiX, but if you’re using the CPU, every little boost of speed helps (whether it be adaptive sampling or Embree). I do agree we shouldn’t cripple the BVH code for everyone because of a bottleneck for high-end RTX users.

BVHs for OptiX are actually built on the GPU by OptiX itself. We have three different code paths: one for OptiX, one for Embree, and one for the legacy BVH type, which is used for rendering with CUDA and OpenCL but can also be used for CPU renders.

So I cannot really optimize the OptiX BVH builds, as they are done opaquely by the OptiX library. What I can do, however, is optimize all the work we have to do before calling the OptiX API to perform the builds. And this can be done without crippling the other BVH types.

Still, what I cannot TOTALLY understand is: why is it such a problem to make the viewport’s dynamic BVH available to the final render? I really don’t understand this. Why not add a switch to the Performance panel and let me decide whether I like the output or not? Perfect data structure or not, slower CPU or GPU render or not, I want to DECIDE if it’s dynamic or not.
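(For what it’s worth, the Cycles addon does expose a BVH type property among its debug settings; if I read the addon code correctly, something like this toggles it from Python, though whether a given build honors it for final renders is another matter:)

```python
import bpy

scene = bpy.context.scene
# Debug-level Cycles property (normally hidden unless the debug UI is
# enabled); 'DYNAMIC_BVH' trades a faster build for slower traversal.
scene.cycles.debug_bvh_type = 'DYNAMIC_BVH'  # or 'STATIC_BVH'
```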

This is the biggest question, I guess, because everything works beautifully in the viewport.
Ufff… the update time is zero.

You can add a big fat warning like: “You idiot, don’t use this, because we devs don’t believe in a messier BVH update; if you do, the elephant will visit you in your dreams. :smiley: Are you sure you want dynamic BVH on?” :smiley:

1 Like

The sad thing is that that is exactly what I’m planning to do. I already made a script to automate the screen capture, but right now I’m limited to full HD and I can’t work while rendering. With an additional computer and a 4K screen, I could render while working, get high-quality renders, and save time. :grinning:
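(As an aside, and not necessarily what that script does: Blender can write viewport-quality frames from Python via the OpenGL render operator, which avoids external screen capture entirely. A minimal sketch, with a placeholder output path:)

```python
import bpy

scene = bpy.context.scene
scene.render.filepath = "//viewport_"   # placeholder output path
scene.render.resolution_x = 1920
scene.render.resolution_y = 1080

# 'OpenGL render' of the whole animation: roughly viewport quality,
# much faster than a full render. Requires a UI session (not blender -b).
bpy.ops.render.opengl(animation=True)
```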

Yeah, but you don’t get the AOVs, and then the screen capture is only an 8-bit Windows frame buffer, and so on… lol. The dynamic switch should never have been removed from earlier Blender (2.5x or 2.7x?).

Absolutely. This is just a bad workaround to get animations out in time for clients. I would love to use the ordinary render button instead and be able to use AOVs, render layers, and other stuff.

I wonder if there is a way to have a “virtual screen” and capture the graphics output at a larger resolution than what is physically there.
Or you could set up dual (or more) monitors physically and capture the whole thing.
For example, you could set up two HD monitors side by side, each in portrait orientation, and get a resolution of 2160×1920.
Certainly, being able to capture GPU output addressed to non-physical screen space, even if only while scripting, would be much more practical, if it is possible at all.

Could you post a time comparison of the screen-capture script vs. command-line rendering with OptiX and denoising? It would help in understanding your gain.

In one comparison I got around 01:16 in Eevee, 01:45 in Cycles, and 00:15 in the Eevee viewport (screen capture).

Yes, it’s not the rendering itself; the problem we have now is the sync logic, also with Eevee (and any renderer, for that matter). We would need to change the philosophy of how rendering is updated in Blender. The current philosophy comes from a time when CPUs and slow rendering dominated the landscape and processing speed was the issue. At one hour per frame, you would not feel one minute of sync. But now, when even a complex image including denoising takes under 15 seconds, you really feel the syncing time when doing animations.

2 Likes

Render: faster animation and re-rendering with Persistent Data

For Cycles, when enabling the Persistent Data option, the full render data will be preserved from frame-to-frame in animation renders and between re-renders of the scene. This means that any modifier evaluation, BVH building, OpenGL vertex buffer uploads, etc, can be done only once for unchanged objects. This comes at an increased memory cost.

This just landed in 2.93 alpha today.
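(For batch scripts, this should be a single flag on the render settings; a minimal sketch, assuming the property name matches the release notes:)

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
# Keep render data (BVHs, images, buffers) alive between frames and
# re-renders, trading memory for sync time.
scene.render.use_persistent_data = True

# Unchanged objects are then synced/built only once for the whole run.
bpy.ops.render.render(animation=True)
```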

8 Likes

Yupeee finally! :heart_eyes:

YEAAAAAAAAAAAAHHHHHHHHHHHH!!! :heart::heart::heart: :smiley: Now let’s test this :smiley:

Oh crap, it crashes. It will take some time till beta/final. Anyway, the sync time is great! Thanks guys!

UPDATE:
I found out why it crashed: your shaders have to be cleanly linked, and I had some Octane nodes left in my shading node trees. Everything is butter-fast now! 2 SECONDS per frame!

UPDATE 2:
I have to take it back. I did a simple cube test, and it’s still crashing. OK, so we’ll have to wait for bug fixing/final.
It can’t be related to memory, because I have dual 8 GB RTX cards and it’s a simple cube.

I’d love to test this with an old scene when I have time. It sounds like it can help a lot with animations. :+1:

I’m testing it with a job I’m doing; the scenes are in a simple 3D-illustration style, and the objects don’t show up with persistent data on.