real time bullshit
Another invaluable post from Panos Zompolas:
A few thoughts on the recent realtime ray tracing demos
<b>22 March 2018 05:21 PM</b>
In the last few days, we received a number of questions regarding the recent ray tracing demos that were shown around GDC. We are referring to Microsoft’s DXR and NVidia’s RTX.
As some of you might already know, the Redshift founders are all ex-videogame developers: Rob and I were rendering leads and Nic was doing tools. So it goes without saying that seeing this tech arrive in videogames is great news! We think better reflections, ray traced AO, better area lights, etc. will really help the look and realism!
One question we’ve been getting asked a lot is “will this tech make it into Redshift”? The answer to any such tech questions is: of course we’ll consider it! And, in fact, we often do before we even get asked the question!
However, we think it’s important that everyone fully understands what they’ve been seeing in these demos, and which misconceptions and “technical gotchas” might not be clearly mentioned in the context of professional/production rendering.
So, the first misconception we see people fall victim to is that these demos are fully ray traced. I’m afraid this is not the case! These demos use ray tracing for specific effects, namely reflections, area lighting and, in some cases, AO. To our knowledge, none of these demos ray traces (“path traces”) GI or does any elaborate multi-bounce tracing. That is to say: everything else uses rasterization, like modern videogames do. In plain English, this means that a fairly good chunk of what you see on screen is done with tech that exists today - if you’re a user of Unity or Unreal or any other DirectX/OpenGL render engine. What you don’t have today are the fully ray traced reflections (instead, you get screen-space or cubemap solutions), the more realistic ray traced area shadows (instead, you get shadow map area shadows) and ray traced AO (instead, you get SSAO).
The second misconception has to do with what hardware these demos are running on. Yes, this is Volta (a commercial GPU), but in quite a few of these cases it’s either with several of them or with extreme hardware solutions like the DGX-1 (which costs $150000). Of course despite all that, seeing the tech arrive is still exciting because, as we all know, hardware evolves and the performance you get today from a $3000 or $150000 solution, you’ll get in a few years’ time from a much cheaper solution. So while this is “bad” today, it does show what will be possible in the not-so-far future.
The third misconception is that, if this technology was to be used in a production renderer like Redshift, you’d get the same realtime performance and features. Or that, “ok it might not be realtime, but surely it will be faster than today”. Well… this one has a slightly longer answer…
The main reason why a production renderer (GPU or not) cannot produce “true” realtime results at 30 or 60fps isn’t because you don’t have multiple Voltas. The reason is its complicated rendering code - which exists because of user expectations. It simply does too much work. To explain: when a videogame wants to ray trace, it has a relatively simple shader to generate the rays (reflection, AO, shadow) and relatively simple shaders to execute when these rays hit something in the scene. And such a renderer typically shoots very few rays (a handful) and then denoises. On the other hand, when a renderer like Redshift does these very same operations, it has to consider many things that are not (today) necessary for a videogame engine. Examples include: importance-sampling, multiple (expensive) BRDFs, nested dielectrics, prep work for volume ray marching, ray marching, motion blur, elaborate data structures for storing vertex formats and user data, mesh-light links, AOV housekeeping, deep rendering, mattes, trace sets, point based techniques.
And last but certainly not least… the shaders themselves! Curvature, for example, uses ray tracing on each and every intersection. Same with round corners. And then there’s the concepts of layering multiple materials (each one shooting its own rays) and procedural bump maps which means lots more behind-the-scenes shading work you guys don’t even see. And let’s not forget the concept of out-of-core rendering! The list goes on and on and I’m pretty sure I’ve neglected topics here! A videogame doesn’t need most of those things today. Will it need them a few years from now? Sure, maybe. Could it implement all (or most) of that today? Yeah it could. But, if it did, it’d be called Redshift!
We’re fully aware that the above might sound like we are trying to belittle these demos. I want to stress that this is not the case at all! We are genuinely excited about it and have no doubt that it will keep evolving as GPUs get faster and faster. And, if it evolves in a way that we can ‘serve’ it to Redshift users without having to sacrifice 70% of Redshift’s features then we will absolutely use it!
Closing, I just wanted to re-iterate that we’re always closely watching all rendering-related tech and always ask the question “can our users benefit from this?”. It is part of our job to do so! In this case, this question doesn’t have a super-easy answer but you can bet we’re thinking about it! If any decisions are made, you’ll hear about them here in these forums.
If you have any thoughts that you’d like to share below… please feel free!
Yeah, everybody who thinks this will bring full raytracing, and on a single consumer GPU at that, should face reality.
But it’s still an exciting look into the tech future.
bit more from presentation (just a couple of minutes before the cinematic)
Does anyone know a link to the full presentation?
GDC 2018 Presentation Slides
SHINY PIXELS AND BEYOND: REAL-TIME RAYTRACING AT SEED
The info here is a little off because the NVidia DGX is NOT a $150,000 GPU. There’s no such thing. The NVidia DGX is actually a miniaturized server that can be used for supercomputing or rendering. If you look at the site, the cost is $50,000, 1/3 of what Panos mentioned. But the GPU inside of it, if you look at the tech expo from December, is a Titan V, which is $3,000; 1/50 of the price Panos mentioned. While not everything is completely raytraced, this demo is to showcase how far they’ve come. Real-Time Raytracing will be taking advantage of Tensor, which can only be found in Titan V and some specialized high end Tesla GPUs, the most expensive one being $46,000. However, the price for the Tesla has nothing to do with the DGX because the DGX is, once again, powered by Titan V. The idea isn’t complete raytracing in real-time. The idea is to begin working toward full real-time raytracing.
OMG! Thank you for posting this. The whole time I’m reading this thread I’m thinking, isn’t this the same as Eevee? I was just about to post this same thing. Doesn’t Eevee use ray-tracing right now? I always thought screen space effects were ray traced shader effects. Is that not the case?
Yes, Eevee uses screen space raytracing (as basically every game engine does right now). No, it is not the same as Eevee, as it is not only screen space raytracing and therefore way superior. Screen space doesn’t have all the information you need and you will always get artifacts from it. But it is quite cheap. The raytracing API is more something like OpenGL for raytracing. It allows you to write GPU raytracing systems using shaders and leaves all the ugly implementation details like acceleration structures to Microsoft/the driver developers.
So what about the future of Eevee? Will it get support for RTX, or go the way of a Vulkan/OpenGL ES implementation of realtime raytracing like PowerVR did in 2015?
1) Does this work in conjunction with screen space raytracing effects (like SSR)? As effects like SSR are post processing effects and therefore happen after it, is it possible for them to work together?
2) Does screen space ray tracing run faster than “traditional” ray tracing (assuming they have the same number of bounces and hits)?
Once actual ray tracing techniques become usable in practice, it will have to be found out which kinds of combinations make sense, or are perhaps needed, to get acceptable performance. No matter how cool those technical demos are and how much hype they produce, it is going to take years before comparable solutions are used, e.g., as the default in games. While development moves in that direction, several new techniques and variants are going to appear; that’s why no one can tell you yet what kind of combination could make sense.
Screen space effects work only based on the information that is actually present on the screen. This makes them a lot faster and in certain cases very flawed. In screen space effects, bounces don’t make sense, because you would most likely reach places for which you don’t have information anyways.
There was an interesting article on the GPUOpen website by Enscape here: https://gpuopen.com/deferred-path-tracing-enscape/
Now Enscape is a realtime deferred path tracer, but there are some interesting things they did,
Quote: “We use Radeon Rays (formerly AMD FireRays) for the BVH construction and traversal. We vary between different tracing kernels across different hardware setups to achieve the best possible performance. We ported the stackless traversal algorithm to run on OpenGL® 4.2 hardware, so that the kernel runs in a plain fragment or vertex shader without the need for Compute Shader or OpenCL™.”
We completely avoid casting primary rays by using our G Buffer as a starting point.
We then accumulate ray samples across multiple frames to solve each fragment’s BRDF. A mapping function defines a distinctive ray direction for a group of four fragments (half resolution). We use a global low-discrepancy seed per frame and a local random value which comes from a noise texture. Using any plain pseudo-random sampling must be avoided, since it will lead to visible artifacts.
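To make the sampling idea above more concrete, here is a rough sketch of combining one global low-discrepancy seed per frame with a per-pixel value from a tiled noise texture. Everything in it is an assumption for illustration: the article does not say which sequence or mapping Enscape uses, so this uses a Halton sequence (bases 2 and 3), a wrap-around shift (Cranley-Patterson rotation) and a cosine-weighted hemisphere mapping. The function names are made up.

```python
import math

def radical_inverse(index, base):
    """Van der Corput radical inverse of `index` in `base`, in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

def frame_seed(frame):
    """Global 2D low-discrepancy point for a frame (assumed: Halton bases 2, 3)."""
    return radical_inverse(frame + 1, 2), radical_inverse(frame + 1, 3)

def pixel_sample(frame, x, y, noise):
    """Shift the frame seed by a per-pixel offset read from a tiled noise texture.

    `noise` is a 2D list of (u, v) offsets in [0, 1); it tiles across the screen.
    """
    tile_h, tile_w = len(noise), len(noise[0])
    offset_u, offset_v = noise[y % tile_h][x % tile_w]
    seed_u, seed_v = frame_seed(frame)
    # Wrap around so the shifted sample stays inside the unit square.
    return (seed_u + offset_u) % 1.0, (seed_v + offset_v) % 1.0

def cosine_direction(u, v):
    """Map a 2D sample to a cosine-weighted direction on the +Z hemisphere."""
    r, phi = math.sqrt(u), 2.0 * math.pi * v
    return r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u))
```

Each pixel gets a different but well-distributed sample every frame, so accumulating over frames covers the hemisphere evenly instead of clumping the way plain pseudo-random numbers can.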
First, we try to cast the diffuse rays in screen space. If we’re able to detect a hit in the last frame’s irradiance buffer, we even get a local multi bounce reflection for free. If a screen space intersection wasn’t found, we path trace the ray in our BVH (Fig 1). This optimization alone saves 30% of the first bounce of secondary rays, depending on the scene.
For specular, we basically do the same and vary the number of local samples based on the material’s roughness and metallic value.
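The "screen space first, BVH as fallback" strategy described above could look roughly like this. It is a toy sketch, not Enscape's code: the depth-buffer march, the `thickness` tolerance, and the names `march_screen_space`, `trace_diffuse` and `trace_bvh` are all assumptions, and the real BVH traversal is just a callback here.

```python
def march_screen_space(depth, start, step, max_steps=64, thickness=1.0):
    """March a ray through a depth buffer.

    depth[y][x] holds the depth of the nearest visible surface at that pixel.
    start and step are (x, y, z): pixel coordinates plus depth.
    Returns the (x, y) pixel that was hit, or None if the ray leaves the
    screen or runs out of steps (no on-screen information is available).
    """
    x, y, z = start
    dx, dy, dz = step
    height, width = len(depth), len(depth[0])
    for _ in range(max_steps):
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < width and 0 <= y < height):
            return None
        px, py = int(x), int(y)
        surface = depth[py][px]
        # Hit if the ray is at or just behind the visible surface.
        if surface <= z <= surface + thickness:
            return px, py
    return None

def trace_diffuse(depth, irradiance, start, step, trace_bvh):
    """Try the cheap screen-space march first, fall back to the BVH."""
    pixel = march_screen_space(depth, start, step)
    if pixel is not None:
        px, py = pixel
        # Reusing last frame's irradiance gives an extra bounce for free.
        return irradiance[py][px], "screen"
    return trace_bvh(start, step), "bvh"
```

A ray that stays on screen is resolved from the buffers alone; a ray that leaves the screen is handed to the (much more expensive) BVH tracer, which is where the quoted 30% saving would come from.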
Ray bundling
In order to get coherent data access, we bundle our rays into separate workgroups (or in terms of the OpenGL 4.2 implementation: different draw calls).
We bundle 12 world space direction segments into separate buckets, based on their generated ray direction. The usage of a tiled noise lookup texture during ray creation ensures that those buckets are roughly equally sized.
Tracing the directions separately both in screen space and for path tracing improves cache coherency.
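A possible way to picture the bucketing: classify each ray by its direction and trace each bucket as its own batch. The article does not say how the 12 direction segments are defined, so the split below (6 azimuth sectors times 2 hemispheres) is purely an assumption, as are the function names.

```python
import math
from collections import defaultdict

AZIMUTH_SECTORS = 6  # assumed split: 6 sectors x 2 hemispheres = 12 buckets

def direction_bucket(direction):
    """Return a bucket index in [0, 12) for a world-space direction."""
    x, y, z = direction
    azimuth = math.atan2(y, x) % (2.0 * math.pi)
    sector = int(azimuth / (2.0 * math.pi) * AZIMUTH_SECTORS)
    sector = min(sector, AZIMUTH_SECTORS - 1)  # guard against rounding at 2*pi
    hemisphere = 0 if z >= 0.0 else 1
    return hemisphere * AZIMUTH_SECTORS + sector

def bundle_rays(rays):
    """Group (ray_id, direction) pairs by bucket so similar rays trace together."""
    buckets = defaultdict(list)
    for ray_id, direction in rays:
        buckets[direction_bucket(direction)].append(ray_id)
    return dict(buckets)
```

Rays in the same bucket point in roughly the same direction, so they tend to walk the same BVH nodes and touch the same memory.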
BVH streaming
Building the BVH for a complete architectural scene will fail very quickly under real-time constraints and hardware limitations. Therefore, we only store a fraction of the scene at a time. We determine what objects to include into the BVH based on their estimated visual importance weighted against their BVH cost (Fig 2).
float objectScore = lightingRelevance * visibleVolume / polycount;
while(sumOfObjects > BVH_COMPLEXITY_THRESHOLD)
This BVH update is done on the CPU and continuously uploaded to the GPU. The update is divided into smaller chunks to avoid lags during memory transfer.
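The quoted pseudocode is cut off, so here is one way the selection could work, written as a small sketch. The score formula is taken from the snippet; the rest is an assumption: that the BVH cost of an object is simply its polygon count, and that the loop drops the lowest-scoring object until the total fits the budget. `SceneObject` and `select_for_bvh` are made-up names.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    lighting_relevance: float
    visible_volume: float
    polycount: int

    @property
    def score(self):
        # Same formula as the quoted snippet.
        return self.lighting_relevance * self.visible_volume / self.polycount

def select_for_bvh(objects, complexity_threshold):
    """Keep the highest-scoring objects whose total polycount fits the budget.

    Assumption: BVH cost is approximated by polycount, and the lowest-scoring
    object is dropped first.
    """
    kept = sorted(objects, key=lambda obj: obj.score, reverse=True)
    total = sum(obj.polycount for obj in kept)
    while kept and total > complexity_threshold:
        dropped = kept.pop()  # last element has the lowest score
        total -= dropped.polycount
    return kept
```

With this scoring, a huge but barely visible mesh is the first thing to go, while a cheap wall that fills the view stays in.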
Mesh Preprocessing
Mesh preprocessing is usually necessary because high polygon objects occur pretty frequently and can slow down the BVH traversal. It’s important to only “shrink” objects during simplification to avoid self-occlusion.
Special objects like leaves are converted into procedurals, which are compactly stored in the BVH. In terms of vegetation, the self-occlusion is not too noticeable, but the overall shape and density has to be maintained to look plausible in reflections.
Direct light
For every ray intersection in our BVH, we calculate incoming sun light using a shadow map lookup. For artificial lights or emissive surfaces (which can be thousands per scene), lighting calculation during traversal is not feasible. Therefore, we bake the direct light (except the sunlight) into the BVH on a per vertex basis. We re-tessellate the geometry anyway, so it’s easy to enlarge the tessellation density at points where we expect direct light detail.
This gives us the advantage of reduced memory fetches for direct lights and also allows to change the sunlight with no special precomputation or update time, other than the usual sun shadow maps.
Specular
For the material’s reflective component, we sample at half resolution and use previous sampling results to combine them to a high-resolution image. The filtering is done BRDF aware, to keep the smearing and blurring artifacts at a minimum. For high quality outputs, we even create a refinement queue based on unexpected variance in a 3×3 pixel quad to get full, high resolution image quality.
Alpha Reflections
We support order independent transparency and want to use the path traced reflections on those surfaces as well. The challenge is the unpredictable layer depth and the required performance budget. We therefore render every layer in a deferred shading style and run our specular tracing in upsampled half resolution. We do not store a separate history buffer for each layer, so we have to accept a little blur to hide the missing history which would be required for a proper temporal upsampling to reach a higher quality.
Filtering
We use several temporal accumulation buffers to keep the neighborhood clamp window as small as possible. The new results are first combined with the accumulation buffer (Fig 3) before filtering to avoid smearing. Before filtering, we compute the expected radius in a local neighborhood to keep the amount of texture reads at a minimum.
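For anyone unfamiliar with the terms, temporal accumulation with a neighborhood clamp generally works like the sketch below. This is the generic technique as commonly used in temporal anti-aliasing, not Enscape's actual filter: a single grayscale buffer, a min/max clamp over a small window, and a fixed `blend` factor are all simplifying assumptions.

```python
def neighborhood_bounds(image, x, y, radius=1):
    """Min and max of the values in a (2*radius+1)^2 window around (x, y)."""
    height, width = len(image), len(image[0])
    values = [
        image[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    return min(values), max(values)

def accumulate(history, current, blend=0.1, radius=1):
    """Blend the new frame into the history, clamping stale history first."""
    height, width = len(current), len(current[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = neighborhood_bounds(current, x, y, radius)
            # History outside the range of the current neighborhood is
            # treated as stale and pulled back into range.
            clamped = min(max(history[y][x], lo), hi)
            result[y][x] = clamped + blend * (current[y][x] - clamped)
    return result
```

A small clamp window rejects stale history aggressively (less ghosting, more noise); a large one trusts history more (less noise, more smearing), which is presumably the trade-off the quoted paragraph is about.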
Performance
The critical point, besides overall rendering performance, is a content agnostic ray traversal cost. The screen space ray traversal performance is not dependent on the scene complexity and can be scaled in terms of sample count and march length. In most cases in architecture, the number of polygons in the BVH correlates with the traversal cost. Keeping the BVH complexity constant is therefore key. We measure the tracing and overall rendering performance at run time and adjust the allowed BVH complexity. We even adjust the image resolution to keep a steady frame rate. Multi bounce diffuse lighting however is currently only enabled on our Ultra profile.
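The "measure at run time and adjust the allowed BVH complexity" part can be pictured as a simple feedback loop. The sketch below is a guess at the general shape only: the proportional update, the 16.7 ms target, the `gain` value and the budget limits are all assumptions, not numbers from the article.

```python
def adjust_bvh_budget(budget, frame_ms, target_ms=16.7,
                      min_budget=10_000, max_budget=2_000_000, gain=0.5):
    """Nudge the allowed BVH polygon budget toward a target frame time.

    A slow frame (frame_ms > target_ms) shrinks the budget, a fast frame
    grows it. The result is clamped to [min_budget, max_budget].
    """
    error = (target_ms - frame_ms) / target_ms  # positive means headroom
    scaled = budget * (1.0 + gain * error)
    return int(min(max(scaled, min_budget), max_budget))
```

Called once per frame with the last measured frame time, this keeps shrinking the BVH while frames are too slow and lets it grow back when there is headroom.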
There is no reason why Eevee can’t be used in a similar way.
- You can do your raytracing at almost any step of your pipeline, as it is not bound to the geometry processing and is therefore more like a “post processing effect”. And you can combine them in a meaningful way. You could do screen space raytracing for speed and switch to real raytracing for all the cases where screen space raytracing fails. This approach will have some artifacts, but I’m sure you can make them less visible with some more or less intelligent mixing.
- It runs faster, but it only works in a meaningful way for doing 1 bounce that points away from the camera. Screen space techniques are great for cheap reflections, but they are often wrong, have a lot of difficult edge cases and don’t work at all for diffuse bounces. But a nice thing about all screen space techniques, and one of the reasons why they are used, is that their computational cost is almost completely independent of the geometric complexity of your scene.
DirectX Raytracing Fallback Layer is now available on GitHub