None of it is official yet, but yes, on the latest official build it also runs on the GPU, with the experimental feature set switched on.
Rules AFAIK:
For CPU you should switch OSL on; it's faster.
For GPU, the WHOLE scene is subdivided FIRST and loaded into GPU memory. So either make sure you have a lot of GPU RAM, or set the dicing rate higher; also use "Both", so the bump map covers the displacement detail lost by the coarser dicing. For CPU there is no such limit, since it uses main RAM and the swap file, but it IS slow, and it still dices the whole scene first!
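For reference, these settings can also be toggled from Python; a minimal sketch for the 2.77/2.78 experimental builds. Exact property names can differ between builds, and the material name here is made up:

```python
import bpy

scene = bpy.context.scene
scene.cycles.feature_set = 'EXPERIMENTAL'   # microdisplacement is experimental

# Raise the dicing rate to trade displacement detail for GPU memory
scene.cycles.dicing_rate = 2.0              # pixels per micropolygon edge

# Use "Both" so a bump map fills in the detail the coarser dicing loses
mat = bpy.data.materials["Displaced"]       # assumed material name
mat.cycles.displacement_method = 'BOTH'     # 'BUMP' / 'TRUE' / 'BOTH'
```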
Why dice first? Because Cycles is a path tracer, and rays need to see the other surfaces correctly (already displaced).
I don't think there is a way around dicing first; it's not a REYES renderer, so you need the whole displacement in memory for the rays.
If not, someone please enlighten me.
If you export a map out of ZBrush, use a 32-bit map; then everything works out of the box (just connect the map to the displacement slot, no more nodes needed). For a grayscale map where 50% gray == 0, you have to subtract 0.5 from the map, otherwise there's no negative displacement!
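The grayscale case can be wired up in a few nodes from Python as well; a hedged sketch using the 2.7x node API (the material name and map path are made up):

```python
import bpy

mat = bpy.data.materials["Displaced"]            # assumed material name
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links

tex = nodes.new("ShaderNodeTexImage")
tex.image = bpy.data.images.load("//disp.png")   # hypothetical grayscale map

# Subtract 0.5 so 50% gray maps to zero and darker values displace inward
sub = nodes.new("ShaderNodeMath")
sub.operation = 'SUBTRACT'
sub.inputs[1].default_value = 0.5

out = nodes["Material Output"]
links.new(tex.outputs["Color"], sub.inputs[0])
links.new(sub.outputs["Value"], out.inputs["Displacement"])
```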
Ok, so you mean that it calculates the bounds. But then either they are full res (which you say they are not, only at render time), or they are coarse bounds; and in that case, what about shadows, or other rays that have to intersect the high-res displacement AFTER the tile was rendered (and its polygons were dropped from memory)?
That was the main argument for REYES, that it rasterized first and then diced only at rendertime.
But now we shade with rays. How would a displacement be traced correctly if the other polygons aren't there any more for a path to travel through, after they got dumped from memory (once the tile was rendered)?
So either a very coarse approximation mesh is kept (which will not give pixel-exact ray casting for shadows, transparency, refraction, etc.) and only the local area is displaced at high res; or, if it is displaced correctly at high res, there needs to be a lookup and rebuild of any secondary displacement contributing to the tile. That effectively means the whole displacement dataset has to be in memory, or at least in a gigantic cache on disk.
I am not clear on this dilemma. Moreover, I see Cycles report that it builds the displacement geometry before it begins rendering the tiles. Also, the lower I set the dicing rate, the more MEMORY it uses (which should not be the case if dicing happened ONLY at render time). 0.5 is outright impossible on a 6 GB GPU, so it NEEDS to generate the full displacement mesh.
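The quadratic growth is easy to see with back-of-the-envelope numbers. Assuming adaptive dicing targets roughly one micropolygon per dicing_rate² pixels of coverage (an assumption, but about what the setting means), halving the rate quadruples the count:

```python
def micropolys(covered_pixels, dicing_rate):
    # Rough model: one micropolygon per dicing_rate^2 pixels of coverage
    return covered_pixels / dicing_rate ** 2

frame = 1920 * 1080             # a surface covering the whole frame
print(micropolys(frame, 1.0))   # 2073600.0
print(micropolys(frame, 0.5))   # 8294400.0 -- 4x the memory
print(micropolys(frame, 0.1))   # ~2e8 -- 100x, hence the GPU blow-up
```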
Ok, let me put it more simply:
The first tile gets rendered, the surface is displaced at render time, the texture is looked up, and then the polys are thrown away. Ok, I get that.
The second tile is also rendered with subdivision displacement. Now a ray travels to the surface and reflects in the direction of the FIRST tile, intersecting the previous displacement. How in the world can this ray know which micropolygon it hits, unless that geometry is still in memory (which it isn't any more, with render-time displacement)? Shadows will not be correct, nor refraction, and many, many more things.
Testing latest from Buildbot // blender-2.77-ab3c1de-win64.zip - It's already in.
But in this build, polys are always split. Didn't happen with builds from Graphicall :spin:
Does this mean that it also generates micropolys from secondary/tertiary, etc… bounces?
the blend is an empty cube… a startup scene? Also one render layer.
UPDATE: Yes, I was right, of course. It generates polys only when a ray hits, and STORES them in memory. But of course, if you have a landscape, the memory will balloon by the last tiles (depending on how many displacement ray hits you have).
So yes, the displacement data GROWS with each ray and NEEDS to be stored in memory, which eventually gets extreme if too many polygons are stored.
What I expected, and which is IMHO superior, would be a two-way out-of-core technique: first, compute only the displacement data on the CPU (a la REYES, but store it; loads of memory), then load chunks of the BVH of that data + textures and lights to the GPU, or keep rendering on the CPU. Of course this implies a CPU/GPU out-of-core design, which is beyond Cycles right now.
Still, I don't complain AT ALL! Amazing!
PS: The REYES micropoly advantage vs. MC path tracing, and the challenge of combining them on the GPU, are well known. Damn, REYES was soooo fast at displacement.
Still, the problem I see is the memory use, which can get huge, especially if you need animation (dicing rate below 1).
Well, there is a way… biased rendering. LOL… I know.
Yeah: 1 cube + subdiv (0.1 rate); check the memory for that in the task manager, and then render layers 1 and 2 together (scene layers, not render layers) and check the memory for that.
A different test scene: a closed cube (layer 2) + a cube with a hole (layer 1).
Yes, now I get what you wanted me to see… the memory :-D. It's insane: 21 GB for a simple cube with no (visible) displacement, only diced at 0.1.
Why? Loading the BVH takes forever (15 min!); the rendering itself only took 15 seconds on the CPU.
So it IS diced beforehand, not only when a ray hits at render time.
Because:
Step 1. It looks through the camera and dices the geometry based on pixel space (21 GB fills up; 5 min).
Step 2. It sends the rays (tile rendering, easy game) to the diced geometry (the farther away, the lower the LOD, based on the dicing rate), and now the rays can bounce however they want, because they intersect the displacement (15 sec).
This means that every time the displacement is diced, you need a lot of time before rendering starts, and second, a shitload of memory.
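The two steps observed above can be sketched in pseudocode; this is purely illustrative, and none of these names are Cycles internals:

```python
def render_frame(objects, tiles):
    # Step 1: dice the whole scene before any tile is rendered.
    # This is where memory fills up and the long pre-pass happens.
    micropolys = []
    for obj in objects:
        micropolys.extend(obj["dice"]())   # per-object screen-space dicing
    bvh = build_bvh(micropolys)            # one BVH over everything

    # Step 2: per-tile path tracing is fast, because a bounce in any
    # direction can intersect the already-diced, resident geometry.
    return [trace_tile(t, bvh) for t in tiles]

def build_bvh(prims):
    return prims                           # stand-in: just keep a flat list

def trace_tile(tile, bvh):
    return (tile, len(bvh))                # stand-in: "sees" all geometry

# Two toy objects dicing to 4 and 6 micropolys; both tiles see all 10
scene = [{"dice": lambda: ["mp"] * 4}, {"dice": lambda: ["mp"] * 6}]
print(render_frame(scene, ["A", "B"]))     # [('A', 10), ('B', 10)]
```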
This is not at all like REYES, because REYES dices ONLY at render-tile time, with nearly no memory footprint and no precalculation whatsoever. It can do that because it is a scanline algorithm and doesn't care about MC or any paths at all (only the direct screen ray/scanline), not to mention unbiased MLT. Uuuups.
So there is still a long way to go before the displacement is usable in animation, unless you have a render farm. The GPU is a no-go due to its limited memory; I just tried some scenes, and you run short on memory pretty fast on the GPU.
CPU and GPU rendering in master, and GPU rendering in the temp-cycles-microdisplacement require all geometry to be generated in advance and stored in memory.
CPU rendering in temp-cycles-microdisplacement has some support for not having all data in memory using a geometry cache. There is a delay when you start rendering however, to compute the bounds for all geometry.
About algorithms:
REYES is not compatible with path tracing, it works to an extent if you restrict the lighting in certain ways, using things like shadow maps and mostly diffuse indirect light. But in the end Pixar also gave up on it too and switched to their new RIS system.
Hyperion style ray reordering can bring back the ability to render scenes much bigger than CPU memory. However neither REYES nor ray reordering work all that well with progressive viewport rendering. The usability improvement from that is very important, and most CPU production renderers now just keep all geometry in memory because artists can work faster with that kind of interactivity.
The CPU geometry cache in the temp-cycles-microdisplacement branch likely works well in some cases but not in others. We'll have to investigate how well we can get this working in production scenes. It's likely that for this to work well in more complicated scenes or with algorithms like adaptive sampling, we would need to implement ray reordering. That would be a big project, unknown when someone would work on that.
Getting the geometry cache to work on the GPU would be another such big project. Newer NVidia GPUs make reading directly from CPU memory easier, and it has been proven that you can get decent performance this way as long as your scene fits in CPU memory, since transferring data between the CPU and GPU can be pretty fast with the right hardware.
It is still possible to reduce Cycles geometry memory usage, which would help with or without the geometry cache.
Brecht, thank you A LOT for clarifying these points. So, no on-the-fly dicing.
Only one question: I remember that in the early path-tracing days (early 2000s) there were attempts to get micropolys into path tracers. There were some builds, for example from NVIDIA for Gelato (Larry worked heavily on that), before they abandoned it for OptiX.
Now, is there no way to path trace the normal geometry first, and then ray march the displacement on the fly in a second pass?
What I mean is that instead of calculating a geometry cache, there would be a LOD ray cache. Because any displacement reorients a face into more subfaces, the second pass (which does the subdivision) would access the ray cache and refine the result in more detail only within a certain ray radius while displacing. That way there would be no need for a geometry cache, nor for storing any subdivision before rendering, although you would need to keep the original non-displaced geometry for the second ray localization (the ray march).
I understand that this would only be OK for detail features and rays that are not transported across the whole scene, unlike Hyperion, which takes into account nearly all interactions globally.
I think this could also be applied in the viewport, increasing only the refinement time?
In short: generate micropolys on the fly (from the stored non-displaced geometry) and address them with a kind of microrays (for lack of a better term), based on a ray cache (from the first, normal pass), in a second ray pass.
As far as I know Gelato didn't do anything really different in this area, besides the fact that it worked on the GPU instead of the CPU.
There exist algorithms for displacing the mesh entirely on the fly, however performance tends to be very poor in comparison to geometry caches. That's because you are re-evaluating many displacement shaders for each ray, and the ray intersection bounds are poor.
I'm not sure I understand your idea, it sounds like ray reordering to me? Trace a bunch of rays against the bounds in the first pass, then sort the rays and in a second pass compute the displacement per batch of sorted rays? In such cases a geometry cache wouldn't be strictly necessary, since the cost of displacement could be shared between multiple rays, but it would still help.
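That batching idea can be sketched in a few lines. Everything here (patch ids, `displace`, the bounds test) is an illustrative stand-in, not how any real renderer structures it:

```python
from collections import defaultdict

def displace(patch_id):
    # Expensive step: evaluate the displacement shader and dice one patch
    return f"diced-{patch_id}"             # stand-in for micropolygon data

def trace_reordered(rays):
    # Pass 1: test rays against coarse patch bounds only, and bucket
    # them by the patch they might hit (the "reordering").
    buckets = defaultdict(list)
    for ray in rays:
        buckets[ray["patch"]].append(ray)  # stand-in for a bounds test

    # Pass 2: displace each patch ONCE, then intersect its whole batch,
    # so the dicing cost is amortized over all rays in the batch.
    hits = {}
    for patch_id, batch in buckets.items():
        geometry = displace(patch_id)      # evaluated once per batch
        for ray in batch:
            hits[ray["id"]] = geometry     # stand-in for real intersection
    return hits

rays = [{"id": 0, "patch": "A"}, {"id": 1, "patch": "B"}, {"id": 2, "patch": "A"}]
hits = trace_reordered(rays)               # displace() ran twice, not three times
```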
Yes, with the difference that the bunches of (micro)rays are localized to the micropolys on a LOD, per-subpixel basis (based on the dicing rate), not like Hyperion "streaming", where huge areas are reordered. Somehow a mini Hyperion for displacement.
Hyperion started exactly like this, they said. Out of a "crazy idea".