None of it is official yet, but yes, on the latest official build it also runs on the GPU, with the experimental feature set switched on.
Rules AFAIK:
For CPU you should switch OSL on; it's faster.
For GPU, the WHOLE scene is subdivided FIRST and loaded into GPU memory. So either make sure you have a lot of GPU RAM, or set the dicing rate higher; also use "Both", so the bump map covers the displacement detail lost by the coarser dicing. For CPU there is no such limit, since it uses main RAM and the swap file, but it IS slow, and it still dices the whole scene first!
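For reference, these settings can also be toggled from Python; a minimal sketch for the 2.77/2.78 experimental builds. Exact property names can differ between builds, and the material name here is made up:

```python
import bpy

scene = bpy.context.scene
scene.cycles.feature_set = 'EXPERIMENTAL'   # microdisplacement is experimental

# Raise the dicing rate to trade displacement detail for GPU memory
scene.cycles.dicing_rate = 2.0              # pixels per micropolygon edge

# Use "Both" so a bump map fills in the detail the coarser dicing loses
mat = bpy.data.materials["Displaced"]       # assumed material name
mat.cycles.displacement_method = 'BOTH'     # 'BUMP' / 'TRUE' / 'BOTH'
```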
Why dice first? Because Cycles is a path tracer, and rays need to see the other surfaces correctly (already displaced).
I don't think there is a way around dicing first; it's not a REYES renderer, so you need the whole displacement in memory for the rays.
If not, someone please enlighten me.
If you export a map out of ZBrush, use a 32-bit map; then everything works out of the box (just connect the map to the displacement slot, no more nodes needed). For a grayscale map where 50% gray == 0, you have to subtract 0.5 from the map, otherwise there's no negative displacement!
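The grayscale case can be wired up in a few nodes from Python as well; a hedged sketch using the 2.7x node API (the material name and map path are made up):

```python
import bpy

mat = bpy.data.materials["Displaced"]            # assumed material name
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links

tex = nodes.new("ShaderNodeTexImage")
tex.image = bpy.data.images.load("//disp.png")   # hypothetical grayscale map

# Subtract 0.5 so 50% gray maps to zero and darker values displace inward
sub = nodes.new("ShaderNodeMath")
sub.operation = 'SUBTRACT'
sub.inputs[1].default_value = 0.5

out = nodes["Material Output"]
links.new(tex.outputs["Color"], sub.inputs[0])
links.new(sub.outputs["Value"], out.inputs["Displacement"])
```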
Ok, so you mean that it calculates the bounds. But then either they are full res (which you say they are not, only at render time), or they are coarse bounds; and in that case, what about shadows, or other rays that have to intersect the high-res displacement AFTER the tile was rendered (and its polygons were dropped from memory)?
That was the main argument for REYES, that it rasterized first and then diced only at rendertime.
But now we shade with rays. How would a displacement be traced correctly if the other polygons aren't there any more for a path to travel through, after they got dumped from memory (once the tile was rendered)?
So either a very coarse approximation mesh is kept (which will not give pixel-exact ray casting for shadows, transparency, refraction, etc.) and only the local area is displaced at high res; or, if it is displaced correctly at high res, there needs to be a lookup and rebuild of any secondary displacement contributing to the tile. That effectively means the whole displacement dataset has to be in memory, or at least in a gigantic cache on disk.
I am not clear on this dilemma. Moreover, I see Cycles report that it builds the displacement geometry before it begins rendering the tiles. Also, the lower I set the dicing rate, the more MEMORY it uses (which should not be the case if dicing happened ONLY at render time). 0.5 is outright impossible on a 6 GB GPU, so it NEEDS to generate the full displacement mesh.
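The quadratic growth is easy to see with back-of-the-envelope numbers. Assuming adaptive dicing targets roughly one micropolygon per dicing_rate² pixels of coverage (an assumption, but about what the setting means), halving the rate quadruples the count:

```python
def micropolys(covered_pixels, dicing_rate):
    # Rough model: one micropolygon per dicing_rate^2 pixels of coverage
    return covered_pixels / dicing_rate ** 2

frame = 1920 * 1080             # a surface covering the whole frame
print(micropolys(frame, 1.0))   # 2073600.0
print(micropolys(frame, 0.5))   # 8294400.0 -- 4x the memory
print(micropolys(frame, 0.1))   # ~2e8 -- 100x, hence the GPU blow-up
```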
Ok, let me put it more simply:
The first tile gets rendered, the surface is displaced at render time, the texture is looked up, and then the polys are thrown away. Ok, I get that.
The second tile is also rendered with subdivision displacement. Now a ray travels to the surface and reflects in the direction of the FIRST tile, intersecting the previous displacement. How in the world can this ray know which micropolygon it hits, unless that geometry is still in memory (which it isn't any more, with render-time displacement)? Shadows will not be correct, nor refraction, and many, many more things.
Testing latest from Buildbot // blender-2.77-ab3c1de-win64.zip - It's already in.
But in this build, polys are always split. Didn't happen with builds from Graphicall :spin:
Does this mean that it also generates micropolys from secondary/tertiary, etc… bounces?
the blend is an empty cube… a startup scene? Also one render layer.
UPDATE: Yes, I was right, of course. It generates polys only when a ray hits, and STORES them in memory. But of course, if you have a landscape, the memory will balloon by the last tiles (depending on how many displacement ray hits you have).
So yes, the displacement data GROWS with each ray and NEEDS to be stored in memory, which eventually gets extreme if too many polygons are stored.
What I expected, and which is IMHO superior, would be a two-way out-of-core technique: first, compute only the displacement data on the CPU (a la REYES, but store it; loads of memory), then load chunks of the BVH of that data + textures and lights to the GPU, or keep rendering on the CPU. Of course this implies a CPU/GPU out-of-core design, which is beyond Cycles right now.
Still, I don't complain AT ALL! Amazing!
PS: The REYES micropoly advantage vs. MC path tracing, and the challenge of combining them on the GPU, are well known. Damn, REYES was soooo fast at displacement.
Still, the problem I see is the memory use, which can get huge, especially if you need animation (dicing rate below 1).
Well, there is a way… biased rendering. LOL… I know.
Yeah: 1 cube + subdiv (0.1 rate); check the memory for that in the task manager, and then render layers 1 and 2 together (scene layers, not render layers) and check the memory for that.
A different test scene: a closed cube (layer 2) + a cube with a hole (layer 1).
Yes, now I get what you wanted me to see… the memory :-D. It's insane: 21 GB for a simple cube with no (visible) displacement, only diced at 0.1.
Why? Loading the BVH takes forever (15 min!); the rendering itself only took 15 seconds on the CPU.
So it IS diced beforehand, not only when a ray hits at render time.
Because:
Step 1. It looks through the camera and dices the geometry based on pixel space (21 GB fills up; 5 min).
Step 2. It sends the rays (tile rendering, easy game) to the diced geometry (the farther away, the lower the LOD, based on the dicing rate), and now the rays can bounce however they want, because they intersect the displacement (15 sec).
This means that every time the displacement is diced, you need a lot of time before rendering starts, and second, a shitload of memory.
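The two steps observed above can be sketched in pseudocode; this is purely illustrative, and none of these names are Cycles internals:

```python
def render_frame(objects, tiles):
    # Step 1: dice the whole scene before any tile is rendered.
    # This is where memory fills up and the long pre-pass happens.
    micropolys = []
    for obj in objects:
        micropolys.extend(obj["dice"]())   # per-object screen-space dicing
    bvh = build_bvh(micropolys)            # one BVH over everything

    # Step 2: per-tile path tracing is fast, because a bounce in any
    # direction can intersect the already-diced, resident geometry.
    return [trace_tile(t, bvh) for t in tiles]

def build_bvh(prims):
    return prims                           # stand-in: just keep a flat list

def trace_tile(tile, bvh):
    return (tile, len(bvh))                # stand-in: "sees" all geometry

# Two toy objects dicing to 4 and 6 micropolys; both tiles see all 10
scene = [{"dice": lambda: ["mp"] * 4}, {"dice": lambda: ["mp"] * 6}]
print(render_frame(scene, ["A", "B"]))     # [('A', 10), ('B', 10)]
```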
This is not at all like REYES, because REYES dices ONLY at render-tile time, with nearly no memory footprint and no precalculation whatsoever. It can do that because it is a scanline algorithm and doesn't care about MC or any paths at all (only the direct screen ray/scanline), not to mention unbiased MLT. Uuuups.
So there is still a long way to go before the displacement is usable in animation, unless you have a render farm. The GPU is a no-go due to its limited memory; I just tried some scenes, and you run short on memory pretty fast on the GPU.
CPU and GPU rendering in master, and GPU rendering in the temp-cycles-microdisplacement require all geometry to be generated in advance and stored in memory.
CPU rendering in temp-cycles-microdisplacement has some support for not having all data in memory using a geometry cache. There is a delay when you start rendering however, to compute the bounds for all geometry.
About algorithms:
REYES is not compatible with path tracing, it works to an extent if you restrict the lighting in certain ways, using things like shadow maps and mostly diffuse indirect light. But in the end Pixar also gave up on it too and switched to their new RIS system.
Hyperion style ray reordering can bring back the ability to render scenes much bigger than CPU memory. However neither REYES nor ray reordering work all that well with progressive viewport rendering. The usability improvement from that is very important, and most CPU production renderers now just keep all geometry in memory because artists can work faster with that kind of interactivity.
The CPU geometry cache in the temp-cycles-microdisplacement branch likely works well in some cases but not in others. We'll have to investigate how well we can get this working in production scenes. It's likely that for this to work well in more complicated scenes or with algorithms like adaptive sampling, we would need to implement ray reordering. That would be a big project, unknown when someone would work on that.
Getting the geometry cache to work on the GPU would be another such big project. Newer NVidia GPUs make reading directly from CPU memory easier, and it has been proven that you can get decent performance this way as long as your scene fits in CPU memory, since transferring data between the CPU and GPU can be pretty fast with the right hardware.
It is still possible to reduce Cycles geometry memory usage, which would help with or without the geometry cache.
Brecht, thank you A LOT for clarifying these points. So, no on-the-fly dicing.
Only one question: I remember that in the early path-tracing days (early 2000s) there were attempts to get micropolys into path tracers. There were some builds, for example from NVIDIA for Gelato (Larry worked heavily on that), before they abandoned it for OptiX.
Now, is there no way to path trace the normal geometry first, and then ray march the displacement on the fly in a second pass?
What I mean is that instead of calculating a geometry cache, there would be a LOD ray cache. Because any displacement reorients a face into more subfaces, the second pass (which does the subdivision) would access the ray cache and refine the result in more detail only within a certain ray radius while displacing. That way there would be no need for a geometry cache, nor for storing any subdivision before rendering, although you would need to keep the original non-displaced geometry for the second ray localization (the ray march).
I understand that this would only be OK for detail features and rays that are not transported across the whole scene, unlike Hyperion, which takes into account nearly all interactions globally.
I think this could also be applied in the viewport, increasing only the refinement time?
In short: generate micropolys on the fly (from the stored non-displaced geometry) and address them with a kind of microrays (for lack of a better term), based on a ray cache (from the first, normal pass), in a second ray pass.
As far as I know Gelato didn't do anything really different in this area, besides the fact that it worked on the GPU instead of the CPU.
There exist algorithms for displacing the mesh entirely on the fly, however performance tends to be very poor in comparison to geometry caches. That's because you are re-evaluating many displacement shaders for each ray, and the ray intersection bounds are poor.
I'm not sure I understand your idea, it sounds like ray reordering to me? Trace a bunch of rays against the bounds in the first pass, then sort the rays and in a second pass compute the displacement per batch of sorted rays? In such cases a geometry cache wouldn't be strictly necessary, since the cost of displacement could be shared between multiple rays, but it would still help.
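That batching idea can be sketched in a few lines. Everything here (patch ids, `displace`, the bounds test) is an illustrative stand-in, not how any real renderer structures it:

```python
from collections import defaultdict

def displace(patch_id):
    # Expensive step: evaluate the displacement shader and dice one patch
    return f"diced-{patch_id}"             # stand-in for micropolygon data

def trace_reordered(rays):
    # Pass 1: test rays against coarse patch bounds only, and bucket
    # them by the patch they might hit (the "reordering").
    buckets = defaultdict(list)
    for ray in rays:
        buckets[ray["patch"]].append(ray)  # stand-in for a bounds test

    # Pass 2: displace each patch ONCE, then intersect its whole batch,
    # so the dicing cost is amortized over all rays in the batch.
    hits = {}
    for patch_id, batch in buckets.items():
        geometry = displace(patch_id)      # evaluated once per batch
        for ray in batch:
            hits[ray["id"]] = geometry     # stand-in for real intersection
    return hits

rays = [{"id": 0, "patch": "A"}, {"id": 1, "patch": "B"}, {"id": 2, "patch": "A"}]
hits = trace_reordered(rays)               # displace() ran twice, not three times
```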
Yes, with the difference that the bunches of (micro)rays are localized to the micropolys on a LOD, per-subpixel basis (based on the dicing rate), not like Hyperion "streaming", where huge areas are reordered. Somehow a mini Hyperion for displacement.
Hyperion started exactly like this, they said. Out of a "crazy idea".