Cycles Development Updates

One of the speed improvements in master comes from the Russian Roulette termination method. But certain problems have been reported with it, and if I remember correctly, Brecht said that if he does not find an alternative solution, it will go back to the old method.
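For context on the technique being discussed: Russian roulette speeds up path tracing by probabilistically terminating low-contribution light paths and reweighting the survivors so the estimate stays unbiased. A minimal sketch of the general idea, not Cycles' actual implementation:

```python
import random

def russian_roulette(throughput, min_bounces, bounce):
    """Decide whether an RGB light path survives to the next bounce.

    Paths below `min_bounces` always survive; after that, the survival
    probability is tied to the remaining throughput, and survivors are
    reweighted by 1/p so the expected value is unchanged (unbiased).
    """
    if bounce < min_bounces:
        return throughput, True
    p_survive = min(1.0, max(throughput))   # clamp to a valid probability
    if random.random() >= p_survive:
        return throughput, False            # terminate this path
    return [c / p_survive for c in throughput], True
```

The trade-off the thread alludes to: terminating aggressively saves time per sample but raises variance (noise) where throughput is low, which would explain the reported problems.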

Regarding the above message about the Next Gen film, I do not know much about it, but I understand those are really very good render times for animation in production.

Deformation motion blur is still an issue in Cycles today, and was the primary motivation to use Embree in the Tangent Animation build. Depending on the shot, Embree could be 5x faster while using less memory at the same time.

See https://embree.github.io/papers/2017-HPG-msmblur.pdf for details.

2 Likes

Hey,

yes, in my post about motion blur, I meant mainstream commercial renderers. I saw your Blender Conference presentation, so I know that in Cycles, Embree helped a lot in terms of MB performance. The performance of other renderers is partially good also because those renderers have often been using Embree for a couple of years already, but Cycles always strangely avoided adopting it as a whole. At least the public version… Thankfully, that's changing because of your work.

So, now to the point. I assumed Tangent used their custom Embree build while working on Next Gen. If that is the case, and Next Gen was rendered with 3D motion blur, which I assume it was, then it means that the Cycles in official Blender builds would render those scenes even slower than the reported average of 3.76 hrs per frame.

In case Next Gen was not rendered with the Embree motion blur, I wonder how much it would help in those scenes.

What Cycles does need to improve regarding render time is volumetrics. Hope to see some improvements.

2 Likes

There is a GSoC project on that; hopefully the developer will make it to the end so it can be integrated :slight_smile:

1 Like

Embree has neither CUDA nor OpenCL support, and so far nobody has stepped up to implement that. That's one of the reasons why there was reluctance to use Embree directly in the official Blender builds.

So, now to the point. I assumed Tangent used their custom Embree build while working on Next Gen. If that is the case, and Next Gen was rendered with 3D motion blur, which I assume it was, then it means that the Cycles in official Blender builds would render those scenes even slower than the reported average of 3.76 hrs per frame.

Very much so, yes. Several times slower.
Without motion blur, not so much. Anyone doing only stills should not expect too much from Embree.

I'd assume that the raytracing kernel would be switched when switching Cycles between CPU and GPU modes. I would never imagine Embree being used for the GPU as well :slight_smile: That one should use Optix, or something like that.

The main issue I am having is that while GPU Cycles performs OK, CPU Cycles speed is just not on par with other CPU renderers. And complex scenes rarely fit into GPU memory, so one is left with no good choice.

Yes and no. Odds are that Embree's BVH is not necessarily the most optimal one for GPU rendering. Still, for motion blur it's far ahead of what Cycles currently uses on the GPU and would be an improvement no matter what.

Optix's license is incompatible with the GPL. You could build a standalone Cycles with it, but not include it in Blender. Even when using Optix, RadeonRays and Embree for the respective hardware, one would still need to take great care to make sure all backends return identical results. I had to modify Embree to make its hair intersections match those of Cycles, although I hear they fixed that in Embree 3.x.

I'm not saying it can't be done, I'm saying it would probably be more work than we have contributors at the moment.

Volumes are a big one. There are a lot more volumes in Next Gen than you see in the public trailers.

Cycles' method of doing volumes means that performance very much depends on shader complexity. Ray marching means that it'll take as many steps through thin parts as it would through thick parts, calling the volume shaders many, many times. Using complex shader nodes (such as procedural noise) in the volume shader can dramatically reduce render speed.
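The cost behaviour described above can be sketched as fixed-step ray marching, where the volume shader is evaluated once per step no matter how thin the medium is there (an illustrative sketch, not the actual Cycles kernel):

```python
import math

def ray_march(density_shader, t_near, t_far, step_size):
    """Accumulate transmittance through a volume segment.

    One shader call per step, so an expensive shader (e.g. procedural
    noise) multiplies the cost of every single step, thin or thick.
    """
    transmittance = 1.0
    evaluations = 0
    t = t_near
    while t < t_far:
        sigma = density_shader(t)                     # one shader call per step
        transmittance *= math.exp(-sigma * step_size) # Beer-Lambert attenuation
        evaluations += 1
        t += step_size
    return transmittance, evaluations
```

Halving `step_size` doubles `evaluations` regardless of how much density is actually there, which is why density-aware acceleration structures pay off.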

With different methods (such as this, combined with super-voxels or a kd-tree over volume density), the number of shader evaluations can be reduced significantly.

1 Like

Sure, I understand resources are limited. What I meant, primarily, is that Embree's incompatibility with GPU should not mean no Embree for official Cycles at all. I think it would be fine to implement Embree for CPU only; it's the CPU mode which lacks performance, and the GPU could keep using whatever it's using now.

That's how the Embree branch currently works, for exactly those reasons. The problem then is that CPU and GPU renders look different, which causes problems for mixed hardware render farms or CUDA+CPU rendering.

Of course, the user then has the option to not use Embree on the CPU. Which then means that there will be even more BVH code to maintain than there already is.

Switching entirely over to Embree would allow to reduce the amount of code (and with it, bugs) instead of duplicating it.

Thanks for the insight, all! Very interesting to follow the discussion. I rarely used motion blur in other render engines, so my perspective on this was probably distorted. Volumes with complex shaders are indeed incredibly slow. I like to do clouds with procedural noise, and sadly it's unusable for animation on a single PC. Embree would be very much appreciated to speed things up.

I just hope that the Embree branch can make it into 2.8 at least. Sure, it can't be used alongside GPU rendering because of the visual difference, but you usually resort to CPU rendering when you run out of GPU memory, so it would not matter that much anyway :slight_smile:

1 Like

I believe the devs are aware of the situation with motion blur and volumetrics. Cycles development in general has prioritized getting it feature-complete over performance (but projects like the full Embree integration and the GSoC projects show it has not been forgotten, and the known solutions for various bottlenecks are being worked on).

I have rarely used motion blur in my work, by the way, but I have made quite fair use of volumetrics. The denoiser can actually help a bit in this area at least.

1 Like

I remember the build with the scrambling distance feature was very fast when rendering volumetrics.

https://www.youtube.com/watch?v=I1pC7dNS0hs&t=1s (only surfaces in this video, btw)

1 Like

Also, feature-wise, most of the things that are missing or hard to achieve are because Cycles is not, and is not planned to become, a bidirectional engine.

I think Cycles has a nice chance to keep evolving. For instance, the first and only hybrid ((uni+bi)directional) approach I know of was taken in RenderMan, which I've got my eyes on.

Not all things can fit on a GPU, not all things are optimal on a CPU, not all things are fast for unidirectional PT, and not all things are good for bidirectional PT… a smart but hard move on Pixar's side.

PS
I too wonder why the scrambler is not in Cycles. :wink: please

It's just a gimmick. Not usable in production for many reasons. The main reason why it runs so fast in that video is because the indirect is faked with AO. If it was true GI, you would immediately see the artifacts that almost never go away and are much more disturbing than the noise.

Now that the CUDA+CPU mode exists, it's more of an issue, as you frequently use the CPU mode for extra speed while rendering on the GPU though.

Not to mention it's a possible nasty issue if you have a mixed CPU and GPU render farm. How do you make sure the user is aware that CPU render nodes will produce different output than the GPU render nodes?

EDIT: I suppose one UI option would be to add a "use Embree" checkbox, which locks out selecting GPU mode when enabled.

So, I'm not disagreeing with the general sentiment that vanilla Cycles is slower than most other renderers. However, I work in VFX in LA, and I can tell you that 4-hour render times are not ridiculous. Also, it's not accurate to compare Cycles to Vray, because you're comparing brute-force path tracing to the approximate methods in Vray. We use Vray here at work, and we NEVER use approximate methods because they usually cause artifacts. When comparing apples to apples (like Vray brute force vs. Arnold), it actually comes out more similar. I can't comment on Corona though.

The thing is, for animated feature film work, all of the major studios use a brute-force path tracer. RenderMan, Arnold, Hyperion, Manuka and Glimpse are all path tracers. They are used because of how unreliable approximate methods can be. Sure, they might work for arch-vis, but not for animation.

Another aspect of this is, when rendering for film, it's cheaper to spend money on render time than artist time. This is also one of the main underlying reasons for doing as much as possible in camera. It takes a lot of artist time to break up a scene into individual passes and exclusions. It may take longer to render the whole scene with moblur and DOF, but at least you're not paying someone $60 an hour to spend tons of time breaking everything out.

6 Likes

I don't know how Jeff measured render time, but when measuring from "press render" to "final file on disk", a significant amount of time in production is just file transfers. When you have a hundred render nodes simultaneously asking for tens or hundreds of GBs worth of textures, the file server can become quite busy (and slow).

To quote an upcoming TOG article about Arnold:

[…] simply fetching all this data from a local or networked disk, even in compressed form, could easily take an hour, or much more if the file servers or network are saturated with texture requests from the render farm.
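A back-of-envelope illustration of why the quote is plausible (all numbers below are assumed for illustration, not taken from the article or any production):

```python
def transfer_time_hours(num_nodes, gb_per_node, link_gbit_per_s):
    """Lower bound on wall time for a render farm to pull its texture
    data through one shared file-server link, ignoring protocol
    overhead, disk seeks, and caching."""
    total_bits = num_nodes * gb_per_node * 8e9
    return total_bits / (link_gbit_per_s * 1e9) / 3600.0

# Hypothetical: 100 nodes each fetching 50 GB over a shared 10 Gbit/s
# link is 5 TB of traffic, i.e. roughly 1.1 hours of pure wire time.
```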

I know, four hours may sound long to many. Keep in mind though that those are scenes much larger than what the average Blender user has. One of the first Cycles patches that came out of the production of "Next Gen" was removing the limit of 1024 textures, because that became a real issue.