Cycles Development Updates

Pretty much any modern renderer is a path tracer. I’ve tested V-Ray, Corona and Arnold: V-Ray and Arnold with their native OSL implementations, and Corona with the 3ds Max 2019 OSL. The results were consistent. Obviously, there was no AO precomputation; that belongs to the old RenderMan/Mental Ray years. In fact, even Mental Ray, the ancient renderer itself, deprecated precomputed AO methods many years ago.

Lastly, when it comes to relative native AO vs OSL AO performance, it’s irrelevant whether the renderer runs on the GPU or the CPU.

You are again digging into nonsense. There is no “optimization” for the native AO map. It’s just an AO map; its principle is very simple. The performance difference stems from shooting rays through a low-level raytracing kernel vs shooting them through a high-level scripting language. That’s what makes the difference.

The bottom line is that I’ve delivered almost all my commercial as well as personal projects over the past 5 years with this shading methodology, and performance was never an issue. In fact, the performance hit was negligible. All I want is a native AO map, which pretty much every production renderer has these days.

AO shaders are important to have and can be fast enough for practical use; there’s no doubt about that.

But if rendering AO through OSL is 25x slower than natively, that means something is very wrong in either the OSL trace implementation or the OSL shader. From my experience integrating OSL in Arnold and Cycles, it was nowhere near that number, more like 1.2x slower.

This is the one I tried:

http://pressf9.free.fr/compounds/GAO2.1.osl

Also, I’d assume that the native AO map would work on the GPU, which would be a huge benefit. Currently, OSL works only on the CPU, and CPU Cycles is generally pretty slow (compared to other Embree-based CPU path tracers on the same CPU).

How exactly have you used these maps? How many samples did you need? To get a really sharp division for something like wear-and-tear, you would need several samples per shading invocation.

The complaint that I’ve seen from artists is that the kind of AO that just averages a single sample over multiple passes (like in Octane) doesn’t work for this, but I’m not sure which amount of samples would be satisfying.

It’s relevant for the architecture of the renderer whether it can (or has to) work on the GPU. CPU-only may allow for ray batching and reordering, making ray intersections cheaper. Nested ray evaluation is going to hit the GPU harder than the CPU.

There are always optimization opportunities when dealing with internal code. You can get a single AO sample for free from the environment light test, for example. The ray intersection from OSL is not done in OSL, it’s done by the renderer. There’s going to be some calling overhead in that, but there’s no reason it would need to be orders of magnitude slower.

OSL is not a “high level scripting” language, it’s a shader language. It is compiled to native instructions using LLVM, which is also used for CUDA, C++ and OpenCL compilers.

“Fast enough” and “very slow” can be the same thing. I’m not arguing against integration, I want users to adjust their expectations. It may well be the case that after you have destroyed your render times with complex materials, the extra ray tests don’t make a big difference anymore. Maybe you’re rendering scenes with relatively low geometric complexity, where the cost of a single ray traversal is also low.

I’m not an OSL expert, but right off the bat:

  1. It is only filtering against diffuse rays. You’ll want to filter against any sufficiently “diffuse” ray, pretty much anything that isn’t a primary or sharp glossy ray.
  2. It’s using two “perlin noise” calls for random numbers, which is pointless and expensive. The computation of the direction doesn’t look good either.
    For reference (“Buf A”, GLSL, but should be portable):
    https://www.shadertoy.com/view/MsdGzl
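
As a rough illustration of the second point, here is a minimal Python sketch of how an AO ray direction is commonly drawn: two plain uniform random numbers are mapped to a cosine-weighted direction around the surface normal, with no noise function involved. This is not taken from the OSL shader or the Shadertoy example linked above; the function names and the basis construction are assumptions made for the example.

```python
import math
import random


def orthonormal_basis(n):
    """Return two unit vectors (t, b) perpendicular to the unit normal n."""
    # Pick a helper axis that is guaranteed not to be parallel to n.
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    # t = normalize(cross(helper, n))
    t = (helper[1] * n[2] - helper[2] * n[1],
         helper[2] * n[0] - helper[0] * n[2],
         helper[0] * n[1] - helper[1] * n[0])
    length = math.sqrt(t[0] ** 2 + t[1] ** 2 + t[2] ** 2)
    t = (t[0] / length, t[1] / length, t[2] / length)
    # b = cross(n, t), already unit length because n and t are orthonormal.
    b = (n[1] * t[2] - n[2] * t[1],
         n[2] * t[0] - n[0] * t[2],
         n[0] * t[1] - n[1] * t[0])
    return t, b


def cosine_sample_hemisphere(n, u1, u2):
    """Map two uniform numbers in [0, 1) to a unit direction around n."""
    r = math.sqrt(u1)               # radius on the unit disk
    phi = 2.0 * math.pi * u2        # angle on the unit disk
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))  # lift the disk point onto the hemisphere
    t, b = orthonormal_basis(n)
    return tuple(x * t[i] + y * b[i] + z * n[i] for i in range(3))


def random_ao_direction(n, rng):
    """Draw one AO ray direction using a plain uniform random generator."""
    return cosine_sample_hemisphere(n, rng.random(), rng.random())
```

A renderer would use its own sampler in place of `random.Random`; the point is only that the direction needs two cheap uniform numbers.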

Not to put down the work, but this was written by an artist and it shouldn’t be used to compare performance baselines.

Yes, obviously, you can’t achieve such an effect by accumulating single rays. I don’t understand why this needs to be debated again when Lukas has already pointed it out above in this thread?

How exactly have I used those maps? I used them as a Mix/Lerp factor for all sorts of maps:


To create things like these:
https://www.artstation.com/artwork/x2EnX
https://www.artstation.com/artwork/RLl2r
https://www.artstation.com/artwork/yE1DJ
https://www.artstation.com/artwork/4b3N2
https://www.artstation.com/artwork/QxBz3
https://www.artstation.com/artwork/JVZra

Everywhere in these scenes, materials utilize AO maps in this manner, yet the average render time for these scenes is about 15 minutes/frame on an i7 5930K. AO is just a bunch of very simple rays, which don’t trace anything more than occlusion. If you have a scene full of ray-branching effects such as glossy reflections, refraction, brute-force GI and image-based lighting, and you introduce AO-based materials, you will notice almost no performance impact.

You will usually notice some impact only in very trivial scenes, where AO maps end up being one of the most expensive effects to render, but such scenes are becoming increasingly rare. And even then, the render time lost is easily compensated by the workflow time saved by not having to UV your models at all.

As for CPU vs GPU: Cycles can do ray branching, so I don’t see why this should not be possible.

Regarding the specific OSL shader, indeed, you may be right that it’s poorly written. I don’t know how to write OSL shaders. That’s one more reason I want to see the native one. I want to exclude any possibility of using one written by someone who doesn’t know what he’s doing. Most of all, I want to be able to run it on the GPU, because as I already said, Cycles performance on CPU leaves much to be desired.

It’s not being debated, I’m asking you how many samples you need, because I’m not the artist with all the experience.

I’m not saying it’s impossible, I’m pointing out some of the issues. Having said that, those issues are pretty much the same as the issues with the Bevel Shader, which will be in 2.8, so I don’t think there’s an argument against an AO color node in 2.8.

That’s a fair point, but again you are talking about your experience with different renderers. You can get a vast speedup from Cycles on the GPU, or you can destroy it by using complex materials, lots of lights, etc.

There are several scenes in the BI benchmarks where the speedup from a high-end GPU is less than 2x over a good CPU, and that is discounting that Cycles could be faster still if it were written exclusively for the CPU.

So you are pointing out that renderers are different and, at the same time, asking me for a specific number of samples. That’s where I am confused. The number of samples necessary is relative and varies a lot, depending on how a given renderer handles shading rate, whether it has things like a min shading rate per eye ray, whether it does ray branching or not, etc.

How can I give you a specific number of samples when Cycles doesn’t even have an AO node I can try it on, and when experience with other renderers doesn’t apply much in this case, since the numbers vary quite a lot? For example, Corona’s default of 16 AO rays per pass was reduced to 12 with the introduction of a better random sample generator. V-Ray uses a default of 3, since it’s treated as subdivisions. Mental Ray works best with around 64, since it’s not progressive and this is the sweet spot for its unified sampler.

You want me to provide a number which would not make much sense given the context of things…

Arnold has a specific setting for the amount of (squared) samples in an AO node and it’s the renderer that is most comparable to Cycles. That’s what I would be interested in. Corona has the same (non-squared) setting. Those are the number of rays cast, so it is somewhat comparable. A difference between requiring 4 samples and 16 samples should be significant.

Exactly. Now, for example, final renders in Corona for me are somewhere between 64 and 256 passes, because Corona by default does 16 shading rays per eye ray, while Cycles users usually render between 500 and 3000 passes. So if I gave you some random number, let’s say 16, which is the most common number when it comes to the amount of AO samples, then it would really not mean anything. What’s important is the context, in this case the renderer. Doing 3000 passes * 16 AO samples would be 48,000 AO rays per pixel. Likely way too much.
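
The arithmetic above can be written down as a tiny Python sketch. It keeps the same simplification as the text (one AO-shaded hit per pass), and the names `passes` and `ao_samples` are only illustrative.

```python
def ao_rays_per_pixel(passes, ao_samples):
    """AO rays traced for one pixel, assuming one AO-shaded hit per pass."""
    return passes * ao_samples


# The Cycles range mentioned above, with 16 AO samples per shading call.
for passes in (500, 3000):
    print(passes, "passes ->", ao_rays_per_pixel(passes, 16), "AO rays per pixel")
```

The same sample count therefore means very different ray budgets depending on how many passes the renderer is expected to run.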

The only case where I could give you a reasonable number would be if there were already an AO map in Cycles, and I were able to test it on several different scenes and see which number roughly gives the best speed/quality ratio.

To make myself clear, I’m not trying to build an argument; I’m personally interested in what you, as an artist, consider a “good enough” amount of AO samples to drive one of these AO-based dirtmaps.

That’s not how it works though. If you want to use the AO color in a shader, you have 16 AO samples per shader invocation. Shaders get invoked for every path vertex*, so this depends on your path length (bounce settings). You can cut down on that by ignoring this for non-primary/non-glossy rays.

*assuming it hit a material using AO color
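
To make the per-vertex counting concrete, here is a small Python sketch. It assumes every path vertex hits a material that uses the AO color, and it uses made-up ray type labels (`"camera"`, `"glossy"`, `"diffuse"`) rather than any actual Cycles identifiers.

```python
# Ray types for which AO is still traced when filtering is enabled
# (hypothetical labels chosen for this example).
AO_RAY_TYPES = {"camera", "glossy"}


def ao_rays_for_path(vertex_ray_types, ao_samples, filter_rays=True):
    """Count AO rays traced along one path.

    vertex_ray_types lists, for each path vertex, the type of ray that
    arrived there. With filter_rays=True, vertices reached by any other
    ray type skip AO tracing entirely.
    """
    count = 0
    for ray_type in vertex_ray_types:
        if filter_rays and ray_type not in AO_RAY_TYPES:
            continue  # the shader would return a constant here instead
        count += ao_samples
    return count


path = ["camera", "diffuse", "diffuse", "glossy"]
print(ao_rays_for_path(path, 16, filter_rays=False))  # 64
print(ao_rays_for_path(path, 16))                     # 32
```

Multiplying the per-path count by the number of passes gives the per-pixel total, so the bounce settings matter as much as the sample count itself.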

Of course it was a simplification. I assumed you’d understand that. :) My point is just that, given how differently renderers handle shading rate, it’s difficult to answer a “How many rays do you need?” kind of question. Obviously, I don’t base these decisions on math, as such math would be impossible. Instead, the workflow is visual. The first decision depends on whether the renderer is progressive or not (most of the modern ones are).

If the renderer is not progressive, then I simply do test renders and use the AO samples value that’s right on the borderline of acceptable when it comes to the noise level of the resulting procedural AO mask.

If the renderer is progressive, then I look for the AO samples value which makes the noise level of the procedural AO mask similar to the noise level of the other effects, such as GI noise, area light noise, glossy reflection noise and so on. I am just making sure I use neither an excessive number of AO rays nor too few.

To avoid ten more pages of speculation about performance, I just went ahead and created a quick proof-of-concept AO node.

Source: https://github.com/lukasstockner/blender/tree/aonode
Linux build: http://graphicall.org/1233
Windows build: http://graphicall.org/1234

Works amazingly Lukas!

It’s also very fast: 4 seconds without AO, 4.29 with AO.

This opens up a lot of opportunities!

Thank You so much!

Thank you so much!!!

I’ve made a quick test using AO map to mix painted metal plate material with rust in corners:


The performance is amazing, especially on the GPU.

I do know that this is just a proof of concept. The biggest limitation right now is that it only traces AO rays against the same object, not the rest of the scene, which is very important for practical use.

The thing is, if this proof-of-concept version got updated to trace against the rest of the scene, and had one more boolean switch to choose between the current mode and tracing AO rays inside of the objects (for convex edges), that would be pretty much the final, production-ready version for me! :)

Thanks again!

EDIT: Tracing rays against the same object is also very useful. Other renderers have a switch in the AO node, which you can enable to trace only against the object the AO evaluation was invoked from (like the current state). But I didn’t want to scare you away by requesting too many options :)

It’s great to see this. XD

Hi @lukasstockner97,

The AO node looks great. Will this be added to the nightly Master build in the near future?

A question out of curiosity: what’s the difference between the AO node and using Geometry ➔ Pointiness?

Thanks,

Metin

Pointiness works only when you have sufficiently dense geometry. It won’t work, for example, on a simple 6-polygon cube. Or let’s say you model a simple modern house with simple, non-subdivision modeling. Such a model won’t have nearly enough topology for pointiness to provide any meaningful result, while an AO map will allow you to procedurally shade anything, regardless of the topology :)

Thanks @rawalanche,

I always thought Pointiness is geometry-independent, and only the old Dirty Vertex Colors approach is geometry-dependent.

Yes, actually pointiness is quite similar to the Vertex Color approach. It calculates the attribute based on the angle difference of the surrounding topology. So if you have a simple 6-sided cube where all the surrounding topology always ends up at 90°, then you’ll get just a constant color :)

You also cannot distort the gradient the pointiness attribute delivers, at least not in an easy way.
With the AO approach you can simply plug a noise texture into your shader tree (see rawalanche’s video) and get very good results.
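
As a rough sketch of that idea, the Python below perturbs an AO value with a noise value and uses the result as a mix factor between two colors. It is not a node setup from Cycles or from the video; the function names, the threshold values and the convention that `ao` is 1 for unoccluded and 0 for fully occluded are all assumptions made for the example.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def smoothstep(edge0, edge1, x):
    """Smooth 0..1 ramp between edge0 and edge1, clamped outside."""
    t = min(1.0, max(0.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)


def dirt_mask(ao, noise, noise_strength=0.3, lo=0.4, hi=0.6):
    """Turn an AO value and a noise value (both in 0..1) into a dirt mask."""
    occlusion = 1.0 - ao                               # 1 = deep in a corner
    distorted = occlusion + (noise - 0.5) * noise_strength
    return smoothstep(lo, hi, distorted)               # sharpen the border


def mix_color(base, dirt, mask):
    """Blend two RGB tuples using the mask as the mix factor."""
    return tuple(lerp(base[i], dirt[i], mask) for i in range(3))


paint = (0.1, 0.3, 0.6)
rust = (0.4, 0.2, 0.1)
print(mix_color(paint, rust, dirt_mask(ao=0.45, noise=0.8)))
```

The noise only shifts where the border between the two materials falls, which is what breaks up the otherwise uniform AO gradient.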
