DeepDenoiser for Cycles

Thanks a lot!

The OptiX denoiser is indeed only usable with Nvidia cards. The DeepDenoiser uses TensorFlow, which pretty much determines the supported platforms: there is dedicated support for Nvidia GPUs and CPUs. Besides that, there is some AMD support through ROCm, but I haven’t investigated that yet.

Edit: This is a very important question and it has been asked several times.

6 Likes

I see. It’s nice that it can run on CPUs. Everyone will be able to try it, even if the OpenCL/ROCm support doesn’t come to fruition.

I’m sorry that it was a repeated question. Thank you for taking the time to answer it again.
I’ll be following the topic from now on because it looks very interesting.

1 Like

It’s no problem for me when people ask questions multiple times. After all, there is nothing like an FAQ (yet). I only wanted to express that a lot of people care about this, and I am well aware of its importance.

3 Likes

IIRC, for denoising networks the underlying network topology is typically “just” convolution layers and pooling/upsampling, right?

Those aren’t particularly complex to implement: we could definitely do the training using TensorFlow (or whatever) and then implement the execution of the network in classic CUDA/OpenCL code. If TensorFlow is considerably faster, we could always offer it as an option, but personally I’d prefer an implementation that fits into Cycles’ existing GPU device code instead of adding a third GPGPU stack (next to Cycles and the Compositor).

Edit: This would probably also be more efficient, since we wouldn’t need to copy the pixel data.
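
For illustration, here’s a minimal sketch of that kind of topology in TensorFlow/Keras. The layer widths and depth are invented for this example; the point is only that each building block is simple:

```python
import tensorflow as tf

def toy_denoiser(channels_in=3):
    """A toy encoder/decoder built only from convolutions, pooling and
    upsampling - the simple building blocks discussed above."""
    inp = tf.keras.Input(shape=(None, None, channels_in))
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.MaxPool2D()(x)                    # 2x downsample
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.UpSampling2D()(x)                 # 2x upsample back
    out = tf.keras.layers.Conv2D(channels_in, 3, padding="same")(x)
    return tf.keras.Model(inp, out)
```

Each of these ops is a small, well-understood kernel, which is what makes a hand-written CUDA/OpenCL port of the trained network plausible.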

3 Likes

Hey Lukas!

You are absolutely right about that. In the case of the DeepDenoiser, there is also a kernel prediction part, but that is a fairly simple building block as well.

I am with you on that one, too; it makes a lot of sense. The only reason I didn’t mention a direct implementation is that I have literally zero experience with CUDA and OpenCL code. If there is no good reason to add another GPGPU stack, however, it should without doubt be avoided. Thanks for the feedback!

The only reason I can think of would be that GPUs nowadays are getting fixed-function hardware blocks for neural network workloads, so a vendor library like cuDNN or MIOpen might end up faster; we would need to try that out.

If I find the time, I might play around with this a bit. How hard would it be to retrain your model to use the existing denoiser feature passes? Based on that, I might be able to hack it into the existing denoising code as a proof-of-concept implementation.

3 Likes

Exactly. Another performance-critical aspect is whether 16-bit floating point numbers are viable; they seem to give considerable benefits on newer graphics cards.
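
For what it’s worth, newer TensorFlow releases expose this through the mixed-precision API. A minimal sketch, assuming TensorFlow 2.4 or later (older releases had a similar but experimental API):

```python
import tensorflow as tf

# Compute in float16 where safe, while keeping variables in float32.
# (Available as shown in TensorFlow 2.4+; earlier versions exposed this
# under tf.keras.mixed_precision.experimental instead.)
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Models built after this point run their layers in float16; the final
# output layer is typically forced back to float32 for stability, e.g.
# tf.keras.layers.Activation("linear", dtype="float32")
```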

I would need to rerender all the training files and train again from scratch. That’s quite a significant effort, but it has to be made at some point anyway. I can shift my priorities to focus on that.
But for a proof of concept, I can hack something together that is not sufficiently trained. How exactly would that work? What would the inputs be?

Right now, the inputs are:

  • Name of the pass (converted into an embedding vector; each component becomes an input plane)
  • Pass (+ internally computed relative variance)
  • Normal (+ internally computed relative variance)

Each pass is denoised separately (diffuse color, diffuse direct, diffuse indirect, …).
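
To make that concrete, here is a rough sketch of how such an input stack could be assembled. The function name and shapes are my own illustration, not the actual DeepDenoiser code:

```python
import numpy as np

def build_input_planes(pass_rgb, pass_variance, normal, normal_variance,
                       name_embedding):
    """Stack all image-shaped inputs along the channel axis; each
    component of the pass-name embedding is broadcast to a constant
    plane so the network sees it at every pixel."""
    h, w = pass_rgb.shape[:2]
    embedding_planes = [np.full((h, w, 1), c, dtype=np.float32)
                        for c in name_embedding]
    return np.concatenate([pass_rgb, pass_variance,
                           normal, normal_variance,
                           *embedding_planes], axis=-1)
```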

The architecture is relatively flexible and could also be trained with e.g. the combined diffuse pass or the combined pass. I haven’t checked your implementation in detail yet. What would make sense for you?

2 Likes

Right now, Cycles generates:

  • Normals + their Variance
  • Albedo + its Variance (basically a combination of all Color passes into one pass)
  • Depth + its Variance
  • Shadowing data (similar to how the shadowcatcher works)
  • Noisy Image + its Variance

Then a prefiltering stage turns those into filtered Normals, Albedo and Depth (to get rid of noise in the feature passes caused by e.g. DoF or motion blur), produces a preprocessed shadowing pass in the 0–1 range, and removes fireflies from the noisy image.

You can actually see those passes by enabling experimental mode and turning them on in the Passes menu (with a current master build you don’t need to enable denoising itself to get them, IIRC).

We could use either of those two sets as the input to the network. Currently I’m also looking into using motion vectors to improve animation denoising, but that’s another topic…

2 Likes

I will set up a training pipeline with the following input:

  • Albedo + Variance
  • Normal + Variance
  • Noisy image + Variance

This will not produce the best possible results, but it might be a very good option for the viewport. With those inputs, it is also not necessary to rerender the existing training data; I just need to calculate the variance for the training, as it is currently missing from the data. How is the variance currently calculated, and where can I find the code?
I don’t think the prefiltered passes should be needed. I am going to run some experiments with them in the future, but I don’t expect them to be critical.

Yes, I am aware of that. I tried to get them in 2.79 without denoising, but as you point out, that doesn’t work. I checked 2.8 a few days ago and it works perfectly.

Animation is somewhere on my TODO list as well, but way too far away at the moment :slight_smile:

1 Like

The variance is calculated during rendering (using the standard E[X^2] - E[X]^2 trick; unfortunately we can’t use better approaches due to how the split kernel works), and I don’t think there’s a good way to fake it for existing renders. If you’re interested in the details, the relevant code is kernel_passes.h for generating the render-time data, buffers.cpp for turning it into the output passes that you end up seeing, and filter_prefilter.h for the denoising code that loads it.
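
In other words, a sketch of the trick in plain Python (not the actual kernel code; see kernel_passes.h for that):

```python
def running_variance(samples):
    """E[X^2] - E[X]^2: only two running sums are needed per pixel,
    which is why it fits the split kernel's constraints. It is
    numerically less robust than e.g. Welford's algorithm, though."""
    n, sum_x, sum_x2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        sum_x += x
        sum_x2 += x * x
    mean = sum_x / n
    return sum_x2 / n - mean * mean
```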

1 Like

Reading the new XB1 thread on the Octane forum, I came across a blurb from one of their developers that makes me think that *perhaps* the better solution after all is to improve the current denoiser.

“The new hair BRDF, substrates and other features require the denoiser to be retrained, which is something that we plan to do in tandem with a major feature improvement we’re working on for the AI denoiser in 2019.”

So whenever a new feature comes in (which happens regularly), the whole network might need to be retrained. That is a lot of time to support a new feature properly, compared to just extending and improving the existing code.

Once you have a working pipeline, the amount of work is usually quite small. In most cases, you would run some tests to check whether the new feature works with the denoiser and add some renders to the training, validation and test datasets. Then you let it train for some time.
If e.g. new passes are added, more work needs to be done and the training takes longer. But you still have a solid, working basis you can reuse.
From my point of view, maintaining it is more likely to require less work, especially when something changes significantly.

1 Like

And how about training for bidir? For example, to make your denoiser work with LuxCore, you’d probably only need the required passes (judging by OptiX, which works just as well in Cycles as it does with LuxCore). However, once you switch from path tracing to bidirectional path tracing, the noise pattern changes and the AI can’t recognize the noise anymore. I guess this would actually need complete retraining, or maybe a completely different mode for the denoiser that switches to bidir training data?

If the DeepDenoiser were ported to LuxCore, several changes would be needed, but overall they would most likely not be very dramatic. However, as Cycles and LuxCore have different passes, it would be necessary to train it from scratch.
It might be possible to implement a future version in such a way that transfer learning and maybe fine-tuning would be sufficient. That basically means the (Cycles) neural network could be reused (for LuxCore) without training it from scratch. That is currently not a priority, though. I would be surprised if it didn’t work, but I am also not absolutely certain.
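
For the curious, in Keras terms such transfer learning would look roughly like this. This is only a sketch: the model file, the layer split and the loss are all hypothetical, not the actual DeepDenoiser setup:

```python
import tensorflow as tf

# Hypothetical: a network previously trained on Cycles data.
model = tf.keras.models.load_model("deepdenoiser_cycles.h5")

# Freeze the early layers (they capture fairly generic noise
# statistics) and fine-tune only the later ones on LuxCore renders.
for layer in model.layers[:-4]:
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="mae")
# model.fit(luxcore_inputs, luxcore_targets, ...)
```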

4 Likes

Just wanna say that I’m still very excited about this project and I’m coming back to this post every day to see what’s new. Btw, knowing that it is going to run on CPUs too is the best news ever :heart_eyes:

2 Likes

Thanks a lot!

To keep everyone updated: I have had very little time lately to spend on this project.
The goal is to get a prototype running within Blender somehow (hacks are allowed for now :slight_smile: ). This means the denoiser has to be usable as directly as possible, without ugly Python workarounds. That’s why I need to compile things directly against TensorFlow. As I have successfully avoided this sort of task for years, there is quite some catching up for me to do.
All of this has to be used from within Blender (2.8x), which was thankfully really simple to compile! I am doing my best to get familiar with the source so that I can put together a (for now ugly) prototype integration.

@lukasstockner97 To give you a short update: I started updating the data generation (rendering) script to 2.8 and storing all denoiser passes as well. To train with the combined pass directly, some changes are needed that I hadn’t considered. As mentioned, the main issue right now is a lack of time.

15 Likes

For me, the Blender denoiser is much better. I don’t understand why I’m getting such results (I expected the opposite), but it is what it is.

16spp, PT [image]
16spp, D-NOISE, PT [image]
16spp, Blender Denoise, PT [image]
768spp, D-NOISE, PT, render time 9:36 [image]
768spp, no denoise, PT, render time 9:36 [image]
100spp, Blender Denoise, BPT, render time 8:01 [image]

D-Noise is separate from DeepDenoiser. You’ll have to direct any questions to @RemiGraphics.

1 Like

D-NOISE is crap in these pictures - it loses too much detail.