cycles - tile disabling

Here is a build that disables tiles that show little progress and works on the noisy tiles only.
When the noisy tiles become settled, it continues to sample all tiles depending on the noise level.

This is for cycles rendering, “progressive” only. The settings for “NOPping” tiles are found under “Performance”.

nop min samples: run at least this many samples on the entire image before attempting to NOP anything
nop nth sample: check every nth sample whether some tiles should be NOPped, then work on the “bad tiles” only for a while
nop threshold: should be less than 0.01, which is still noisy in dark regions; I suggest 0.00
nop max nop: maximum fraction of tiles that may be NOPped (0.9 = 90% of tiles can be NOPped)

Suggested values are:
min samples: 10
nth sample: 20
threshold: 0.002
max nop: 0.9
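
To make the interplay of the four settings concrete, here is a minimal sketch in Python. It is not the patch code: the function names, the "quietest tiles first" ordering and the per-tile error values are assumptions for illustration. Only the meaning of the four parameters is taken from the descriptions above.

```python
def should_check(sample, min_samples=10, nth_sample=20):
    """True when the NOP decision should be re-evaluated at this sample count."""
    # Assumption: checks happen on multiples of nth_sample, never before min_samples.
    return sample >= min_samples and sample % nth_sample == 0


def pick_nopped(tile_errors, threshold=0.002, max_nop=0.9):
    """Return the set of tile indices to skip until the next check."""
    quiet = [i for i, err in enumerate(tile_errors) if err < threshold]
    # If every tile is below the threshold there are no "bad tiles" left,
    # so nothing is skipped and sampling continues on the whole image.
    if len(quiet) == len(tile_errors):
        return set()
    # Never skip more than max_nop of all tiles; prefer the quietest ones (assumption).
    limit = int(max_nop * len(tile_errors))
    quiet.sort(key=lambda i: tile_errors[i])
    return set(quiet[:limit])


if __name__ == "__main__":
    errors = [0.0, 0.001, 0.05, 0.0015, 0.3, 0.0005, 0.0001, 0.0002, 0.0003, 0.0004]
    if should_check(sample=20):
        print(sorted(pick_nopped(errors)))
```

With the suggested values, only tiles 2 and 4 of the example list stay active; lowering max nop to 0.5 would keep three more quiet tiles in the loop.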

You need all supportfiles from here:
and the latest EXE is here:

The patch is here:
Be aware that there is some other stuff for cycles in this patch like enhancing the XML.
To build for Blender, define BLENDER_APP; otherwise you get crashes.

Here is a package with .blend that shows how to save rendering time:
Normal sampling: 2048 samples : 11:17 min (reference image)
NOP settings: min10 nth20 threshold0.01 max0.9 : 5:35 min (with visual diff to reference image)
NOP settings: min10 nth20 threshold0.002 max0.9 : 8:16 min (w/o visual diff to reference image)
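
For comparing the three timings, a small Python helper like the following can be used. The function names are made up for this example; the render times are the ones listed above.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def time_saved(reference, candidate):
    """Fraction of render time saved relative to the reference run."""
    return 1.0 - to_seconds(candidate) / to_seconds(reference)


if __name__ == "__main__":
    for label, t in (("threshold 0.01", "5:35"), ("threshold 0.002", "8:16")):
        print(f"{label}: {time_saved('11:17', t):.0%} less render time")
```

So the loose threshold roughly halves the render time at the cost of a visible difference, while the strict one saves about a quarter.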

Hi again, now that we can discuss on BA and not clog up the thread on the developer site.

I understand that you’re working hard to improve it, but since this is more open to discussion now, let me talk about the (literal) corner case that I’ve had a bit of trouble with before.

The following is a rough illustration of that case.

As you can see, it’s not a rendered example, but you get the idea. The smaller these corners are, the less likely it is that a sample will make a contribution within the N samples allowed, and the more likely it is that the tile will be nopped in this state for the long term.

This means that surrounding tiles will visibly look a bit more converged along with the easier to sample part of the tile in question, and you will get this triangle of noise in the middle of the scene that’s hard to remove.

I’m really not sure about trying this patch again until we can be sure that such a case can be prevented (some ideas of mine already written in the developer thread).

I understand your point, but “likeliness” will always be our foe. So even with the theoretical approach of variance, this kind of situation will be a problem.

Here is my image response to “performance bleeding into adjacent tiles”:

So even if we get the noise “smoothed out” from tile to tile, there will be a larger noise gradient along the boxes that touch the green arrow.

That is why I so strongly suggest that low threshold of .002. Even with gamma correction this yields barely visible noise.
Once the entire image is below that threshold, sampling continues on all tiles.
For half-float renderings this value should probably be lowered further, making sure only low-noise tiles are nopped.

On the other hand, this means that some scenes will never be nopped. They are just too noisy/complicated.
Maybe people are disappointed - but there is no “shortcut” to nice images (besides biased rendering) or a magical speedup.

But the issue is that sometimes, it can be difficult to get any sort of contribution, big or small. I’m not sure if you get the bleeding idea here as of now since it seems like you’re thinking of the tile itself being the ‘pixels’ on the evaluation map (so a 40x40 tile is actually a giant 40x40 pixel).

What I’m saying is that you still nop at the tile level, but the error map is created at the pixel level* (even over the areas that are currently nopped). You then dilate the edges of that pixel-level map and smooth it a bit, and the criterion for nopping becomes the maximum value found over all pixels within the tile. A tile that looks like my example will then not get nopped, while the immediate tiles that neighbor it from the red area will.

As another example, you have a 40 by 40 pixel tile, that means you have 1600 pixels within the tile, the error is then found by quickly going through those pixels and finding the largest error value that’s available.
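
A minimal Python sketch of that per-tile maximum, assuming the pixel-level error map is simply a list of rows of floats (the function name and data layout are made up for illustration):

```python
def tile_max_errors(error_map, tile_size):
    """Map (tile_x, tile_y) to the largest pixel error inside that tile."""
    height, width = len(error_map), len(error_map[0])
    result = {}
    for ty in range(0, height, tile_size):
        for tx in range(0, width, tile_size):
            # Tiles at the right and bottom edge may be smaller than tile_size.
            result[(tx // tile_size, ty // tile_size)] = max(
                error_map[y][x]
                for y in range(ty, min(ty + tile_size, height))
                for x in range(tx, min(tx + tile_size, width))
            )
    return result


if __name__ == "__main__":
    # One noisy pixel in the corner of the lower-right tile.
    error_map = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.5],
    ]
    print(tile_max_errors(error_map, tile_size=2))
```

Because the maximum is used rather than an average, a single noisy corner pixel is enough to keep its tile above the threshold.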

    • By an error map at the pixel level, I mean something like this (taken from an older thread on noise measurement).

      The main difference is that the final error map would also have the equivalent of two more nodes, one that dilates the edges a little and one that smooths everything a little (with nothing that scales down the resolution or pixelizes the map). This map would be calculated completely independent of the tiles, but the nopping system would refer to the map when it determines the tiles that get deactivated.
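
The two extra steps could look roughly like this in Python. This is a sketch under assumptions: a square neighbourhood of radius 1, dilation as a neighbourhood maximum and smoothing as a plain box average. The thread does not specify the actual filters, and the function names are made up.

```python
def _neighbourhood(grid, x, y, radius):
    """Values in the square window around (x, y), clipped at the image border."""
    height, width = len(grid), len(grid[0])
    return [
        grid[j][i]
        for j in range(max(0, y - radius), min(height, y + radius + 1))
        for i in range(max(0, x - radius), min(width, x + radius + 1))
    ]


def dilate(grid, radius=1):
    """Grow high-error regions: each pixel takes the maximum of its window."""
    return [
        [max(_neighbourhood(grid, x, y, radius)) for x in range(len(grid[0]))]
        for y in range(len(grid))
    ]


def smooth(grid, radius=1):
    """Box blur: each pixel takes the average of its window."""
    out = []
    for y in range(len(grid)):
        row = []
        for x in range(len(grid[0])):
            window = _neighbourhood(grid, x, y, radius)
            row.append(sum(window) / len(window))
        out.append(row)
    return out


if __name__ == "__main__":
    error_map = [[0.0] * 5 for _ in range(5)]
    error_map[2][2] = 1.0  # a single noisy pixel
    for row in smooth(dilate(error_map)):
        print(" ".join(f"{v:.2f}" for v in row))
```

Both functions keep the map at full resolution and know nothing about tiles, so the error of a noisy pixel near a tile border spills into the neighbouring tile's region of the map.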

My concern is about performance. Here are the two extrema:

A- 2048 samples over the entire image

  • pro: static tiling, arbitrary breakdown into separate independent tasks
  • con: waste of samples on “easy” pixels

B- pixelwise “noise level” detection and sampling count

  • pro: “nice image” (to be proven)
  • con: pixel-wise tasks, not suitable for massive parallelism

While I think A’s cons are obvious, let me expand on B.
When thinking about a GPU with 1000 threads running in parallel, it doesn’t make sense to stop 800 threads and let just 200 “bad pixel” threads run. The 800 threads might just as well continue to run and sample the easy pixels.
So trading wasted easy-pixel samples against wasted GPU waiting cycles - I wouldn’t recommend it.

Talking about CPU rendering - totally different story. No massive parallelism, purely individual pixel treatment.
But IMO I would not concentrate too much on it. GPUs have become too cheap to ignore them.

Given enough time, I totally agree with your approach:
1- (full) pixel-wise noise evaluation (observed difference / variance)
2- (pixel-wise) smoothing
3- pixel-wise sampling

1 and 2 will eat time, going through all pixels. And then there is 3 that requires overhead for GPU devices.

Let alone the synchronized waiting for all tiles to stop, which we are already introducing by adaptivity.

In the above blend I get only about 25% speedup using that very simplistic tile nopping. TBH I expected more considering the upper third of the image is just plain black.
Plus the “reference image” was calculated with my stuff disabled, but still in the code. So I assume a totally clean build will run faster, making the speedup even lower.

Problems I see with GPUs:

  • full pixelwise error estimation takes time (maybe run on CPU in parallel? consider rendering on CPU instead!)
  • smoothing step (memory access penalties by accessing adjacent-pixel memory, -> tile boundaries?)
  • pixel-wise sampling via x/y-redirection: memory access race conditions
  • sample-count balancing when redirecting (one pixel not evaluated -> other pixel twice evaluated, to keep overall count equal)
  • copy GPU memory to CPU for full-image evaluation (vs just RGBA image transfer)

So what drives me? Well, it is interactive rendering. We needed something to quickly make nice images while the user tweaks the camera. So after a few samples the environment looks “ok” but the hard pixels looked very bad. So we redirected the samples to the hard parts - good for the eye but bad for overall performance (interrupt rendering, copy memory from GPU, evaluate error, copy memory to GPU). Turns out we have a slight speedup remaining by nopping (using short-cuts: partial tile evaluation, tile-wise nopping, tile-wise sampling).
I fear that if we put more intelligence into the adaptivity, it comes with little gain but a lot of penalty (‘full evaluation’ alone).

So my hope is on Lukas. If he makes Metropolis and adaptive sampling “tile-wise”, then this would nicely fit together.
And it needs to be proven that all that yields a speed-up keeping the same image quality.
And then comes cinema with 10k resolution - cannot have the entire map on one GPU.
And then comes splitting up the image onto several GPUs.
Essentially generating new “tile borders”.
…making sure our current efforts are usable in the future.

I implemented that theoretical approach, pixel-wise. It works well on your test scene, but it is not suitable at all for BPT. This is because we need a certain data basis in order to apply theory at all.
My old approach (“observed progress”) works with few samples. For theory (“theoretical error bound via variance”) to work properly you need about 100 samples. That is feasible with PT only.
Here is the implementation v03 rev 1:

Suggested settings for above blend:
best CUDA performance: tile size 256
(don’t know about CPU, but put the best size here when comparing to 2.71RC2)
NOP min samples: 100 (200…500)
NOP nth sample: 50 (100)
NOP threshold: 0.01 (doesn’t matter, will adjust by itself)
NOP maxnop: .9 (should be > 50%, or it will try to activate the black background)

I understand what you mean by blurring and so on. Yes, that would be needed in order to make up for missing samples. Once you have around 100 samples (better: variance normally converges around 500), theory actually works pretty well.

Well, I’ve read before that adaptive sampling techniques in general don’t work well with bidirectional path tracing. Even Dade of Luxrender wasn’t able to figure this out completely and recommends unidirectional tracing for adaptive sampling in that engine. It’s not just my theory in this case.

Also, the idea of dilation and smoothing is this, if that was done, then I can much more easily get a good result with lower values for nth sample and min samples while using a high starting value for the NOP threshold (simply letting the engine decide which tiles to work on at what time).

Now, you’ve said before that the engine will have to go back and un-NOP the tiles, so why not set the threshold very low. Well, if it reactivates any tile that sees its previous error now above the threshold, then why not do what I just mentioned? The image will work its way to noiselessness anyway.

Also, if you want to know, I’ve been testing a much more complex scene than that simple case I gave you, the average image will have a few regions where the corner case I mentioned could be prevalent. In this, I also would note Sergey’s comments saying how that could potentially lead to artifacts in animation due to visible noise levels changing inside the tile and why the smoothing in the pixel map that drives tile-based nopping could be important.

Also, you talk about measuring observed progress, but can’t you do that at the pixel level as well, or is it too different for that type of thing? Sorry in advance if I’m sounding like a blathering idiot or if this frustrates you.

There is an implementation on the pixel-level “observed progress” here:
It will work just on the “worst” pixels. I assume it performs OK on CPUs. There is no CUDA implementation of it.
The progress-smoothing is missing in that implementation though.

If you want to have a low-noise image in the end, it makes sense to render everything from start using a low-noise threshold.

  • you collect valuable data that you can use at, say, sample 100 onward applying theory
  • no need to go through time-consuming evaluation and stopping/resuming of tiles - just run for 100 samples
  • tile-wise variations in noise will be low by design
  • you are on the safe side, with freedom what to do in the next 2048-100 samples (you might run out of samples, once you activate a region just to realize it is not an easy region)

Try either the v03_rev1 from this thread. It runs a little faster than 2.71RC2 using CUDA (without actually nopping entire tiles).
The noise levels seem pretty even to me.

I don’t understand why bidirectional tracing would not work with adaptivity; maybe you can point me to a thread about it?
Bidirectional tracing is “adaptivity” already by itself, so why not impose another criterion on top of it? So in principle this should work - but not in practice?

I forgot to attach the “typical” plot for variance, sampling a normally distributed random source with VAR=.01.
You see, under 100 samples, no need to apply theory. No matter how much you smooth or dilate.
Neither using even/odd/all samples.
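
Since the plot itself is not reproduced here, the following Python sketch shows how such a curve can be generated. It assumes a zero-mean normal source with variance 0.01 and uses Welford's running-variance update; the class and function names, the seed and the checkpoints are arbitrary choices for this example, not taken from the patch.

```python
import random


class RunningVariance:
    """Welford's online algorithm for the unbiased sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def variance_trace(true_var=0.01, checkpoints=(10, 50, 100, 500, 2000), seed=1):
    """Variance estimate of a normal source after each checkpoint sample count."""
    rng = random.Random(seed)
    acc = RunningVariance()
    trace = {}
    for i in range(1, max(checkpoints) + 1):
        acc.add(rng.gauss(0.0, true_var ** 0.5))
        if i in checkpoints:
            trace[i] = acc.variance()
    return trace


if __name__ == "__main__":
    for n, var in variance_trace().items():
        print(f"{n:5d} samples: estimated variance {var:.5f}")
```

Running it with different seeds shows how much the estimate at low sample counts varies from run to run compared with the later checkpoints.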

Maybe a hybrid approach (smoothed observed progress … theoretical error bounds) works fine.
Switching point could be 100 samples (better 200).
OR take the worse evaluation of both (at any time).
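
Both variants of the hybrid idea fit in a few lines of Python. This is only a sketch: the function name, the mode strings and the assumption that both estimates are on a comparable scale are mine, not part of any existing build.

```python
def hybrid_error(observed_progress, variance_bound, samples,
                 switch_at=100, mode="switch"):
    """Combine the two noise estimates for one pixel or tile."""
    if mode == "switch":
        # Below the switching point the variance estimate is not trusted yet.
        return observed_progress if samples < switch_at else variance_bound
    if mode == "worst":
        # Always take the more pessimistic of the two estimates.
        return max(observed_progress, variance_bound)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(hybrid_error(0.02, 0.005, samples=40))           # observed progress
    print(hybrid_error(0.02, 0.005, samples=200))          # variance bound
    print(hybrid_error(0.02, 0.005, samples=200, mode="worst"))
```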

Here’s the thread on the Luxrender forum.

Read on after a bunch of pages and you’ll read about Dade’s difficulties in getting the noise-aware algorithm to work with bidirectional sampling (because the eye-light connections can pretty much be anywhere in the scene). I’m not even sure how well it would work with tiled rendering in that case, but here you can find why Dade concluded that unidirectional path tracing works better with it.

Anyway, testing Lukas’ new patch revision now (done a few days ago), I could easily see your ver. 2 rev. 8 nopping code working well on top of his adaptive sampling routine, as it actually would not see the issue of noise retention due to the adaptive map concentrating many of the tile’s samples into those areas where they would otherwise occur (and the performance penalty from the true adaptivity in the sampling isn’t actually that high).

OK, the problem is that the samples should be weighted for MLT+adaptivity. Normally samples are just randomly distributed. Once you fiddle with the “randomness”, one cannot apply a simple variance calculation. I don’t see a possible feedback either (feedback from MLT to adaptive sampling about sample weight).

They calculate the “progress” by counting changed RGB image pixels. Not sure how that applies to half-float rendering.
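
As a rough illustration of that kind of measure (not Luxrender's actual code), the following Python sketch counts the pixels whose 8-bit RGB value differs between two passes. The quantisation to 0..255 and both function names are assumptions.

```python
def to_8bit(value):
    """Quantise a linear float channel in [0, 1] to an integer in 0..255."""
    return max(0, min(255, int(round(value * 255.0))))


def changed_pixels(previous, current):
    """Count pixels whose quantised RGB triple differs between two passes."""
    return sum(
        1
        for p, c in zip(previous, current)
        if tuple(map(to_8bit, p)) != tuple(map(to_8bit, c))
    )


if __name__ == "__main__":
    previous = [(0.4, 0.4, 0.4), (0.2, 0.2, 0.2)]
    current = [(0.4005, 0.4, 0.4), (0.25, 0.2, 0.2)]
    print(changed_pixels(previous, current))
```

The first pixel changes by less than one 8-bit step and is counted as settled. With a half-float buffer that small change is still a different stored value, which is why the measure does not carry over directly.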