Feedback / Development: Filmic, Baby Step to a V2?

troy_s · February 20, 2022, 10:02pm

This is actually a terrific question that deserves attention in its own right.

What I try to have folks ponder is meaning. Negative values in a tristimulus system are meaningless relative to the “observer” they are currently referring to.

That is, if we had a spectral render and then transformed it to sRGB, we have two observers at work; one is the spectral observer, which lives as an array of tristimulus values out in the CIE standard observer projection, and the other is a projection of, say, BT.709. Once we get to BT.709, the negative values are meaningless nonsense. Just as no display can display less than black hole zero, relative to the BT.709 projection, the values are meaningless.
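To make the negative values concrete, here is a minimal sketch of the projection step. It assumes the commonly published (rounded) XYZ to linear BT.709 matrix for D65, and an illustrative chromaticity near the green part of the spectral locus (roughly 520 nm). The function names are mine, purely for illustration.

```python
# XYZ (D65) to linear BT.709 primaries; commonly published matrix, rounded.
XYZ_TO_BT709 = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)


def xy_to_xyz(x, y, luminance=1.0):
    """Convert a CIE xy chromaticity plus luminance Y to XYZ tristimulus."""
    return (x / y * luminance, luminance, (1.0 - x - y) / y * luminance)


def xyz_to_bt709(xyz):
    """Project XYZ tristimulus into linear BT.709 RGB (no clipping)."""
    return tuple(sum(m * c for m, c in zip(row, xyz)) for row in XYZ_TO_BT709)


# Assumed sample: a saturated green near the spectral locus (about 520 nm).
saturated_green = xy_to_xyz(0.0743, 0.8338)
r, g, b = xyz_to_bt709(saturated_green)
print(r, g, b)  # R and B come out negative: outside the BT.709 footprint
```

The stimulus is perfectly meaningful against the CIE standard observer, yet two of its three BT.709 channels land below zero.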

Back in the other observer land, the CIE Standard observer, we could say that those values hold meaning. So the question becomes… can we give those values meaning in the smaller working space of BT.709? If so, how would we do this?

This is where the notions of the observer “footprint” mapping come into play. The math we perform on those values, though, has to be carefully considered, as again, not all operations hold meaning relative to the BT.709 “observer” if you will.

Likewise, and this idea builds on top of the above concept, we have to be careful about the journey the values take. For example, imagine we are forming an image from the tristimulus open domain BT.709 values to a black and white representation. Those values in the open domain are “inspiring” the image if you will; we don’t want a literal representation, nor can we even achieve that due to the display limitations.

When we think about denoising in this context, we quickly realize that we likely want to denoise the image, not the open domain tristimulus values. What matters in terms of “noise” are the formed image frequencies, not the frequencies that exist in the open domain. The same would apply for “reconstruction” approaches such as “upscaling”. Here if we use “linearized” values, we will find that our resampling falls apart, and that we are trying to reconstruct the image, not the literal tristimulus points in the buffer. In terms of image here, again, we are interested in things like “Where is the middle range of values” relative to an observer observing the image, not the “middle” of the tristimulus values, which is of course not perceptually oriented.
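A rough sketch of why the domain matters for resampling: averaging two neighbouring samples before forming the image does not give the same result as averaging them after. The `form_image` curve below is a hypothetical Reinhard-style stand-in for any picture formation, not Filmic itself, and the sample values are made up.

```python
def form_image(value):
    """Hypothetical picture formation: Reinhard-style compression into 0..1."""
    return value / (1.0 + value)


def average(a, b):
    """Two-tap resample, the simplest possible reconstruction filter."""
    return (a + b) / 2.0


# Two neighbouring open domain samples: a shadow next to a bright highlight.
shadow, highlight = 0.1, 16.0

# Resample the open domain tristimulus values, then form the image.
resample_then_form = form_image(average(shadow, highlight))

# Form the image first, then resample the formed image values.
form_then_resample = average(form_image(shadow), form_image(highlight))

print(resample_then_form, form_then_resample)
```

In the first ordering the highlight dominates the average and the result sits near the top of the range; in the second the result sits near the middle of the formed image range, which is closer to what “middle” means to someone looking at the image.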

Hence this leads to a rather controversial (sadly) opinion that we need to consider our information states, and that perhaps the “conventional” wisdom of merely dividing things into the rather arbitrary “scene” vs “display” dichotomy is an insufficient model.

I would strongly encourage folks to consider the idea that the image is a discrete image state, that exists in an interstitial place between the open domain tristimulus data, and the closed domain representation medium.

These are some pretty big questions, and ones that I have had a fair share of arguments about. I’m firmly of the belief that it is prudent to divide our thinking into three categories:

1. Open domain tristimulus.
2. Closed domain, formed image tristimulus.
3. Closed domain image replication / reformulation.

This leads to exactly two positions for operations that may be required:

1. Open domain tristimulus.
   1.1 Open domain tristimulus manipulations. Think of this as manipulating the “virtual light-like” tristimulus values in front of the camera.
2. Closed domain, formed image tristimulus.
   2.2 Closed domain, formed image tristimulus manipulations. Think of this as manipulating the image state, such as denoising the appearance of the image tristimulus, or reconstructing to a higher resolution from the image tristimulus.
3. Closed domain image replication / reformulation.
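As a loose sketch of where those two operation slots sit in a pipeline, assuming a single channel for brevity: every function here is hypothetical, the formation curve is a Reinhard-style placeholder, and the replication step is an assumed pure power-law encode rather than any particular display standard.

```python
def adjust_exposure(open_values, stops):
    """1.1: open domain manipulation, scaling the light-like values."""
    return [v * 2.0 ** stops for v in open_values]


def form_image(open_values):
    """Open domain to closed domain. Hypothetical Reinhard-style curve."""
    return [v / (1.0 + v) for v in open_values]


def smooth(image_values):
    """2.2: formed image manipulation. A 3-tap box filter as a crude denoise."""
    last = len(image_values) - 1
    return [
        (image_values[max(i - 1, 0)] + image_values[i] + image_values[min(i + 1, last)]) / 3.0
        for i in range(len(image_values))
    ]


def replicate(image_values, gamma=2.2):
    """3: replication for a given medium. Assumed pure power-law encode."""
    return [v ** (1.0 / gamma) for v in image_values]


# Made-up open domain samples, including values far above 1.0.
open_domain = [0.02, 0.18, 4.0, 0.18, 32.0]

manipulated = adjust_exposure(open_domain, 1.0)  # operate on state 1
image = smooth(form_image(manipulated))          # form, then operate on state 2
encoded = replicate(image)                       # state 3
print(encoded)
```

The point of the arrangement is only that the denoise-like step runs on the formed image, while the exposure-like step runs on the open domain values before formation.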

Hope that helps to make my sadly controversial opinion clearer. Under this lens, the “negative” dilemma is one that is firmly located at or prior to 1.1, where we would consider those manipulations as “working space dependent”.