Brecht's Easter egg surprise: Modernizing shading and rendering

OpenCL 2.0 release

Updates and additions to OpenCL 2.0 include:

  • Shared Virtual Memory
    Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices (see the host-side sketch after this list).
  • Dynamic Parallelism
    Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
  • Generic Address Space
    Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
  • Images
    Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
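
To make the Shared Virtual Memory item concrete, here is a minimal, hypothetical host-side sketch using the OpenCL 2.0 SVM entry points (clSVMAlloc, clEnqueueSVMMap/Unmap, clSetKernelArgSVMPointer). The context, queue, and kernel setup are assumed to exist elsewhere, and the kernel is assumed to take the list head as its first argument:

```cpp
#include <CL/cl.h>
#include <cstddef>

// A pointer-containing structure that both host and device can walk
// once it lives in shared virtual memory.
struct Node { float value; Node *next; };

void run_svm_list(cl_context ctx, cl_command_queue queue, cl_kernel kernel)
{
    // Coarse-grained SVM block; pointers stored inside it hold the same
    // addresses on host and device, so no clEnqueueWriteBuffer
    // round-trips or pointer patching are needed.
    Node *head = static_cast<Node *>(
        clSVMAlloc(ctx, CL_MEM_READ_WRITE, 2 * sizeof(Node), 0));

    // Coarse-grained SVM still requires map/unmap around host access.
    clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, head,
                    2 * sizeof(Node), 0, nullptr, nullptr);
    head[0].value = 1.0f;
    head[0].next  = &head[1];   // the device sees this address as-is
    head[1].value = 2.0f;
    head[1].next  = nullptr;
    clEnqueueSVMUnmap(queue, head, 0, nullptr, nullptr);

    // Pass the raw pointer instead of a cl_mem handle.
    clSetKernelArgSVMPointer(kernel, 0, head);
    size_t gws = 1;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &gws, nullptr,
                           0, nullptr, nullptr);
    clFinish(queue);

    clSVMFree(ctx, head);
}
```

Fine-grained SVM (where the hardware supports it) drops the map/unmap calls entirely.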

Dumb question: can someone tell me if this means sharing/merging GPU VRAM with RAM? I mean, the possibility of using all the memory one has on their machine, but still rendering on the GPU?

Yes, it does mean that. It also means a massively more convenient way to do OpenCL programming. However, there’s a reason we don’t have this already: Memory sharing is not possible to do efficiently on most current hardware (except on GPU/CPU combos). It remains to be seen whether this actually gets implemented at all for current dedicated GPUs - and if so, how efficient it will turn out to be.

Dumb question: can someone tell me if this means sharing/merging GPU VRAM with RAM? I mean, the possibility of using all the memory one has on their machine, but still rendering on the GPU?

AFAIK yes, but you’ll need hardware that supports it.

More info about OpenCL 2.0:

Thanks guys, nothing for our current machines, got it. And it doesn’t help with running Cycles on OpenCL, fully featured Cycles I mean, does it?

Thanks

Any information on when the first OpenCL 2.0 hardware and drivers will appear? Already with the next generation of hardware around the end of 2013/early 2014?
It sounds interesting, but there could be several bottlenecks for rendering. DDR3 RAM is way slower than GDDR5 VRAM, and the PCIe bus has high latency as far as I know.

I don’t see a fundamental reason why Cycles shouldn’t already run fully featured, except maybe for the texture limit (which could be worked around as well). Actual driver implementations and efficiency issues are the real problem.

Any information on when the first OpenCL 2.0 hardware and drivers will appear? Already with the next generation of hardware around the end of 2013/early 2014?

Intel is probably going to support it with their Haswell GPUs; AMD is likely to follow suit with their CPU/GPU combos.

It sounds interesting, but there could be several bottlenecks for rendering. DDR3 RAM is way slower than GDDR5 VRAM, and the PCIe bus has high latency as far as I know.

For DDR3, you can put some EDRAM cache on-die, which is what Intel is doing for their Crystalwell chips. The Xbox One similarly pairs DDR3 with fast on-die memory (ESRAM). The PCIe bottleneck is just going to be a reality that we have to deal with on the PC. For convenience and portability, it would still be worth having a driver-level virtual memory system, even if it will be much less efficient.

I know about the on-die EDRAM Intel uses for their premium chips, but this isn’t relevant for rendering.
But in 3 years we won’t need shared memory anyway. NVIDIA has already announced stacked RAM for their “Volta” GPUs. Maybe we’ll get 32GB+ on consumer cards then :smiley:

Aren’t ‘abrupt changes in sample intensity’, as Ace Dragon phrased it, exactly what is to be expected in the case of caustics?

What I mean is abrupt changes in sample intensity in regions where the intensity would be the same or be more smooth once everything is converged.

Seeing these lines of intensity change in these cases can mean that the sample distribution is less even, or more clumped, than it could be.

Both images use the ‘filter glossy’ option as well (due to its ability to really bring out the caustic effects in Cycles at the cost of some light blurring or bias).

Of course it is relevant - why shouldn’t it be?

But in 3 years we won’t need shared memory anyway. NVIDIA has already announced stacked RAM for their “Volta” GPUs. Maybe we’ll get 32GB+ on consumer cards then :smiley:

I want shared memory for all kinds of reasons. There are still things that GPUs aren’t particularly good at that CPUs could do better, but it’s pointless if there’s a PCIe bottleneck. We also might get systems with GDDR system RAM, maybe with dedicated replaceable GPUs. Unified memory is definitely the way to go, considering also that both of the new consoles have it.
Within 3 years, you might get 8-12GB consumer cards as a standard, but 32GB is highly doubtful. Games will not need/use that kind of memory in the next generation.

Or maybe it converges faster to the final caustic render, which will be less diffuse than it may seem at first sight. Just a thought.

Anyway, I think it’s wise to judge differences when the render is actually finished; then you can see what/where it converges or not, and look at the time.

paolo

Initial support for SSS render passes has now been committed to the DingTo branch.
http://lists.blender.org/pipermail/bf-blender-cvs/2013-July/057738.html

Not everything is resolved yet; that will require input from Brecht and Stuart.


Or maybe it converges faster to the final caustic render, which will be less diffuse than it may seem at first sight. Just a thought.

In this case, the abrupt change is in an area that should be more even once converged, given the shape of the glass object.

Also, I rendered out the same scene (same number of samples) with the old Sobol pattern, and the difference is even more stark in some areas.

Multi-Jitter (after the recent fix).


Sobol.


Open both in their own tabs and flip between them. At first glance they might seem similar, but there’s a clear superiority in the Multi-Jitter image over the Sobol one in terms of the number of brighter caustic samples (and the fullness of the caustic pattern) in the glass object’s shadow. (The other improvements in sample count and distribution in other areas caused by the fix also show up as differences here.)


Cycles SSS render pass now in trunk, by DingTo

DingTo posted a link in the Cycles thread to an article about renderers, and in it there is a link to a very interesting Disney paper on SSS.

We will probably implement this one for SSS: https://sites.google.com/site/ckulla/home/bssrdf-importance-sampling
Let’s see. :slight_smile:
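
For context, the core trick in that paper is importance sampling the BSSRDF’s radial falloff: choose a radius proportionally to the diffusion profile, then probe the surface for intersections. A rough sketch of just the radius-sampling step, assuming a simple exponential profile with a hypothetical mean free path d (the paper additionally samples multiple projection axes and profile channels, omitted here):

```cpp
#include <cmath>

struct DiskSample { float r, phi, pdf_area; };

// Sample a point on a disk with density proportional to exp(-r/d).
// u1, u2 are uniform random numbers in [0, 1).
DiskSample sample_bssrdf_disk(float d, float u1, float u2)
{
    const float PI = 3.14159265f;
    // Invert the CDF of the radial pdf p(r) = exp(-r/d) / d.
    float r   = -d * std::log(1.0f - u1);
    float phi = 2.0f * PI * u2;
    // Convert to the area measure: p(r) / (2*pi*r).
    // (Real code must guard the r -> 0 case as u1 -> 0.)
    float pdf = std::exp(-r / d) / (d * 2.0f * PI * r);
    return { r, phi, pdf };
}
```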

Yes, I was just now reading Marcos Fajardo’s words:

Fajardo explains the difference. “When you are doing SSS you can work at two levels. You can just use SSS to simulate what we call the first bounce – under the surface – that is single scattering. It is an easier and well defined problem. That is what they did on Spider-Man. We, along with Sony, helped develop single bounce scattering more efficiently with GI. Now we are talking about multiple scattering; this is what gives you the softness and bleeding of light. That is a lot more difficult, and that is only possible now that people are starting to do this with ray tracing. Up to now you really needed to use point clouds and it was painful. This year at SIGGRAPH we are presenting a way to totally do away with point clouds. I am so happy that we are putting the final nail in the coffin of point clouds, I can’t even tell you! For many years that has been the last place you needed point clouds. A few people have been trying to do multiple scattering with ray tracing, and we touch on this in our talk, but it was not very efficient. We use a new importance sampling technique for subsurface scattering, what we call BSSRDF Importance Sampling.”

Arnold was one of the first renderers to deploy MIS, back when Fajardo worked at Sony Pictures. Today the implementation at Solid Angle is quite advanced, going beyond just using it for BRDFs and lights. It is “applied to many many places in the renderer, virtually any place in the renderer where there is an integral you can apply IS – and there are many integrals in a renderer,” says Fajardo. “Most of the time people find just one sampler or method, but if you are smart enough you can find multiple samplers for the same task and then combine them.” It is, for example, used for SSS in Arnold.
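
For those who haven’t run into the term: MIS combines estimates from several sampling strategies, weighting each sample so that the strategy whose pdf best matches the integrand dominates. A standard textbook sketch (not Arnold’s actual code) is Veach’s power heuristic:

```cpp
// Veach's power heuristic (beta = 2) for combining two strategies:
// nf samples drawn with pdf f_pdf, ng samples drawn with pdf g_pdf.
// The returned weight multiplies the contribution of the f-sample.
inline float power_heuristic(int nf, float f_pdf, int ng, float g_pdf)
{
    float f = nf * f_pdf;
    float g = ng * g_pdf;
    return (f * f) / (f * f + g * g);
}
```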

A possible quick project for DingTo.

Someone on the Luxrender forums posted some tests of the use of a new filter type called Blackman-Harris.
http://www.luxrender.net/forum/viewtopic.php?f=8&t=10381

It doesn’t involve that much code, but direct comparisons of the filter look to show a slight edge over Gaussian in sharpness and an advantage over the popular Mitchell filter in terms of smearing (it also doesn’t have any of the artifacts associated with that filter). Generally, the thing here would be to have the size default to 3x3, as that seems to be the sweet spot for this algorithm.

What do you think, are the Lux users onto something here?

Tab compare…

Gaussian
http://www.luxrender.net/forum/download/file.php?id=19422&mode=view

Blackman
http://www.luxrender.net/forum/download/file.php?id=19425&mode=view

Mitchell
http://www.luxrender.net/forum/download/file.php?id=19420&mode=view
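
For reference, the 4-term Blackman-Harris window itself takes only a few lines to evaluate. Here is a minimal sketch as a pixel filter of a given total width (the 3x3 sweet spot above would be width = 3.0); the remapping convention is my assumption, not Lux’s actual code:

```cpp
#include <cmath>

// 4-term Blackman-Harris window, remapped so the filter is centered
// on x = 0 and spans [-width/2, width/2] (x in pixels).
float blackman_harris(float x, float width)
{
    if (std::fabs(x) > 0.5f * width)
        return 0.0f;
    const float PI = 3.14159265f;
    float t = x / width + 0.5f;  // remap to [0, 1] across the window
    const float a0 = 0.35875f, a1 = 0.48829f,
                a2 = 0.14128f, a3 = 0.01168f;
    return a0 - a1 * std::cos(2.0f * PI * t)
              + a2 * std::cos(4.0f * PI * t)
              - a3 * std::cos(6.0f * PI * t);
}
```

The window peaks at 1 in the center and falls to nearly zero at the edges with no negative lobes, which is consistent with it avoiding Mitchell’s ringing artifacts while staying sharper than a wide Gaussian.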

Ha. Funny that you should mention that. I just implemented 5 new filter types in Cycles today (Mitchell-Netravali, Triangle, a new Gaussian, Catmull-Rom, and Lanczos); I’ll look into Blackman-Harris tomorrow as well. I know Arnold has it too, but no one ever uses it as far as I know. Most people just stick with Gaussian: 2 when rendering. More options can’t hurt though! We’ll see how many of mine make it in; I think my Mitchell filter may be messed up, but only because I don’t get the dark ringing around high-contrast areas.
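
For anyone who wants to check that, the standard Mitchell-Netravali 1D kernel (with the usual B = C = 1/3) is below; the negative lobe on 1 < |x| < 2 is exactly what produces the dark ringing around high-contrast edges, so its absence would indeed point to a bug:

```cpp
#include <cmath>

// Standard Mitchell-Netravali filter; B = C = 1/3 by default.
// The kernel goes negative for 1 < |x| < 2 (the "ringing" lobe).
float mitchell_netravali(float x, float B = 1.0f / 3.0f, float C = 1.0f / 3.0f)
{
    x = std::fabs(x);
    if (x < 1.0f)
        return ((12 - 9 * B - 6 * C) * x * x * x +
                (-18 + 12 * B + 6 * C) * x * x +
                (6 - 2 * B)) / 6.0f;
    if (x < 2.0f)
        return ((-B - 6 * C) * x * x * x +
                (6 * B + 30 * C) * x * x +
                (-12 * B - 48 * C) * x +
                (8 * B + 24 * C)) / 6.0f;
    return 0.0f;
}
```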

EDIT: Just spoke with Brecht. Cycles uses an interesting filtering method

http://lgdv.cs.fau.de/publications/publication/Pub.2006.tech.IMMD.IMMD9.filter/

seen there, one that I wasn’t at all familiar with. It disallows negative lobes and can’t sample across tiles, but has faster lookup times than traditional techniques, especially with small tile sizes on the CPU. This helps to explain why 16x16 tiles are very fast in Cycles, while they would absolutely kill performance in Arnold, where traditional padded sampling is used.

Anyway, that puts Mit-Net, Cat-Rom, and Sinc filters out the window as they all rely on neighboring pixels and negative lobes. Blackman-Harris should still work though, so I’ll take a go at adding that shortly.
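
To sketch the idea from that paper (filter importance sampling): rather than splatting each sample into neighboring pixels with filter weights, you warp each sample’s subpixel position through the inverse CDF of the filter, so all samples carry equal weight and never leave their own pixel. A minimal tabulated version, assuming a non-negative 1D filter defined on [0, 1] (not Cycles’ actual implementation):

```cpp
#include <cmath>
#include <vector>

// Tabulated inverse-CDF warp for a non-negative 1D pixel filter.
// Negative lobes cannot be represented, matching the restriction above.
struct FilterTable {
    std::vector<float> cdf;  // cumulative distribution over [0, 1]

    explicit FilterTable(float (*filter)(float), int n = 256)
        : cdf(n + 1, 0.0f)
    {
        for (int i = 0; i < n; i++)
            cdf[i + 1] = cdf[i] + filter((i + 0.5f) / n);
        float total = cdf.back();
        for (float &c : cdf)
            c /= total;  // normalize so cdf.back() == 1
    }

    // Warp uniform u in [0, 1) to a filter-distributed offset in [0, 1).
    float sample(float u) const
    {
        int n = int(cdf.size()) - 1;
        int i = 0;
        while (i < n - 1 && cdf[i + 1] < u)
            i++;  // linear search for brevity; a binary search is typical
        float t = (u - cdf[i]) / (cdf[i + 1] - cdf[i]);
        return (i + t) / n;
    }
};
```

Usage would be something like building FilterTable tent([](float x) { return 1.0f - std::fabs(2.0f * x - 1.0f); }); once per filter, then jittering each camera sample with tent.sample(u) instead of a uniform offset.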

As far as filters go, I don’t think there’s a need for all 5 of those plus the Blackman-Harris filter; what we need is a focus on committing just the ones that produce the highest quality in terms of reducing aliasing, avoiding artifacts, and preserving sharpness and detail on things like thin lines.

Generally, perhaps it would be good to remove the old box filter as well; if Cycles is going to have filters known to be superior, and barely anyone uses the box filter nowadays, why leave it in?

Most renderers aim for filter completeness. Filters are like a religion to some people; they swear by a certain one. Each has its positives and negatives. It can’t hurt to have more options in the long run. Default to Gaussian, but have the others there for people who want them.

Yep, and each filter is useful in different situations too.