How much does the GPU memory limit matter in 2.9?

From the Blender manual:

“With CUDA and OptiX devices, if the GPU memory is full Blender will automatically try to use system memory. This has a performance impact, but will usually still result in a faster render than using CPU rendering.”

Does this mean that unless we max out the GPU + system memory, there won’t be any out-of-memory errors anymore?

I did some tests in 2.9 with high-resolution 16-bit textures, using GPU OptiX rendering on an RTX Super card with 8 GB of VRAM.

I did not see a significant performance drop when rendering high-res textures, whether staying within the GPU memory limit (7,767 MB) or exceeding it (14,561 MB), thus requiring the system memory to kick in.

I can also render highly subdivided meshes with 18 GB of memory usage, using the GPU and OptiX:

So what are the actual limitations of exceeding the GPU memory limit, given that Blender then falls back on system memory? Is the performance drop high enough to justify spending that much more on higher-VRAM cards?

Thank you.


Very good question. Does someone know a bit more about this?

Hi

At this point, the patch is set to use no more than half of the system memory as rendering memory.

Out of curiosity, how are you measuring vRAM usage? You would have to use an external program for that, like GPU-Z.

In my tests some time ago, when running out of vRAM was caused by meshes (not by textures), there was a significant impact on render times.

Hello, interesting link thank you.

Also worth noting from your link:
An obvious limitation here is that the 1/2 heuristic only works well with a single device, with multiple CUDA devices trying to allocate that much memory, it could run into trouble.

So if I understand this correctly, it might run into issues with multi-GPU setups?

Your link is a post from 2016 concerning CUDA.
Is it still relevant for the latest Blender version, and does it apply to OptiX?

I did not measure vRAM usage.
I set my render device to GPU OptiX (settings in the second image in my original post).
I then duplicated highly subdivided meshes until the memory listed in the image editor was far above the GPU’s 8 GB. In my example, the memory is 18.72 GB (first picture in my post).

That way, I assumed I had exceeded the memory capacity of the 8 GB GPU and had to utilize system memory for rendering. I just wanted to test out-of-core rendering. Is something wrong with my test?

That’s interesting.
I’d like to run my own tests, but I’m not sure how to proceed. How can we compare out-of-core GPU rendering times to GPU-only rendering times, since a scene that exceeds vRAM can’t be rendered on the GPU alone?

I am not sure if it has already been addressed. Maybe @skw (Stefan Werner) can answer that if he has some time available.

I don’t quite remember exactly how I did my tests, but I think it was the following:

1. Open an external vRAM monitor (GPU-Z, or `watch -n 0.5 nvidia-smi` on Linux).
2. Create a scene that takes up about 75% of vRAM while rendering with Cycles.
3. Open an instance of Blender with that scene and switch the viewport to rendered preview; wait for it to finish rendering, so it occupies vRAM while GPU utilization stays negligible.
4. With that first instance still open in rendered preview, open a second instance of Blender with the same scene and use Render Image to measure the render time.

I know this is not a fancy way. Surely there is a more elegant method of intentionally filling vRAM to, say, 75% without GPU utilization.
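As an alternative to polling nvidia-smi, the used/total numbers can also be read directly through the CUDA runtime with `cudaMemGetInfo`. A minimal sketch (the file name is my own invention; it needs nvcc and an NVIDIA GPU to run):

```cuda
// vram_report.cu -- hypothetical helper: print used/total VRAM.
// Compile with: nvcc vram_report.cu -o vram_report
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;
    // cudaMemGetInfo reports free and total device memory in bytes
    if (cudaMemGetInfo(&free_b, &total_b) != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    std::printf("VRAM used: %zu MiB of %zu MiB\n",
                (total_b - free_b) >> 20, total_b >> 20);
    return 0;
}
```

Running it in a loop alongside Blender would give roughly the same picture as `watch -n 0.5 nvidia-smi`.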

Edit:
For anyone interested in intentionally filling the vRAM to a certain size, I have found this (it needs to be compiled with nvcc):
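For reference, such a tool can be little more than a single large `cudaMalloc` that is kept alive. A minimal sketch of that idea (the file name and command-line argument are my own invention, not the tool linked above; it needs nvcc and an NVIDIA GPU to run):

```cuda
// vram_fill.cu -- hypothetical sketch: allocate N MiB of VRAM and hold it.
// Compile with: nvcc vram_fill.cu -o vram_fill
// Usage: ./vram_fill <MiB>   (defaults to 1024 MiB)
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    size_t mib = (argc > 1) ? std::strtoull(argv[1], nullptr, 10) : 1024;
    void* buf = nullptr;
    // One large allocation pins the requested amount of device memory
    // without generating any compute load on the GPU.
    cudaError_t err = cudaMalloc(&buf, mib << 20);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    std::printf("Holding %zu MiB of VRAM; press Enter to release.\n", mib);
    std::getchar();  // keep the allocation alive until a key is pressed
    cudaFree(buf);
    return 0;
}
```

While it waits for input, the reserved memory should show up in nvidia-smi or GPU-Z with negligible GPU utilization, which is exactly what the test above needs.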