“With CUDA and OptiX devices, if the GPU memory is full Blender will automatically try to use system memory. This has a performance impact, but will usually still result in a faster render than using CPU rendering.”
Does this mean that, unless we max out the GPU + system memory, there won’t be any out-of-memory errors anymore?
I did some tests in 2.9 with high-resolution 16-bit textures, using GPU OptiX rendering, on an RTX Super with 8 GB of VRAM.
I did not see a significant performance drop when rendering high-res textures, whether I stayed within the GPU memory limit (7,767 MB) or exceeded it (14,561 MB), which requires system memory to kick in.
I can also render highly subdivided meshes with 18 GB of memory usage, using the GPU and OptiX:
So what are the actual limitations of exceeding the GPU memory limit, given that Blender then relies on system memory? Is the performance drop large enough to justify spending that much more on higher-VRAM cards?
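For context, and purely as a sketch (this is not Cycles’ actual code): spilling data to system RAM on CUDA/OptiX devices generally relies on pinned host memory that is mapped into the GPU’s address space, so kernel reads of that data travel over the PCIe bus instead of the much faster on-board VRAM bus. That is the source of the slowdown, while the GPU can still do the work instead of failing with an out-of-memory error. A minimal CUDA example of the mechanism, assuming nvcc is available:

```cuda
// mapped_host_sketch.cu — illustration only, not Cycles code.
// Shows device-mapped pinned host memory: the buffer lives in system RAM,
// but the kernel can read it directly, with every access crossing PCIe.
// Build: nvcc mapped_host_sketch.cu -o mapped_host_sketch

#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_kernel(const float* data, size_t n, float* result)
{
    float acc = 0.0f;
    for (size_t i = threadIdx.x; i < n; i += blockDim.x)
        acc += data[i];                       // each read travels over PCIe
    atomicAdd(result, acc);
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);    // allow device-mapped host memory

    const size_t n = 1 << 24;                 // 64 MiB of floats, resident in system RAM

    float* host_data = nullptr;
    cudaHostAlloc((void**)&host_data, n * sizeof(float), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i)
        host_data[i] = 1.0f;

    // Device-visible alias of the same host buffer.
    float* dev_view = nullptr;
    cudaHostGetDevicePointer((void**)&dev_view, host_data, 0);

    float* dev_result = nullptr;
    cudaMalloc((void**)&dev_result, sizeof(float));
    cudaMemset(dev_result, 0, sizeof(float));

    sum_kernel<<<1, 256>>>(dev_view, n, dev_result);
    cudaDeviceSynchronize();

    float result = 0.0f;
    cudaMemcpy(&result, dev_result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Sum over host-resident data: %.0f (expected %zu)\n", result, n);

    cudaFree(dev_result);
    cudaFreeHost(host_data);
    return 0;
}
```

How much slower this is than reading from VRAM depends on the PCIe generation and on how often the kernel touches the spilled data, which is why results vary so much between scenes.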
Also worth noting from your link:
“An obvious limitation here is that the 1/2 heuristic only works well with a single device, with multiple CUDA devices trying to allocate that much memory, it could run into trouble.”
So if I understand this correctly, it might run into issues with multi-GPU setups?
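If that heuristic indeed means that each device may spill into up to half of the host RAM, then the numbers stop adding up as soon as there is more than one card. A purely hypothetical back-of-the-envelope sketch (the 32 GB figure and my reading of the heuristic are assumptions, not how Cycles actually budgets memory):

```cuda
// half_heuristic_sketch.cu — hypothetical arithmetic only, not Cycles code.
// Assumption: each device may spill into up to half of the host RAM.
// With N devices the combined budget is N/2 of host RAM, which already
// over-commits at N = 2.
#include <cstdio>

int main()
{
    const double host_ram_gb     = 32.0;               // assumed system RAM
    const int    num_devices     = 2;                  // e.g. two CUDA GPUs
    const double per_device_gb   = host_ram_gb / 2.0;  // the assumed "1/2 heuristic"
    const double total_requested = per_device_gb * num_devices;

    printf("Per-device spill budget: %.1f GB\n", per_device_gb);
    printf("%d devices together could ask for %.1f GB of the %.1f GB host RAM,\n",
           num_devices, total_requested, host_ram_gb);
    printf("leaving nothing for the OS, Blender itself, or the rest of the scene.\n");
    return 0;
}
```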
Your link is a post from 2016 concerning CUDA.
Is it still relevant for the latest Blender version? And does it apply to OptiX?
I did not measure vRAM usage.
I set my render device to GPU OptiX. (settings in the second image in my original post)
I then duplicated highly subdivided meshes until the memory listed in the Image Editor was way above the GPU’s 8 GB. In my example, the memory is 18.72 GB (first picture in my post).
This way, I assumed I had exceeded the memory capacity of the 8 GB GPU and had to use system memory for rendering. I just wanted to test out-of-core rendering. Is something wrong with my test?
That’s interesting.
I’d like to run my own tests, but I’m not sure how to proceed. How can we compare out-of-core GPU rendering times to GPU-only rendering times, since we can’t directly compare the two?
I am not sure if it has already been addressed. Maybe @skw (Stefan Werner) can answer that if he has some time available.
I don’t quite remember exactly how I did my tests, but I think it was the following:

1. Open an external vRAM monitor (GPU-Z, or watch -n 0.5 nvidia-smi on Linux).
2. Create a scene that takes up about 75% of vRAM while rendering with Cycles.
3. Open an instance of Blender with the scene and switch the viewport to rendered preview; wait for it to finish, so it keeps occupying vRAM while its GPU utilization drops to a negligible level.
4. While that instance is kept open in rendered preview mode, open a second instance of Blender with the same scene and use Render Image to measure the render time.

I know this is not a fancy method. Surely there is a more elegant way of intentionally filling vRAM to, say, 75% without GPU utilization.
Edit:
For anyone interested in intentionally filling the vRAM to a certain size, I have found this (it needs nvcc to be compiled):
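In the same spirit, here is a minimal sketch of such a filler written from scratch (it is not the tool linked above; the vramfill name and the 1024 MB default are just placeholders). It allocates the requested number of megabytes with cudaMalloc and holds the block until you press Enter, so a Blender instance in another process sees correspondingly less free vRAM:

```cuda
// vramfill.cu — hypothetical sketch, not the linked tool.
// Allocates the requested number of megabytes of GPU memory and holds it
// until Enter is pressed, so other processes see less free vRAM.
// Build: nvcc vramfill.cu -o vramfill
// Usage: ./vramfill 6000   (allocate ~6000 MB; defaults to 1024 MB)

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(int argc, char** argv)
{
    size_t mb    = (argc > 1) ? std::strtoull(argv[1], nullptr, 10) : 1024;
    size_t bytes = mb * 1024ull * 1024ull;

    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("Free vRAM before: %zu MB of %zu MB\n", freeB >> 20, totalB >> 20);

    void* block = nullptr;
    cudaError_t err = cudaMalloc(&block, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(block, 0, bytes);   // optional: cudaMalloc already reserves the memory

    cudaMemGetInfo(&freeB, &totalB);
    printf("Holding %zu MB; free vRAM now: %zu MB\n", mb, freeB >> 20);
    printf("Press Enter to release and exit...\n");
    getchar();

    cudaFree(block);
    return 0;
}
```

Run it in one terminal (e.g. ./vramfill 6000), confirm the drop in nvidia-smi, and then start the render in a separate Blender instance to measure the time.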
Hi, just a question about this. I’m using Blender 2.92 and quickly run out of GPU memory because of massive 16-bit textures (which I need for gradients and large renders).
I have a 3060 Ti with 8 GB. I thought Blender had out-of-core tech, but it doesn’t seem to work.
I have exactly the same issue in 2.93: I constantly run out of memory on my 2080 Super (8 GB), and it seems that out-of-core simply does not kick in. The system has 32 GB of RAM, so I believe it should not be an issue to render the scene, but no matter which settings I use, the render either crashes or prints the out-of-memory message. Did you ever find a solution?