Upcoming MAJOR memory optimization for CUDA GPUs


(Wegg) #1

There is a new feature coming to Blender’s Cycles that will allow it to re-compile an optimized version of its render kernel containing only the features your scene actually needs. Why would you want this? Well, in my testing, this optimization reduces the amount of VRAM needed to render any given scene by a HUGE amount.


For example, the BMW27.blend test scene normally consumes around 2.6 GB of VRAM. With the new Adaptive Compile feature, that memory footprint goes all the way down to just 600 MB. This shaves off 2 whole gigabytes! Now NVIDIA cards with much smaller amounts of VRAM can mix it up with the big boys!

If you want to try for yourself, I have more detailed instructions here.

It is currently limited to Linux and it is quite fiddly to set up but I’m pretty excited about this optimization. Obviously :slight_smile:


(English is not my native language) #2

Hey, good to know.
In the Blender 2.7x development thread I had asked whether this feature could help people with old cards avoid unnecessary kernel features without having to edit the source code and compile Blender themselves. Since no one answered, I assumed that maybe it was not meant for that. But yes, nice.


(Wegg) #3

You will for sure be able to use older cards now. I have a pair of GTX 580s that I am bringing out of retirement because of this. It is still pretty rough in implementation, but I think eventually, when this becomes a default option, it will make GPU rendering available to a much wider range of cards than was possible previously.


(English is not my native language) #4

A question: have you compiled Blender yourself, or do you use the Buildbot version, as you say on your website?
Using the Buildbot version, I had to rename the ‘/blender-2.77-4adffde-linux-glibc219-x86_64/2.77/scripts/addons/cycles/lib’ folder so that Blender compiles a new CUDA kernel instead of using the precompiled one. It seems to be working!
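
For anyone who would rather script that rename, here is a minimal Python sketch. It assumes the folder layout from the path above; the function name and the `lib_precompiled` backup name are made up for illustration, not something Blender expects.

```python
from pathlib import Path


def disable_precompiled_kernels(blender_dir, version="2.77"):
    """Rename <blender_dir>/<version>/scripts/addons/cycles/lib out of the way.

    Returns the new path, or None if there is no lib folder to rename.
    """
    lib = Path(blender_dir) / version / "scripts" / "addons" / "cycles" / "lib"
    # "lib_precompiled" is an arbitrary backup name chosen for this sketch.
    backup = lib.with_name("lib_precompiled")
    if not lib.is_dir():
        return None
    if backup.exists():
        raise FileExistsError(f"backup folder already exists: {backup}")
    lib.rename(backup)
    return backup
```

Renaming the folder back restores the precompiled kernels, so nothing is lost by trying it.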


(doublebishop) #5

Yeah, volumetrics, hair, and subsurface scattering chew up a lot in terms of kernel size and overhead during render… we have been manually disabling them and recompiling the Cycles kernel for about a year or two now, and it’s made a significant difference.


(Wegg) #6

I’m only downloading the pre-compiled version. This new feature gives us the ability to create a “custom” compiled version of Blender seamlessly, without me having to know what I do and don’t need.


(Spirou4D) #7

Yes, very good to know that! Thanks. :cool:

Your method of using avconv for video screen capture is very good! Thanks.

EDIT: Hello YAFU!


(Joel_nl) #8

So… is this the split-kernel work that AMD did, or something else?
Where can I find more info about this? Because it looks awesome :smiley:


(LazyCoder) #9

Whoa! That’s incredible. What exactly is being compressed here, though? Where are these savings coming from? Is it just textures somehow being compressed more efficiently?


(Joel_nl) #10

If you read the initial post, you could have seen that this compiles a new render kernel that only has the features used in the scene you will be rendering at that moment.
So… if you have a scene with lots of geometry but no volumetrics, hair or SSS, these features will be disabled, giving you a much lighter version.
Because all those features are disabled, it’ll save quite some space on your GPU (giving you more room for textures, geometry, etc.).
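
A rough way to picture the idea in Python: look at which optional features the scene uses and turn every unused one into a "leave this out" flag for the kernel compiler. This is only a sketch of the concept. The feature names and the `-DNO_...` flag spelling are hypothetical and are not the actual Cycles build options.

```python
# Optional kernel features named in this thread. The identifiers and the
# flag format below are illustrative assumptions, not real Cycles options.
OPTIONAL_FEATURES = ("volume", "hair", "subsurface")


def kernel_build_flags(scene_features):
    """Return one 'disable' define per optional feature the scene does not use."""
    used = set(scene_features)
    return [
        f"-DNO_{name.upper()}"
        for name in OPTIONAL_FEATURES
        if name not in used
    ]


# A scene with hair but no volumetrics or SSS drops two features.
print(kernel_build_flags({"hair"}))
```

A scene that uses all three features gets an empty flag list, so it ends up with the same full kernel as before.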


(Esparadrapo) #11

I dunno what you’re doing with the BMW 27 benchmark but it only uses 1.4 GB for me on a single card.


(burnin) #12

Looks good, and this is how a properly working system should behave: carrying around all that unnecessary weight is plain stupid.


(English is not my native language) #13

@Wegg. With the Buildbot version, if I do not rename the folder that contains the precompiled CUDA kernel, Blender does not compile a new kernel and just uses the precompiled one.

What is your GPU? My GTX 960 uses 1029 MiB with the BMW27.blend scene.
Perhaps this is related to what Brecht says here?
https://developer.blender.org/T46528

The GTX 980 Ti has 2816 CUDA cores. The GTX 960 has 1024 CUDA cores. Using the adaptive kernel, BMW27.blend takes 309 MiB on my GTX 960.

Edit:
Hi Spirou4D! :slight_smile:


(Oyster) #14

Good news. But will it be in official Blender? Will it support old GPUs such as the GeForce 740M?


(Wegg) #15

Ahh. Yes. . . I should put that in my instructions.

I have two video cards in my system. The GTX 970 had the scene loaded and the 980 Ti was the one doing the rendering. So if you only have one card in your system, you would have to add together the memory it takes to view the 3D model in OpenGL AND the memory it takes to render it.


(English is not my native language) #16

You can wait for the next Buildbot version, because it seems that renaming that folder is not going to be necessary:
https://developer.blender.org/rBdedc9950188dc71a3a89d62f3f15d98d0adfc511


(AustinC) #17

Last time I read, each GPU needs to load the entire scene into its VRAM. From what I think I know about the GPU architecture, a GPU must have the scene loaded into its VRAM to render it, and a GPU can not render a scene stored in RAM from somewhere else in the system, such as another GPU’s VRAM. And, if a GPU is able to cache a scene somewhere else and perform operations on small chunks of the scene at a time (by loading a portion into its own VRAM at a time), then it would make more sense if it just cached the scene in the PC’s RAM.


(Lane) #18

It’s pretty nice that you can set CUDA to use less VRAM to do the same thing. But it needs a bit more testing to check the results.


(cekuhnen) #19

That’s fantastic!!! I deal with heavy scenes, and while I have not hit the limit with my cards, this is still great to know!


(LazyDodo) #20

Ran a test on Windows with a GTX 670 (needs a small patch, won’t work out of the box, and also needs the CUDA toolkit and Visual Studio installed, so not sure how useful it’ll end up being for non-devs).

With the BMW27 benchmark, readings from GPU-Z:

Just Blender open = 470 MiB
Standard kernel = 1127 MiB
Adaptive compile = 668 MiB
Savings = 459 MiB
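
To put those readings in perspective, here is the arithmetic as a small Python sketch. It assumes the "just Blender open" reading is a fixed baseline that stays the same during rendering, which the GPU-Z numbers alone do not prove.

```python
# GPU-Z readings from the post above, in MiB.
baseline_mib = 470   # just Blender open
standard_mib = 1127  # rendering with the standard kernel
adaptive_mib = 668   # rendering with adaptive compile

# Memory attributable to rendering, assuming the baseline is constant.
standard_overhead = standard_mib - baseline_mib
adaptive_overhead = adaptive_mib - baseline_mib

savings = standard_mib - adaptive_mib
reduction = savings / standard_overhead

print(f"standard render overhead: {standard_overhead} MiB")
print(f"adaptive render overhead: {adaptive_overhead} MiB")
print(f"savings: {savings} MiB ({reduction:.1%} of the render overhead)")
```

Under that assumption, the render overhead on this card drops from 657 MiB to 198 MiB, roughly a 70% reduction.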