Cycles + GPU + tiles + memory usage

I ran some tests on a scene with Cycles + GPU (GTX 570) + tiles (different combinations of sizes), watching the memory usage.
I found that when I change the tile size, the memory used on the graphics card stays the same (or changes only a bit), which means it is not possible to render big scenes. But weren’t tiles introduced to avoid this problem?
Where am I going wrong? Please help me; I searched around but wasn’t able to find any information about this.

Christian

Yes, it is supposed to ease memory problems and make better use of the internal GPU caches. It’s hard to tell, as we cannot check exactly how the NVIDIA blob works inside the GPU. In theory, it should take tile_size * number of threads of RAM instead of the memory for the full final image. At least that is what Cycles asks of the NVIDIA CUDA library.

I’m quite curious about this too, as I have the same experience as Christian - which is that if you get an out-of-memory error you’re out of luck, and changing the tile size has practically zero effect. It does affect rendering speed though, quite significantly, but no good guide with a thorough explanation exists for either of these two, short of some trial and error (as far as I know).

Since I’m trying to maintain a FAQ for Cycles hardware, I would very much like to get this absolutely right, so if anyone knows why VRAM usage seems unaffected and/or what the exact relationship between tile size and rendering speed is in theory and practice, I’d be grateful if they could chip in :slight_smile:

Can no one give us more information about this point?

Thanks.
Christian

Hi, I ran a test with the BMW benchmark, rendering HD at 150%, with tiles set to 2208x1242.
This render needs 340 MB VRAM.
With the default setting of 200x200 it needs 220 MB VRAM.
I get the VRAM usage from the NVIDIA driver with a shell script.
Maybe this info is useful to get an idea about VRAM usage with tiled rendering.
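
The script itself isn’t shown in this thread, so the following is only a sketch of one way to read the same number, written in Python rather than shell. It assumes a driver that ships `nvidia-smi` with the `--query-gpu` option; the function names `parse_used_mb` and `query_used_mb` are made up for this example.

```python
import subprocess


def parse_used_mb(smi_output):
    """Turn nvidia-smi CSV output (one number per GPU, in MiB) into ints."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def query_used_mb():
    """Ask the NVIDIA driver for the memory currently in use on each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_used_mb(result.stdout)
```

Calling `query_used_mb()` once before the render starts and again while it is running gives a rough idea of how much memory the render itself takes.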

Cheers, mib.

I think it does, but the problem is that the only thing we have is a set of single data points. I’d like to understand exactly why, and how to calculate it. I also know that choosing the wrong x/y ratio can seriously negatively impact rendering speeds (at least it did for me), but I’d like to know exactly why, what formula determines speed and VRAM usage so that I can optimize my settings.

Andrew Price at BlenderGuru did a good job at testing different tile sizes, but he only looked at render speed. I’d like to try to find the sweet spot where I maximize my VRAM and speed. And to do that I need to be able to do a bit of math, but without a formula that is hard…

Hi olesk, I think it’s best to catch Brecht on #blendercoders and ask him for clarification for your FAQ for Cycles hardware.
He is sometimes online in the evening hours CET, but the best time to speak with him is after the Sunday meetings (or before).
Maybe he would be interested in putting your FAQ in the Cycles wiki.

Cheers, mib.

With tiles I can render some of my blend files on the GPU. That was not possible in the past without tiles. It’s a welcome gain in render time.

Thanks for the tip mib, I’ll try to see if I can get it directly from the horse’s mouth, so to speak :slight_smile:

I’ll post back here if I manage to dig up something.

Depending on what you’re rendering, tiles can only save you a bit of memory.

While rendering, the GPU has to store the geometry, textures, render buffers/output and some code as well (might be more, I’m not an expert really).

For rendering with Cycles, ALL of the geometry and ALL of the textures have to be stored in your GPU’s memory for the whole render.

Not having tiles means the whole output of the frame you’re rendering (passes etc.) has to be stored on the GPU. That could be a 1920x1080 image, and with 5 passes that can take up a decent bit of memory, while if you’re using 256x256 tiles, only each tile has to be stored.
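
To put rough numbers on that, here is a minimal Python sketch of the output-buffer size. It assumes every pass is stored as four 32-bit floats per pixel (16 bytes), which is an illustrative guess and not a figure taken from Cycles; the name `buffer_bytes` is made up for this example.

```python
BYTES_PER_PIXEL_PER_PASS = 4 * 4  # assumption: 4 channels x 32-bit float


def buffer_bytes(width, height, passes):
    """Approximate size in bytes of a buffer holding `passes` render passes."""
    return width * height * passes * BYTES_PER_PIXEL_PER_PASS


full_frame = buffer_bytes(1920, 1080, 5)  # whole frame, no tiles
one_tile = buffer_bytes(256, 256, 5)      # a single 256x256 tile

print(f"full frame: {full_frame / 2**20:.1f} MiB")
print(f"one tile:   {one_tile / 2**20:.1f} MiB")
```

Under those assumptions the full frame comes to roughly 158 MiB and a single tile to 5 MiB. The real figures depend on how Cycles lays out its buffers and on how many tiles are in flight at once.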

If you have 600MB of textures, 1.1GB of geometry, and all passes using 300MB (no tiles), that is 2GB used, and your OS might use a bit extra as well. This situation would give an Out Of Memory error if rendered on a 2GB GPU. Using tiles could reduce the buffer memory to 30MB (hypothetically), bringing the total to 1.73GB and avoiding the crash.
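
The same arithmetic as a small Python sketch, using only the hypothetical figures from this example (and treating 1GB as 1000MB for simplicity). The helper name `total_vram_mb` is made up for illustration.

```python
def total_vram_mb(textures_mb, geometry_mb, buffers_mb):
    """Sum the three big memory consumers named above (all values in MB)."""
    return textures_mb + geometry_mb + buffers_mb


GPU_MB = 2000  # the 2GB card from the example

no_tiles = total_vram_mb(600, 1100, 300)
with_tiles = total_vram_mb(600, 1100, 30)  # 30MB is the hypothetical figure

for label, used in (("no tiles", no_tiles), ("tiles", with_tiles)):
    headroom = GPU_MB - used
    print(f"{label}: {used} MB used, {headroom} MB left for the OS and driver")
```

Without tiles there is no headroom at all, while the tiled case leaves 270MB free. The textures and geometry stay the same in both cases.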

However, tiles won’t fix every OOM error. If you’re rendering a huge scene that exceeds your GPU’s memory by far, they won’t help.

Very interesting. So, if I read you correctly, tiles are really mostly significant (from a RAM perspective) on limiting the size of the output buffer which is really a rather small part of the total RAM usage anyways. Hence decreasing tile size can help a bit, but not really significantly decrease the total RAM used by the GPU.

That still leaves the impact on speed though. I would intuitively assume that the bigger the better with regard to rendering speed, as Cycles would have less overhead the larger the tiles. However, it seems this relationship is not that clear. I read on the Cycles developers’ mailing list that ideally you would want tiles whose size is a multiple of 16 pixels (read: dividing your tile size by 16 gives an integer and not a float), as this helps optimization. But beyond that, I couldn’t find much.
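
As a quick way to check a tile size against that rule of thumb, here is a small Python sketch. The multiple-of-16 advice is only what I read on the mailing list; the helper names are made up for this example.

```python
def is_multiple_of_16(size):
    """True if the tile dimension divides evenly by 16."""
    return size % 16 == 0


def round_up_to_16(size):
    """Smallest multiple of 16 that is greater than or equal to `size`."""
    return ((size + 15) // 16) * 16


for size in (200, 256):
    print(size, is_multiple_of_16(size), round_up_to_16(size))
```

So a 200-pixel tile does not satisfy the rule and would round up to 208, while 256 already fits.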

I’ve only recently begun seriously toying around with blender so my insights aren’t too terribly detailed.

I was actually playing around with tile size to see what effect it has on memory usage this morning when I noticed this thread. My experience is right along with what zeealpal explained.

Regardless of tile size there is some information Blender loads completely (geometry, textures). Only a portion of the memory reported by the render is affected by tile size. I assumed that was the buffer used while running through the samples on the current tile, but I haven’t been able to find any concrete documentation on the topic.

Olesk, if you need any help running some tests to compare renders of a file with specific tile sizes, I would be happy to assist. Just send me a PM.

I have two systems at my house capable of running cycles.

System 1: i7 3770k (quad), 16GB Ram, GTX 660 Ti 2GB
System 2: Core 2 Duo E8400 (dual), 4GB Ram, GTX 550 Ti 1GB

Thanks for the input! I actually have a few different GPUs (260/580s/590 and 680) that I could use to benchmark as well. At this point though, my primary concern is rather to understand the underlying mechanics behind this. There are several practical benchmark tests floating around already, but I’d like to start at the other end. I.e. first be able to explain exactly why tile sizes impact VRAM and speed the way they do (and how much), and then go from there to giving some sound advice on how to optimize this.

I’ve made a first draft already, but I am not comfortable that what I have written is actually 100% correct (it is very much a first draft at this point). If anyone can point out mistakes or omissions, I’d be really grateful! :slight_smile:

Later though, I think powerst’s suggestion of organizing a bit of benchmarking makes a lot of sense.

Ok, so thanks to Brecht van Lommel’s very kind assistance, I think I’ve nailed it (to the extent that it really can be “nailed” considering the actual complexity of the issue). Suffice it to say, it was a lot more complex than I first imagined when I started out trying to figure out the ideal tile size for RAM and speed.

I’ve written up the whole thing in the FAQ. Take a look at this entry for the full story. I wrote the whole thing in one go while everything was still fresh in my mind, so if something doesn’t make sense I’d be very grateful for feedback.

If someone thinks it should also be posted here or in other places, feel free to copy & paste whatever you’d like.

Thanks for doing that Olesk. That’s some good information to have all in one place.

Was very clear to me. Only minor critique I have is the misspelling of ‘finish’ as ‘finnish’ in a couple places. Other than that, a great FAQ. Confirmed some things I had assumed until now and taught me a couple new things.

Argh! I can’t believe I haven’t managed to stamp out my “finnish” typos yet - I seem to have a mental block on that one (and on “nessecary”/“nessasary”/“necesary”/“necessary”/whatever-that-word-is) :wink:

I’m glad it was clear to you - it’s always a bit tricky when trying to condense a lot of technical detail in your head to something that makes sense to people reading it the first time.

As an afterthought I should perhaps confess that I’m a little disappointed that tile sizes are not more important. After all, unless you have a special situation (borderline RAM or multi-GPU), really any tile size over 128 will mostly do the trick nicely. I thought I had a real scoop on my hands when Brecht was kind enough to answer my e-mails :wink:

Yay, nice to see my understanding was basically correct :slight_smile: Nice write up btw :slight_smile:

Thanks :slight_smile:

I have to confess I used your post to help frame the questions to Brecht, as it sounded pretty reasonable to me (and made me sound pretty clever I think) :wink: