Idea for faster GPU rendering using Cycles and AutoTile

GPU/CPU rendering is, so far, mostly automated, but with a little tinkering we could cut a lot of time from a render.

So, could we have a GPU/CPU configuration where you can set an ordered list of your devices, ranked by speed? For example:


  1. GTX Titan XP
  2. GTX 1070
  3. GTX 1060
  4. CPU

With this ordered list, a pre-render of the image (very, very low samples) would be performed by the fastest card, and the per-tile performance data would be cached for the beauty render. This could happen on demand, every nth frame, or as a complete pre-pass cache set.

The engine would then hand out slow tiles to the fast GPUs and fast tiles to the slow GPUs.

This way the Titan would be rendering things like water, glass and volumes, while the 1060 would be doing the sky, backgrounds and low-detail areas first, thus maximising per-image render speed.

There could also be an option to never allow a slower GPU to select a demanding tile, so the last few heavy tiles never end up on the slowest cards.
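The hand-out rule described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not how Cycles schedules tiles: the function name `simulate`, the relative speed numbers and the per-tile cost estimates are all made up, and it assumes the cost measured in the low-sample pre-pass predicts the final render time of a tile.

```python
import heapq


def simulate(tile_costs, speeds, cost_limits=None):
    """Simulate a cost-aware tile hand-out and return (finish_time, assignment).

    tile_costs:  estimated cost per tile, e.g. from a low-sample pre-pass
    speeds:      relative device speeds, fastest first (the ordered list)
    cost_limits: optional per-device cap; a device never takes a tile whose
                 estimated cost is above its cap (hypothetical option)
    """
    remaining = sorted(tile_costs, reverse=True)  # most expensive first
    n = len(speeds)
    limits = cost_limits or [float("inf")] * n
    assignment = [[] for _ in range(n)]

    # Min-heap of (time at which the device becomes free, device rank).
    free_at = [(0.0, rank) for rank in range(n)]
    heapq.heapify(free_at)
    finish = 0.0

    while remaining and free_at:
        now, rank = heapq.heappop(free_at)
        allowed = [i for i, c in enumerate(remaining) if c <= limits[rank]]
        if not allowed:
            continue  # nothing left that this device may render; it retires
        # Rank 0 picks from the expensive end, the last rank from the cheap end,
        # devices in between pick proportionally in between.
        frac = rank / (n - 1) if n > 1 else 0.0
        idx = allowed[round(frac * (len(allowed) - 1))]
        cost = remaining.pop(idx)
        done = now + cost / speeds[rank]
        assignment[rank].append(cost)
        finish = max(finish, done)
        heapq.heappush(free_at, (done, rank))

    if remaining:
        raise ValueError("cost limits leave some tiles with no eligible device")
    return finish, assignment
```

Calling `simulate([8, 8, 1, 1, 1, 1], speeds=[4, 1])` gives both heavy tiles to the fast device and the four light ones to the slow device. Passing `cost_limits=[float("inf"), 0.5]` models the "never give a demanding tile to the slow card" option; in this toy case the slow device then sits idle, which shows the cap needs to be chosen with care.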


I don’t know how things work, but I think the programmers have thought about what’s possible and what isn’t a million times.

What about the add-on ‘Auto Tile Size’? It comes with Blender and it’s supposed to do that.

O.o So we should not give any feedback or come up with ideas because they may have already thought of it? Hmmmm.

Auto Tile Size just calculates a tile size for you based on the image size. It has no performance analysis like this built in.
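To show the difference, here is roughly the kind of calculation meant by "a tile size based on the image size". This is a guess at the general idea, not the add-on’s actual code: the helper `fit_tile` and the 256-pixel target are assumptions for illustration. Note that nothing in it looks at how long any tile takes to render.

```python
def fit_tile(length, target):
    """Return the divisor of `length` closest to `target` (ties go to the larger)."""
    divisors = [d for d in range(1, length + 1) if length % d == 0]
    return min(divisors, key=lambda d: (abs(d - target), -d))


def tile_size(width, height, target=256):
    """Pick tile dimensions that divide the image evenly, near the target size."""
    return fit_tile(width, target), fit_tile(height, target)
```

For a 1920x1080 image and a 256-pixel target this picks 240x270 tiles, so the image splits into an even 8x4 grid with no partial tiles at the edges.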

In fact, the last question I asked in the BMW thread relates to this.
My question was about having two cards with very different performance: how do you calculate tile sizes so that the fastest card/GPU is always the one rendering the last tile? If the slower card handles the last tile, it slows down the whole render.
There is also the problem that newer cards prefer large tiles as their optimal size.

I read your reply, which partially led to this post, as I suffer from a slower card taking on the last tile all the time now :smiley: I did write a big reply to you, and as I was writing I came up with this concept: a render pre-pass that acquires performance data for each tile to help the engine hand out tiles to different GPUs. It then warranted its own thread.

In the future we will hopefully get distributed single-image bucket rendering for networked slaves too, so this performance pass on tiles would be very useful in that situation as well.

Yes, there is that, which could throw a spanner into this concept. I wonder why, though. CPUs, for example, have always worked on small tile sizes for the final shot. With GPU rendering they realised you could have very fast real-time rendered viewports, so you get a progressive single tile. It is possible the code was originally designed to be optimal that way and has just been built on since. Would be interesting to know more, though.

If you’re rendering animation, probably your best bet is to run multiple separate instances of Blender, each one dedicated to a specific video card. Enable Placeholders and disable Overwrite in the output settings, so each instance skips frames that another instance has already claimed. This way one card can speed along on frames without being held back by the other. I do this now for my render box with 4 GPUs. They’re all the same, but since each frame takes less than a minute to render, it’s actually faster to put a full frame on a GPU than to split that frame into tiles.
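A rough sketch of the one-instance-per-card setup, for anyone who wants to script it. The assumptions here are mine, not the poster’s: the cards are NVIDIA/CUDA (so the `CUDA_VISIBLE_DEVICES` environment variable can hide all but one card from each process), the .blend file is already set to render on the GPU, the Blender build supports `--python-expr`, and `build_jobs`/`launch` are just illustrative helper names.

```python
import os
import subprocess

# Run inside each Blender instance before rendering: write a placeholder for
# the frame being rendered and never overwrite a frame that already exists.
SETUP = (
    "import bpy; r = bpy.context.scene.render; "
    "r.use_placeholder = True; r.use_overwrite = False"
)


def build_jobs(blend_file, gpu_count, blender="blender"):
    """Return one (environment, argv) pair per GPU without starting anything."""
    jobs = []
    for gpu in range(gpu_count):
        # Each process only sees a single CUDA device.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        # Order matters: load the file, apply the settings, then render (-a).
        argv = [blender, "-b", blend_file, "--python-expr", SETUP, "-a"]
        jobs.append((env, argv))
    return jobs


def launch(jobs):
    """Start every instance, then wait for all of them to finish."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    for proc in procs:
        proc.wait()
```

Something like `launch(build_jobs("shot.blend", 4))` would then start four background renders of the same animation, and the placeholder files keep them from rendering the same frame twice.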

For stills, you’d have to handle it slightly differently.

Great tip.

No, of course not, but there are coders in Blender development who work on this 8 hours a day, every day, and probably breathe this stuff.

I meant more that our ideas lack the perspective of what is actually possible in the code behind faster rendering.

I personally think Blender’s rendering is crazy fast, and when the denoiser becomes available it’s hallelujah.

I can give one idea, but it’s not that good.

What if there is one good blend file for benchmarking (one that tests everything)? Every graphics card and the CPU does one render of it, and then Blender compares how fast each one is relative to the others.

So, for example, the GTX 1080 is at 40%, the GTX 1070 is at 25%, the GTX 1060 is at 20% and the CPU is at 15%. Then it creates the tiles and gives each GPU and the CPU a number of tiles based on its percentage.
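As a quick sketch of that split, using the percentages from the example. The function name `split_tiles` and the 64-tile image are made up for illustration, and the rounding uses the largest-remainder method, which is my choice rather than anything Blender does.

```python
def split_tiles(total_tiles, shares):
    """Split a tile count between devices in proportion to integer shares.

    Uses the largest-remainder method so the counts always add up to
    total_tiles. `shares` maps a device name to its benchmark share.
    """
    total_share = sum(shares.values())
    counts = {}
    remainders = {}
    for name, share in shares.items():
        # Integer arithmetic keeps the result free of float rounding.
        counts[name], remainders[name] = divmod(total_tiles * share, total_share)

    leftover = total_tiles - sum(counts.values())
    # Hand the leftover tiles to the devices with the largest remainders.
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


shares = {"GTX 1080": 40, "GTX 1070": 25, "GTX 1060": 20, "CPU": 15}
print(split_tiles(64, shares))
```

Keep in mind this only balances the number of tiles. It would still be thrown off when some tiles are much heavier than others, which is the case the per-tile pre-pass idea is aimed at.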

Thank you.
I did not know about it. Even when I asked about rendering with CPU and GPU in two instances at the same time, I think no one mentioned that.
I will investigate how this works.

Tested. This is really cool for animation :slight_smile:

You can also use the same technique to do network rendering without using Blender’s network render, if you know what I mean :smiley:

It’s even cooler (and faster) when you’re batching from the command line. I wrote a little Gist that documents how I do it… and I’ve recently updated it for the changes made in 2.78c to the Python API for accessing CUDA devices.