Suggestion for workaround for Cycles texture limit

My scenes use a lot of textures, and I recently hit the 95-texture limit of Cycles. I started thinking about whether it would be possible to work around this, and I think I have a solution. However, I'm not very familiar with the C code of Cycles, so I have trouble figuring out which parts of the code would be involved, and to what extent things can be solved in Python versus where C is needed.
I'm hoping someone more familiar with this code can tell me whether this solution is feasible, and is willing to help me understand the code involved.

Requirements for the solution

  • the number of textures should not be the limit, only the size of your graphics memory
  • typical UV tricks that are supported now (tiling textures, coordinates outside the [0,1] range) should still be supported
  • the solution should be transparent (invisible) to the user

Suggested approach

  • Before rendering starts, textures that have the same size and pixel type are grouped together in an atlas.
  • The image nodes concerned are notified that they have to remap to an area of that atlas.
  • When a node has to sample the texture, it uses the remapping info to sample the right area. Border cases should be handled correctly.
  • When rendering a movie, the same atlas can be reused for the whole movie in this way.
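The grouping step could be sketched roughly like this in Python (the data layout and names here are made up for illustration; a real implementation would live in Cycles' image management code):

```python
from collections import defaultdict
import math

def build_atlases(textures):
    """Group textures with the same size and pixel type into grid atlases.

    `textures` is a list of (name, width, height, pixel_type) tuples, a
    hypothetical stand-in for Cycles' internal image list.  Returns a dict
    mapping each texture name to (atlas_id, scale_u, scale_v, offset_u,
    offset_v): the sub-rectangle of the atlas, in normalized [0,1]
    coordinates, that the image node must remap its UVs into.
    """
    groups = defaultdict(list)
    for name, w, h, ptype in textures:
        groups[(w, h, ptype)].append(name)

    remap = {}
    for atlas_id, ((w, h, ptype), names) in enumerate(groups.items()):
        # Lay the n equally sized images out on a near-square grid.
        cols = math.ceil(math.sqrt(len(names)))
        rows = math.ceil(len(names) / cols)
        for i, name in enumerate(names):
            col, row = i % cols, i // cols
            remap[name] = (atlas_id,
                           1.0 / cols, 1.0 / rows,   # scale
                           col / cols, row / rows)   # offset
    return remap
```

Because all images in a group share one size, a simple grid layout is enough; no general rectangle-packing is needed.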

Areas of concern

  • The design of the node system will probably get more complicated: some manager needs to manage all the texture nodes.
  • Performance: the remapping is not a difficult calculation, but it will happen a lot (every time one of the textures is sampled), so the code needs to be written with care.
  • Limits: is there a limit to the texture sizes?
  • Graphics pipeline features: does the CUDA code use typical graphics-hardware features like mipmapping or texture border handling, which could be messed up by playing with the sampling coordinates?

Which parts of the code are involved?

  • image texture management: C or Python?
  • image sampling: C? CUDA?

Hi Jonim8or
The limitation is actually a CUDA thing: all CUDA cards up to compute capability 2.x (Fermi chips) are limited to 128 texture slots. Cycles divides them into 95 LDR images, 5 HDR images and 28 for internal use (I don't know what for).
Newer NVIDIA chips with higher compute capability (currently 3.0 and 3.5) have a fixed limit of 256 slots, which probably allows 223 LDR images (256 - 28 - 5 = 223).

Source: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities

The CPU is also limited in Cycles, to 512 textures I think. They had to raise it for the Tears of Steel movie project, but technically there is no limit on the CPU.
As far as I know, the limits are defined in: intern/cycles/kernel/kernel_textures.h

The only workaround is to reduce the number of textures by reusing textures or merging a lot of small textures into one big texture (which could maybe be done with a script). It would be cool if Cycles would internally auto-merge textures with corrected UV coordinates for GPU rendering.
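As a rough idea of what such a merging script would do, here is a toy sketch using plain nested lists as images (a real script would operate on `bpy.data.images` pixel buffers instead, and would also have to rewrite the UV maps of the objects using the merged textures):

```python
def merge_textures(images):
    """Merge equally sized images into one horizontal strip.

    `images` are 2D lists of pixel values (a stand-in for real image
    buffers).  Returns (strip, uv_offsets): the merged image, and for
    each input the normalized U offset where its pixels now start, so
    the old coordinate u maps to uv_offsets[i] + u / len(images).
    """
    h = len(images[0])
    w = len(images[0][0])
    n = len(images)
    strip = [[None] * (w * n) for _ in range(h)]
    offsets = []
    for i, img in enumerate(images):
        assert len(img) == h and len(img[0]) == w, "images must match in size"
        for y in range(h):
            strip[y][i * w:(i + 1) * w] = img[y]
        offsets.append(i / n)
    return strip, offsets
```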

I’ve been looking around in the code a bit, to see where changes should be made to implement this. Until now I’ve found out the following:

Modify the node-to-Cycles compiler? (I don't know if this will be needed):
void ImageTextureNode::compile(SVMCompiler& compiler)
https://svn.blender.org/svnroot/bf-blender/trunk/blender/intern/cycles/render/nodes.cpp
Modify the internal workings of the node so that it converts the sampling coordinates to sampling coordinates in the atlas:
__device float4 svm_image_texture(KernelGlobals *kg, int id, float x, float y, uint srgb, uint use_alpha)
https://svn.blender.org/svnroot/bf-blender/trunk/blender/intern/cycles/kernel/svm/svm_image.h
Here the image id is translated to a texture which is sampled, so I guess this is where the magic indexing/sampling should happen. However, I don't know how to get the required data here.
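The coordinate transform itself would be cheap: a scale and an offset per sample. The one subtlety is that tiling has to be wrapped *before* the remap, otherwise a repeated texture would sample its neighbours in the atlas. A Python sketch of the math (scale/offset are hypothetical per-image remap data that would have to be made available to the kernel):

```python
def remap_uv(u, v, scale_u, scale_v, offset_u, offset_v):
    """Remap a texture coordinate into this image's atlas sub-rectangle.

    (scale_u, scale_v) is the size and (offset_u, offset_v) the origin of
    the sub-rectangle, in normalized atlas coordinates.
    """
    u = u % 1.0   # wrap-around tiling, done before entering the atlas
    v = v % 1.0
    return offset_u + u * scale_u, offset_v + v * scale_v
```

In the real kernel this would be a few fused multiply-adds in svm_image_texture, so the per-sample cost should be small.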

Another point of interest is that the CUDA code uses tex2D (svm_image_texture uses kernel_tex_image_interp, which is translated to tex2D using a macro). What kind of filtering does CUDA use? That might make the atlas a lot more complex and less efficient.
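As far as I know, CUDA texture fetches support nearest-point and linear filtering. With linear filtering, a sample near the edge of an atlas sub-image averages in texels from the neighbouring sub-image. A toy one-row demonstration of the problem (plain Python standing in for what tex2D does in linear mode):

```python
def bilinear(img, x, y):
    """Bilinear lookup into a 2D grid `img` at continuous texel
    coordinates (x, y), roughly what tex2D does with linear filtering."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

# One atlas row: the left sub-image is all 0s, the right all 9s.
atlas = [[0, 0, 9, 9]]
# Sampling exactly between the two sub-images mixes their texels.
bleed = bilinear(atlas, 1.5, 0)
```

The usual fixes are a texel of padding (duplicating each sub-image's border pixels) or insetting the sample coordinates by half a texel, both of which cost some space or edge sharpness.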