Cycles much worse performance with RTX cards

I gave that scene a quick spin in the profiling tools, and they indicate that my RTX card spends most of its time idling: the CUDA cores are waiting for instructions to be loaded from memory. In other words, that scene is too simple for Cycles’ code, and the GPU is slow at skipping over the code the scene does not use.

Nvidia writes that its GPUs have instruction caches but does not disclose any details. It may be that those changed between the 10x0 and the 20x0 cards, and that the 10x0 series was simply a lot better at dealing with Cycles’ megakernel.

Either way, a simple workaround is available if you are rendering on Linux. Make sure you have the CUDA SDK installed, then enable the Debug panel and check “Adaptive Compile” under CUDA flags. Blender will then compile a Cycles kernel on demand that contains only the features the scene is using. On my RTX 4000, this brings the render time for that scene down from over 6 minutes to 1 minute and 45 seconds.
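To illustrate the idea behind adaptive compilation, here is a toy sketch in Python: given the set of features a scene actually uses, it produces preprocessor defines that would strip everything else out of a kernel build. The feature names and the `-D__NO_<FEATURE>__` define pattern are hypothetical placeholders chosen for illustration; this is not Cycles’ actual build code, and `adaptive_build_flags` is a made-up helper.

```python
# Hypothetical feature list; Cycles' real feature set and define names differ.
ALL_FEATURES = ("hair", "volume", "subsurface", "motion_blur", "baking")


def adaptive_build_flags(used_features):
    """Return hypothetical compiler defines that disable every unused feature.

    A smaller kernel means less unused code for the GPU to fetch and skip over,
    which is the effect the adaptive compile option is meant to exploit.
    """
    unknown = set(used_features) - set(ALL_FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    # Keep the output order stable by walking the master list, not the input set.
    return [
        f"-D__NO_{name.upper()}__"
        for name in ALL_FEATURES
        if name not in used_features
    ]


if __name__ == "__main__":
    # A simple scene that only uses motion blur gets every other feature compiled out.
    print(" ".join(adaptive_build_flags({"motion_blur"})))
```

A full megakernel corresponds to an empty flag list (every feature compiled in), while a very simple scene yields the longest list and therefore the leanest kernel.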
