Quadro/Tesla/Titan faster on Linux?

I noticed that my FLIP sims in Houdini ran significantly faster when using OpenCL on my M4000 under Linux. This seemed strange, and understandably a lot of people didn't believe me when I said I was getting results 50-100% faster than on the CPU under Linux, whereas on Windows I saw only a marginal improvement, and only with a huge number of particles. Even then it wasn't really worth using, and I just chalked it up to having paid too much for my GPU (which I probably did, but I have other reasons too). I mostly put it down to being scene-specific, yet I kept seeing the same speedup on other simulations as well.

Disappointingly, I didn't see the same drastic improvement with CUDA. Then someone on odforce found this explanation from Redshift:

One important difference between GTX GPUs and Titan/Quadro/Tesla GPUs is TCC driver availability. TCC means "Tesla Compute Cluster". It is a special driver developed by NVidia for Windows. It bypasses the Windows Display Driver Model (WDDM) and allows the GPU to communicate with the CPU at greater speeds. The drawback of TCC is that, when you enable it, the GPU becomes 'invisible' to Windows and 3D apps (such as Maya, Houdini, etc.) and becomes exclusive to CUDA applications, like Redshift. Only Quadros, Teslas and Titan GPUs can enable TCC. The GeForce GTX cards cannot use it. As mentioned above, TCC is only useful for Windows. Linux doesn't need it because the Linux display driver doesn't suffer from latencies typically associated with WDDM. In other words, the CPU-GPU communication on Linux is, by default, faster than on Windows (with WDDM) across all NVidia GPUs, be it GTX cards or Quadro/Tesla/Titan.
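Since TCC is a Windows-only driver mode, a quick way to see which mode each card is actually running in is to ask nvidia-smi. Below is a minimal Python sketch, assuming nvidia-smi is on PATH and accepts the `driver_model.current` query field (listed by `nvidia-smi --help-query-gpu`; it reports N/A on Linux). The function names are hypothetical helpers, not part of any NVIDIA tool.

```python
import subprocess


def parse_driver_models(csv_text):
    """Turn 'name, model' lines from nvidia-smi into (name, model) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # rsplit so that a comma inside a GPU name cannot break the split
        name, model = line.rsplit(",", 1)
        gpus.append((name.strip(), model.strip()))
    return gpus


def query_driver_models():
    """Run nvidia-smi and return (GPU name, driver model) pairs, e.g. WDDM or TCC."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_model.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_driver_models(out)
```

On a Windows box, calling `query_driver_models()` should list each card with WDDM or TCC. As far as I know, the same tool's `-dm` option is what switches a supported Quadro/Tesla/Titan card to TCC, and it needs administrator rights.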

So I'm interested in people's thoughts here and how this applies to Blender. Would rendering with OpenCL be a better choice on professional NVIDIA cards? (I'm thinking not, since if I'm reading it right, this feature is already used with CUDA.) As Blender continues to implement OpenCL, will we see the same kind of disparity between professional and consumer cards?

I should probably download Lux…

Well, other people also recommend using Linux for other programs, such as Houdini:

Anyway, for Cycles/CUDA don't draw hasty conclusions; there have been some problems with Windows 10 before. I don't know about OpenCL, though.

The issue you're quoting there is going to affect OpenCL as well. CPU-GPU communication in Cycles is minimal during rendering (mostly image updates), so driver overhead should have a relatively low impact, especially when rendering from the command line. Some people have reported speedups under Linux while others couldn't reproduce them (see the link YAFU posted).
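A back-of-the-envelope model makes this argument concrete: if every CPU-GPU round trip pays some fixed extra driver latency, the total cost grows with the number of transfers, so a solver that syncs with the CPU every substep suffers far more than a renderer that only sends back occasional image updates. The sketch below uses made-up numbers purely for illustration; the latency, transfer counts and compute time are hypothetical assumptions, not measurements.

```python
def overhead_fraction(compute_s, transfers, extra_latency_s):
    """Share of total wall time spent on extra per-transfer driver latency."""
    overhead = transfers * extra_latency_s
    return overhead / (compute_s + overhead)


# Hypothetical figures for illustration only, not benchmarks.
EXTRA_LATENCY_S = 50e-6  # assumed extra cost per CPU-GPU round trip under WDDM
COMPUTE_S = 60.0         # assumed pure GPU compute time for both jobs

# A simulation that exchanges data with the CPU many times per substep...
sim = overhead_fraction(COMPUTE_S, transfers=100_000, extra_latency_s=EXTRA_LATENCY_S)
# ...versus a render that mostly pushes occasional image updates.
render = overhead_fraction(COMPUTE_S, transfers=200, extra_latency_s=EXTRA_LATENCY_S)

print(f"simulation overhead: {sim:.2%}")
print(f"render overhead:     {render:.4%}")
```

With these assumed numbers the simulation loses several percent of its run time to driver latency while the render loses a tiny fraction of a percent, which is consistent with (though not proof of) the FLIP vs. Cycles difference discussed in this thread.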

I am seeing some render performance gain on Linux over Windows, but nothing like what I see with Houdini FLIP, which is pretty significant. For Cycles, CPU and GPU perform about the same and both are a bit faster than on Windows, but neither is significantly faster than the other in my case.