Optimized CUDA builds

Hi all,
NVIDIA gave me a 1080 Ti a few weeks ago so that I could work on CUDA too. Here are the first results and the corresponding build for 64-bit Windows:

The last crowdfunding was a good experience, but it didn’t bring in enough to live on, so I will most likely accept a job offer from a private company. What stays open after that will be up to them. Seeing what hardware people buy here and how many download my builds, I think the community really could pay devs like me so that you get the speedups and new features. Together, my patches let you buy a 1060 for 190€ instead of a 1070 for 385€ and get about the same performance. If you split the money saved between a dev fund and yourself, you get roughly 100€ to spend as you like and roughly 100€ goes to the dev. Every 30 people doing so pays for one month of development! So in the end you get the same performance right away, better performance in the long term thanks to the development, and more money. Even if it’s only half of what I said, you get a better deal by balancing your investment between software and hardware than by putting all the money into hardware.
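To make the arithmetic in this post concrete, here is a small sketch using the two prices quoted above. The exact 50/50 split and all names in the code are assumptions for illustration; the post itself rounds each share to about 100€.

```python
# Prices quoted in the post, in euros.
PRICE_1060 = 190
PRICE_1070 = 385


def funding_estimate(donors, dev_share=0.5):
    """Estimate what `donors` people raise if each buys the cheaper card
    and gives `dev_share` of the saving to a dev fund (assumed split)."""
    saving = PRICE_1070 - PRICE_1060
    to_dev = saving * dev_share
    return {
        "saving_per_person": saving,
        "kept_per_person": saving - to_dev,
        "dev_fund_total": to_dev * donors,
    }


if __name__ == "__main__":
    print(funding_estimate(30))
```

With 30 donors and an even split, the dev fund comes to 2925€, slightly under the 3000€ you get from the rounded 100€ per person.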

My 2 cents. Let me know how this build works for you. If it’s my last open contribution for a long time, thanks again to all the donors, you do a great job!


What exactly is “optimized” about this?

The cubins. The code is the same as master, but I used a better compiler.

Hello bliblubli!

I tested your build in the mib2berlin benchmark scene and got a worse result than with the buildbot one.

Yours: 5m30s
Against 4m30s from buildbot

I’m using a GTX 1050 Ti here… I’ll test it later in other scenes, such as the ones you mentioned in your tests, and post the results here.

Yes, same here, this scene is indeed slower, but I prefer to render this scene slower and production scenes faster. Everyone can choose what they prefer.

Tested it in the Victor demo scene from Blender.org’s site (with some changes to reduce the memory required so that it fits within the GPU’s 4 GB limit: limited texture size to 1k, reduced hair count, etc.) and got almost the same result as before… 30 sec slower, maybe because my GPU is not a high-end one?

I only have a 1080 Ti to test with. It would be interesting to know which GPU you have and what the results are in the BMW, Classroom and Barcelona scenes. Maybe the speedup really is for the 1080 Ti only.

Just tested the full Victor scene at a lower sample count. It renders in 13:23 with master and 12:09 with the build above. I’m not sure why. Maybe only the kernel_sm_61.cubin is faster?
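For comparing the timings reported in this thread on the same footing, here is a minimal sketch that converts clock times to seconds and expresses the difference as a percentage of the baseline. The function names are made up for this example.

```python
def to_seconds(clock):
    """Convert 'MM:SS' (or 'HH:MM:SS') to a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def time_change_percent(baseline, candidate):
    """Percent change in render time relative to baseline (negative = faster)."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100.0


# Victor scene: master vs. the build from this thread.
print(round(time_change_percent("13:23", "12:09"), 1))
# mib2berlin scene: buildbot vs. the build from this thread.
print(round(time_change_percent("4:30", "5:30"), 1))
```

On the numbers posted so far, that is about 9.2% less render time on the Victor scene with a 1080 Ti and about 22.2% more on the mib2berlin scene with a 1050 Ti.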

Developer meeting notes mentioned BI will be hiring 2 more developers. Maybe they could hire you?

Thanks for the suggestion, but I’m already speaking with another company. My point was more about the community that could get much more if the funding/donating culture was more spread among users. I tried to show that it would be smarter for them, also from a financial point of view.

Where can I find more information about this “better compiler”?

It’s only for people who give benchmark results :stuck_out_tongue: At the moment, either only the people who have problems take the time to respond, or I’m the only one getting speedups with this build, which I don’t believe.

For those who are wondering how to get this on other systems… download the 7z file, go into the 2.79/scripts/addons/cycles/lib folder and copy those cubin files into the same directory of a similar Blender build for your OS. Cubin files are cross-platform.

We regularly use cubins built on Linux on Win7 machines.
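The copy step described above can be scripted. This is only a sketch: the `extracted_build` and `my_blender` folder names are hypothetical placeholders, and only the 2.79/scripts/addons/cycles/lib part of the path comes from the post.

```python
import shutil
from pathlib import Path


def copy_cubins(src_lib, dst_lib):
    """Copy every *.cubin file from src_lib into dst_lib, overwriting
    files of the same name. Returns the names of the copied files."""
    if not dst_lib.is_dir():
        raise FileNotFoundError(f"target folder not found: {dst_lib}")
    copied = []
    for cubin in sorted(src_lib.glob("*.cubin")):
        shutil.copy2(cubin, dst_lib / cubin.name)
        copied.append(cubin.name)
    return copied


if __name__ == "__main__":
    # Hypothetical locations: where the 7z was extracted and where
    # your own Blender build lives. Adjust both before running.
    rel = Path("2.79/scripts/addons/cycles/lib")
    src = Path("extracted_build") / rel
    dst = Path("my_blender") / rel
    if src.is_dir() and dst.is_dir():
        print(copy_cubins(src, dst))
```

Since the files in the target folder are overwritten, it is probably worth keeping a copy of the original cubins so you can switch back and compare render times.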

Thanks for sharing your build!
I tried a scene I have (head shot, SSS and HDRI lighting): exact same render times. 06m38s with the latest buildbot and with your build.

Could you make one with the latest CPU+GPU render?
Also, the Carve booleans are crashing with your build (MSVC 2015 compiler issue: https://developer.blender.org/T51540).

Then I wonder about a few things. If this has been known for a long time, why isn’t it done by default? The buildbots could build the cubins on Linux only (which would be faster) and then just copy them over for OS X and Windows.
The second thing is, I only had one file that was slightly slower with those cubins (the synthetic bench from Yafu and mib2berlin), and all the others showed close to a double-digit speedup. So why do I only get reports from people who don’t see the speedup? I guess if you use it in production, which is how I understand it, it means the 1080 Ti is not the only card that benefits.

Ran a couple of tests, only experienced around a 5% decrease in render times. Not perfect for me, but still better I guess. :slight_smile:

yup, had noticed both
- speed ups on generic scenes built without any attention to detail*
- slow downs on intricate scenes built with attention to detail* in mind

note: by *attention to detail I mean working intelligently, creating a virtual scene using tricks and bypasses, applying optimized and efficient techniques

simple conclusion
good for beginners, bad for advanced users; it also makes it harder to advance, since a beginner doesn’t get the chance to be confronted with solving problems