Let’s speed up Blender

Hi there,

I just ran a comparative test render with Cycles on two different builds of Blender r56315.
One compiled with VC 2008 and one compiled with VC 2012.

A simple scene with a high-detail character.

VC 2008: 18 Minutes 33 seconds
VC 2012: 16 Minutes 06 seconds
That’s about 13% less render time (roughly a 15% speedup).
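
For anyone who wants to check the percentage: the result depends on which time is taken as the baseline. A minimal sketch using only the two render times quoted above (the variable names are just for illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds render time to plain seconds."""
    return minutes * 60 + seconds


vc2008 = to_seconds(18, 33)  # 1113 s
vc2012 = to_seconds(16, 6)   # 966 s

# Share of the VC 2008 render time that the VC 2012 build saves.
time_saved = (vc2008 - vc2012) / vc2008
# How much more work per unit of time the VC 2012 build gets done.
speedup = vc2008 / vc2012 - 1

print(f"render time reduced by {time_saved:.1%}")  # 13.2%
print(f"throughput increased by {speedup:.1%}")    # 15.2%
```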

All on CPU, no CUDA yet.
I’d really like a benchmark for Blender in order to compare physics, rendering and other features. Is there one?
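
Until there is a proper benchmark, render times can at least be measured the same way for every build. Here is a rough sketch that times a background render from Python. It assumes Blender's `-b` (background) and `-f` (render frame) command-line options; the function names and the paths in the usage note are made up for illustration.

```python
import subprocess
import time


def render_command(blender_exe, blend_file, frame=1):
    """Build a command that renders one frame without opening the UI."""
    # -b runs Blender in background mode, -f renders the given frame.
    return [blender_exe, "-b", blend_file, "-f", str(frame)]


def time_command(cmd, runs=1):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the process exits with an error (e.g. a crash).
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings
```

For example, `time_command(render_command("blender", "scene.blend"), runs=3)` would return three timings for a hypothetical `scene.blend`. Note that this measures startup and scene loading as well as the render itself, so it will read slightly higher than the time Blender reports.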

By the way, both builds are x64. The VS 2012 build was done with /arch:AVX and with GHOST SDL instead of GHOST Win32 because of the odd crashing issue on Win 8. This isn’t optimal and brings a nasty cursor loading bug, but fixing that is just a matter of time. Most features seem to work perfectly.

Both builds and the prerequisite libs can be downloaded at my site: http://shadowrom.de

I would love to test your builds…
But the AVX build file is broken (it cannot be extracted with WinZip or 7-Zip).
I then tried the noAVX build and got 2900 errors while extracting the file, and of course Blender crashed when I tried to run it.

I’ve been running Gentoo on my FX 8350 @ 4.9 GHz with GCC 4.7.2 and the following CFLAGS:

-march=bdver2 -Ofast -pipe -mprefer-avx128 -minline-all-stringops -fopenmp -fno-tree-pre -ftree-vectorize

It is a little over twice as fast as it is in Windows with the official build. I do the BMW benchmark in about 1:50.

I think if you want to make Blender fly, you need to drop Windows, head over to Linux and compile it yourself. I use CMake and it’s not too bad once you get everything set up.

My build will only work if you have an FX 8300, FX 8320, FX 8350, FX 6300, FX 6350, FX 4300 or FX 4350.


Oh…
There seems to be a problem with my upload :frowning:
I’ll check this tomorrow and upload a new build.

Sdlvx: unfortunately the Linux source tree of Blender seems to be much better supported by the Blender devs, and even the 3rd party libs have better support for Linux. I don’t think the OS is the limiting factor in this.
GCC 4.7 is newer than VC 2008 and generates faster binaries; even MinGW builds faster binaries than VC 2008. With VC 2012 there is a new compiler that can do much better optimizations. But there are porting problems, because MS made their new compiler more standards compliant and most of the multi-platform code defines various non-standard workarounds for Windows.
It will take a long time to sort these out and optimize the code for better performance.
I’ve been analyzing many lines of code in the last few weeks and found many things that could be done better and would increase performance on all OSes.
For instance, there are some 64-bit portability issues that need to be fixed.
There are also some strange problems with Windows API calls that definitely need to be fixed.

IMHO the 32-bit support could be dropped in the mid-term; that would make things easier for the devs.
I am curious how many users still use a 32-bit build of Blender?

I think I could talk about this for the next 3 hours, but that would possibly lead to nothing.
I am not a Blender dev, I just want to help and make things a bit better :wink:

I am not a real Blender dev either, so definitely ignore this if one of the core devs tells you something else…

In my experience, by far the best way to optimize code for speed is profiling (gprof for GCC: just compile your code with -pg, run Blender and then run gprof). Just doing some common Blender tasks and uploading the profiles to the Blender wiki would already be of great value. With this, anyone can identify speed bottlenecks and try to fix them.
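
To make the profiling steps a bit more concrete, here is a small sketch that only builds the command lines for a gprof session and does not run anything. It assumes a CMake/make build with GCC, where `-pg` is the flag that adds gprof instrumentation and the instrumented program writes `gmon.out` to its working directory on exit. The function name and all paths are placeholders, not Blender's actual build layout.

```python
def gprof_workflow(source_dir, build_dir, binary):
    """Return (working_dir, argv) pairs for a gprof profiling session."""
    return [
        # 1. Configure with gprof instrumentation for compiling and linking.
        (build_dir, [
            "cmake", source_dir,
            "-DCMAKE_C_FLAGS=-pg",
            "-DCMAKE_CXX_FLAGS=-pg",
            "-DCMAKE_EXE_LINKER_FLAGS=-pg",
        ]),
        # 2. Build the instrumented binary.
        (build_dir, ["make"]),
        # 3. Run it and do the task to be profiled; gmon.out is written on exit.
        (build_dir, [binary]),
        # 4. Turn gmon.out into a readable report (printed to stdout).
        (build_dir, ["gprof", binary, "gmon.out"]),
    ]


for cwd, argv in gprof_workflow("../blender", ".", "./bin/blender"):
    print(f"(in {cwd}) {' '.join(argv)}")
```

Each pair could be passed to `subprocess.run(argv, cwd=cwd)` once the placeholder paths are replaced with real ones.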

Tame: New builds are up and work fine now :wink:
I’m eagerly awaiting your results :smiley:

OK, I had some time to test with Michael Pan’s benchmark. These are my results:

System:

CPU: Intel Core i5 650 @ 3.20 GHz
Ram: 4GB
GPU: NVidia Quadro NVS 295

Tested CPU only for compiler comparison!
Rendered with the blend file’s default settings, just open and press F12.

Blender 56521 MinGW-64 from Buildbot: 08:37.46
Blender 56521 Windows64 from BuildBot: 13:24.50
Blender 56521 Win64 VC2012 own Build: 11:43.45

So it seems that the MinGW-64 build from the buildbot is much faster than anything compiled with MSVC, no matter which version.
But the performance increase is notable when upgrading from VC2008 to VC2012.
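
To put numbers on that, here is a short sketch that converts the three times above into seconds and compares each build against the MinGW-64 one. It assumes the times are in `MM:SS.hh` format; the helper name is only for illustration.

```python
def parse_time(text):
    """Convert a render time like '08:37.46' (MM:SS.hh) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


results = {
    "MinGW-64 (buildbot)": parse_time("08:37.46"),
    "Windows64 (buildbot)": parse_time("13:24.50"),
    "Win64 VC2012 (own build)": parse_time("11:43.45"),
}

baseline = results["MinGW-64 (buildbot)"]
for name, seconds in results.items():
    # A ratio above 1 means the build needs that many times longer than MinGW-64.
    print(f"{name}: {seconds:.2f} s, {seconds / baseline:.2f}x the MinGW-64 time")
```

On these numbers the buildbot Windows64 build needs roughly one and a half times as long as MinGW-64, and the VC2012 build sits in between.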

I’ll run a test on my other machine later on, just to check what my Core i7 with AVX is able to do ^^

Cool.
I’ll try your builds later, since my CPU is crunching a pretty heavy fire simulation test atm…

Cool! Could you test the builds with this fire simulation too? It would be great to see what effect the compiler has on sims.

Another test:

System:

CPU: Intel Core i7 3820 @ 3.60 GHz
Ram: 16GB
GPU: NVidia GeForce GTX 660

Tested CPU only for compiler comparison!

Blender 56521 MinGW-64 from Buildbot: 02:25.82
Blender 56521 Windows64 from BuildBot: 04:05.11
Blender 56525 Win64 VC2012 own Build with AVX: 03:28.64
Blender 56525 Win64 VC2012 own Build without AVX: 03:33.79

I wonder why MinGW is so much faster …

Hi,

So I did some quick tests on my i7 3820 (@ 4.3 GHz).

During the tests I had Firefox (YouTube) running in the background, but it shouldn’t make any difference.

| Version | Mike_Pan | BI_speedtest | Cycles_speedtest | Smoke_speedtest |
| --- | --- | --- | --- | --- |
| 2.67 r56533 (release) 64bit | 03:26.06 | 00:59.82 | 00:56.91 | 63 sec |
| 2.67 r56533 vc11 with avx 64bit | n/a (crashed) | 00:56.73 | 00:49.90 | 63 sec |
| 6.66.6 r56521 mingw64 (buildbot) | 02:02.09 | 00:57.75 | 00:33.29 | 187 sec |

Mike_Pan: default settings, but tile size set to 64x64 for all tests. Unfortunately, for some reason, your build always crashed on me during ray tree building on this scene (loading image textures).

BI_speedtest: A simple & random scene rendered on BI

Cycles_speedtest: A simple & random scene rendered on Cycles (since Mike_Pan didn’t work for your build)

Smoke_speedtest: A simple & random smoke baking job

MinGW wins when it comes to rendering speed. On the smoke simulation it looks like the build is not capable of OpenMP; CPU usage was low as well.

I’m too lazy to test other simulations, but I don’t think there will be a difference…

Thanks for testing. The crash is strange though…
Mike_Pan worked fine on my machine.
Maybe it’s the AVX extensions. They do not really speed up anything.
OpenMP works fine with the build, otherwise the smoke sim would be much slower :wink:
It seems that the Cycles code base is optimized in strange ways on MSVC.

Yeah OpenMP on your build is working fine, I was referring to the mingw64 build which took 187 sec to complete the smoke test :slight_smile:

Anyway, the MinGW build is really fast in Cycles rendering… The performance is also back on par with that of the 2.63 MinGW build, which used to be the fastest…

I don’t know whether there are yet any additional optimizations that could be done to the mingw build. If that would be possible, then it would start to come close to the performance of a slow gpu on my cpu :stuck_out_tongue:

The curse of multi platform development :wink: You’ll never get the same results…
I found a crashing bug in OSL. You should disable it for now if you use my build.
I am working on a solution. This time it’s not Blender to blame, seems to be a problem with OSL and LLVM.

Alright, all fixed and new builds uploaded :smiley:

According to Brecht, Cycles is primarily developed with GCC/LLVM; that’s why the MinGW build is much faster than the MSVC builds.
I’ll try some profiling on this and see if there is something I can do about the speed :wink:

I can’t say too much because of an NDA, but I have been testing current builds, and my own, on my own machine. This is just 2.67’s Cycles running on three different partitions. I have a very complex scene because I am trying to get a bunch of the common things in the benchmark.

My CPU specs:

2.2 GHz quad core i7
8 GB 1333 MHz DDR3 RAM

Results:

Official build, OS X Lion: 1 hour, 12 minutes, 56 seconds
Official build, OS X Mountain Lion: 1 hour, 26 minutes, 44 seconds
Official build, OS X Mavericks: 54 minutes, 3 seconds

Built from Xcode 4, LLVM, running OS X Lion: 1 hour, 3 minutes, 27 seconds
Built from Xcode 4, LLVM, running OS X Mountain Lion: 1 hour, 18 minutes, 57 seconds
Built from Xcode 4, LLVM, running OS X Mavericks: 44 minutes, 19 seconds

Built from Xcode 5, LLVM, running OS X Lion: 55 minutes, 23 seconds
Built from Xcode 5, LLVM, running OS X Mountain Lion: 1 hour, 14 minutes, 16 seconds
Built from Xcode 5, LLVM, running OS X Mavericks: 36 minutes, 54 seconds

With so much variation just coming from the OS, and with so much more from the compiler, I think Blender could be sped up significantly if the Blender team compiled the OS X versions on OS X, the Windows versions on Windows, et cetera. Just compiling my own version, I cut the render time to 68% of that of the official build with the new OS X. With that and a few small optimizations, we are talking about doubling render speeds.
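
The 68% figure can be checked against the times listed above. A small sketch, using only those numbers (the dictionary layout and helper are just one way to write it down):

```python
def hms(hours, minutes, seconds):
    """Convert an hours/minutes/seconds render time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Render times per OS: (official build, Xcode 4 build, Xcode 5 build).
times = {
    "Lion": (hms(1, 12, 56), hms(1, 3, 27), hms(0, 55, 23)),
    "Mountain Lion": (hms(1, 26, 44), hms(1, 18, 57), hms(1, 14, 16)),
    "Mavericks": (hms(0, 54, 3), hms(0, 44, 19), hms(0, 36, 54)),
}

for os_name, (official, xcode4, xcode5) in times.items():
    # Fraction of the official build's render time that each own build needs.
    print(f"{os_name}: Xcode 4 {xcode4 / official:.0%}, Xcode 5 {xcode5 / official:.0%}")

mavericks_ratio = times["Mavericks"][2] / times["Mavericks"][0]
```

For Mavericks, the Xcode 5 build comes out at 68% of the official build's time, matching the figure above. Doubling the render speed would mean getting that ratio down to 50%.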

Xcode_t, Blender is already compiled that way: the Linux build is compiled on Linux, the OS X build on OS X, and the Windows build on a Windows machine.
Can you post your time for the Pabellon Barcelona scene? I am curious; my guess is Xcode 5 is just copying the GCC code.

On Mavericks, with the version of Blender I compiled with Xcode 5, I rendered the Pabellon Barcelona scene in 6 minutes 37 seconds.

Okay, I am waiting for the previous comment to be approved, so I cannot edit it, but I have a major correction to make. It was 62 minutes, not 6 minutes. I guess I didn’t press the 2 hard enough.