Self-compiled Blender: -O3 not stable?!

Update: Solved, it seems I used the wrong MinGW package. I updated to the latest ‘candidate’ release and the compilation process went through smoothly.

-O3 is fairly well known for breaking software, so it’s no surprise to me that Blender doesn’t work. Also, -O3 generates larger binaries than -O2 (mostly from extra inlining), so the speed increase can be only theoretical, since a much larger binary puts more pressure on the instruction cache and may run slower :).

With Blender up to 2.40 (even tuhopuu 3) I had extremely stable builds (gcc 3.3) with -O3, p3, sse, unrolled loops etc., using the old scons system. I have posted my tested and proven flags in the Intel optimized thread. The speed increase was something like ~21 min down to 15 min on a PIII (no sse2).
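To put those render times in perspective, here is the arithmetic as a small Python sketch. The function name is made up for illustration; the 21 and 15 minute figures are the ones quoted above.

```python
def speedup(before_min, after_min):
    """Return (speedup factor, percent of render time saved)."""
    factor = before_min / after_min
    saved_pct = (before_min - after_min) / before_min * 100.0
    return factor, saved_pct


factor, saved_pct = speedup(21.0, 15.0)
print(f"{factor:.2f}x faster, {saved_pct:.1f}% less render time")
# prints: 1.40x faster, 28.6% less render time
```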

Actually, further tests show that as soon as I add -march=athlon-xp to the config file, the binary breaks when rendering. -O3 by itself does improve the speed in this case and doesn’t break the binary.

So it’s not -O3… Will try -march=i686 now.

Barton or Thoroughbred?

Athlons may want something like this:
-march=athlon-xp -pipe -O3 -fomit-frame-pointer -ffast-math
-fprefetch-loop-arrays -funroll-all-loops -fforce-addr -mmmx -msse -m3dnow
-mfpmath=sse,387 -falign-functions=64

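(Athlons have 64 KB + 64 KB of L1 cache: instruction + data.)
-funroll-all-loops was faster for me on a Pentium (it shouldn’t be, according to the gcc manual). Note that -funroll-loops is not turned on by the plain -O levels, so it has to be given explicitly if you want it.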

You could try -march=i686 -mtune=athlon…? I didn’t, but you never know.

tedi, I have a Barton core 3200+.

Also, I thought -mtune was a deprecated way of saying -march, plus doesn’t it just generate extra code for compatibility with other processors?

Edit: just tried
-march=athlon -pipe -O3 -fomit-frame-pointer -ffast-math
-fprefetch-loop-arrays -funroll-all-loops -fforce-addr -mmmx -msse -m3dnow
-mfpmath=sse,387 -falign-functions=64

So far, the only thing that makes a stable binary for me is:
‘-O3’, ‘-march=athlon’
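
One way to find out which of the other flags is the culprit is to start from the stable pair and add the rest back one at a time, doing a test render after each build. A minimal Python sketch that only prints the flag sets to try; the function and variable names are made up, the flags are the ones from the list above, and the actual build and render steps are left out:

```python
STABLE = ["-O3", "-march=athlon"]

# Remaining flags from the list tried above.
EXTRA = [
    "-pipe", "-fomit-frame-pointer", "-ffast-math",
    "-fprefetch-loop-arrays", "-funroll-all-loops", "-fforce-addr",
    "-mmmx", "-msse", "-m3dnow", "-mfpmath=sse,387",
    "-falign-functions=64",
]


def one_at_a_time(baseline, candidates):
    """Yield (added_flag, full_flag_list), adding a single candidate each time."""
    for flag in candidates:
        yield flag, baseline + [flag]


for flag, flags in one_at_a_time(STABLE, EXTRA):
    print(f"build and test-render with: {' '.join(flags)}")
```

If every single flag passes on its own, the breakage probably comes from a combination, and the next step would be to add them cumulatively instead.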

On the Mac version, some of those optimizations were breaking the app, in the sense that when I rendered something there would be streaky lines all over the render. I narrowed it down to the build option -ffast-math and, I think, a line that had some -nopic term in it. I can’t remember exactly, but taking those out made the binary work fine with -O3. -O3 does increase the size a bit, and the speed increase was negligible for me, but then the PPC compilers aren’t very good. PPC only gets good when you use AltiVec programming.

Actually, my build with -ffast-math has more precision than the RC3 build for some reason. If you look on my site, you’ll see I have a picture comparison showing the ‘line problem’. I believe someone once told me -ffast-math is actually faster AND more accurate, but it doesn’t strictly follow the IEEE/ISO floating-point rules, and that’s why it’s not enabled by default.
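
Part of why -ffast-math can change the output at all (for better or worse) is that it lets the compiler rearrange floating-point arithmetic, and floating-point addition is not associative. A small Python illustration of the underlying effect; Python itself is not affected by the gcc flag, this only shows that the order of operations changes the last bits of the result:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # grouping as written in the source
right = a + (b + c)   # a grouping a reordering compiler might pick instead

print(left == right)      # prints: False
print(abs(left - right))  # tiny, but not zero
```

In a renderer such last-bit differences can pile up over many operations, which is how two builds of the same source can end up with visibly different pixels.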

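SSE registers are 128 bits wide, but SSE only does 32-bit single-precision floats; SSE2 adds 64-bit doubles. AFAIK (correct me if I’m wrong), the “excessive” precision comes from the 80-bit x87 registers rather than from the SSE units, so which one you get depends on what -mfpmath ends up using.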

Then again, if it looks good, then it is good.