7970 opengl Blender benchmarks and misc Blender opengl findings (warning! LONG read!)

After upgrading my old 5870 to a shiny brand new 7970 last week, I decided to put the new card through its paces. The problem with consumer-brand video cards is that no one really bothers to test them in 3d apps, or to test their general opengl performance (though reviews like http://tinyurl.com/6wpy7ot can be helpful).

This cost me money and time when I bought an Nvidia 480GTX about a year and a half ago and, much to my dismay, discovered that the 4xx consumer line of cards (as well as the newer 5xx cards) from Nvidia were severely crippled in their opengl performance. I was running a 280gtx before that, and the 480gtx performance was literally on par with a two generations older 9800gtx - four times slower than the 280.

edit [on Nvidia cards performance can be much improved by turning off double-faced polys:
Zalamander: In order to restore performance, you must disable double sided lighting for every object. (object data -> double sided; you must be in blender render mode to access this setting)]

It made me RMA the 480, and get an XFX ATI 5870 1gb. This proved to run extremely well in Blender’s opengl viewport, though selection lag of objects at semi-higher poly counts became unbearable to me - with lags of up to 30 seconds or more on two or three million poly scenes. This is due to the deprecated gl_select method used in Blender, which is not supported in hardware by the ATI drivers.

To cut a long story short: at my breaking point of almost deciding to switch to another app, Psy-Fi developed an occlusion-based selection patch to save the day! Gotta love the dev community. I am now using Bat3a’s optimized build, and it’s been Blender opengl performance heaven for me.

Read up on it here: http://tinyurl.com/7feccp7

Now for the benchmarking. Truth be told, I did a little more than just some simple benchmarks, and decided to compare Blender’s viewport performance in different benchmark scenes. I also included a production model from Project London (thanks to the Project London guys for the share!). Suffice to say, I made some interesting observations - and some combinations of how things are set up in a scene can completely kill your viewport performance (no matter what type or brand of graphics card you are using!).

These issues are problematic in a production environment, and, in my opinion, should be addressed. Snazzy new features are awesome, but if some of the basic functionality in its current state breaks the production workflow, I feel it is time to bring them to the attention of the devs and the community.

Here are my system specs:
Asus p6t Deluxe SAS v1 motherboard
Intel i7 920 overclocked @ 3.6ghz
48gb ram (6x8gb Ripjaws X)
Revodrive x2 240gb system drive (up to 1.5gb/sec transfer rate)
Windows 7 Professional 64 bit

5870: latest 12.1 catalyst drivers
7970: 8.921.2 RC11 AMD Radeon™ HD 7900 drivers

The spreadsheet with all the benchmark results can be viewed at:
http://tinyurl.com/6ttgkvp

So, let’s have a (furry) ball!

Introduction to the new AMD 7970

First some photos:

http://www.estructor.biz/benchmarks/pics/package.JPG

Simple packaging (how I like it - enough waste as it is).

http://www.estructor.biz/benchmarks/pics/card.JPG

The video card itself.

http://www.estructor.biz/benchmarks/pics/board.JPG

And after installing it in my case. It’s a long card, and I had to reposition one of the hard drives to make it fit.

Generic OpenGL Benchmarks

Cinebench
A good standard opengl performance benchmark. I expected higher frame rates, since the 7970 is (on paper) way faster than the 5870. Much to my surprise, I got almost identical results:

http://www.estructor.biz/benchmarks/graphs/cinebench.png

I am not too sure what is going on here. It might be due to the rather ‘virgin’ state of the 7970 drivers; these are not yet integrated in the main 12.1 drivers, and during the benchmarks, I felt like the opengl drivers were almost an afterthought by AMD. Perhaps they have decided to cripple the opengl drivers as well. Or the test is cpu-limited.

Notwithstanding the current state of the drivers, I really expected better opengl performance in this test, and this was a major disappointment.

Unigine Heaven
I tested both cards in opengl mode with AA disabled at 1920x1200. I also tested DirectX11 with 4xAA at 1920x1200. Both Full screen.

http://www.estructor.biz/benchmarks/graphs/unigine.png

A marked difference. About twice as fast, which is what I expected. Still, the beta state of the 7970 driver is also visible in this benchmark. With better opengl drivers this should be higher - the hardware can do better. Notice the DX11 performance at 4xAA - on par with opengl without any AA.

Tessmark
For good measure I tested the tessellation performance (no AA at 1920x1200).

http://www.estructor.biz/benchmarks/graphs/tessmark.png

Again, excellent results for the 7970, especially at higher polygon levels.

MSI Kombustor
An interesting all-round opengl test, made up of several benchmarks.

http://www.estructor.biz/benchmarks/graphs/kombustor.png

…and a total fail. This benchmark truly demonstrates the undeveloped state of the opengl drivers: performance is either underwhelming (wavy plane and tessy spheres), or the benchmark will not run. The Kmark Extreme benchmark could not initialize shaders, and refused to start. The tessy spheres benchmark stuttered like crazy.

Not good. Not good at all. AMD needs to work on the opengl drivers.

Geeks3D_OpenGL_Instancing
An opengl benchmark to test hardware-based opengl object instancing.

http://www.estructor.biz/benchmarks/graphs/geeks3d.png
Results for the 5870.

I did not include these in the final benchmark spreadsheet, because testing this on the 7970 caused a complete system-wide crash (or rather: screen-freeze). It leads me to think that opengl instancing is still not supported very well in the current 7970 drivers. Again a FAIL for the 7970.

OpenCL performance
Naturally I was very interested in the opencl performance of my new 7970. I use Luxrender quite a lot, and Luxrender supports hybrid cpu/gpu opencl rendering. To benchmark the opencl performance, I used Luxmark v2.0.

http://www.estructor.biz/benchmarks/graphs/luxmark.png

Nice, but again I expected much better results given the hardware specs of the 7970. Still, an unbiased render that finishes almost twice as fast is nothing to scoff at.

ps: notice the differences between the older 11.8 and newest 12.1 drivers.

/* end of part 1*/

Onwards to the nitty gritty of these benchmarks: Blender.

Except for the Orb benchmark scene (which rotates the object) I created a camera in all scenes that orbits 360 degrees around the object using a constraint with the camera fixed to a circle curve.

I tested wireframe, solid and textured viewport modes, each with and without the object selected:
-s (nothing selected); +s (object selected)

All tests were performed with VBOs on, GLSL on, and in object mode. I was solely interested in Blender’s pure viewport opengl performance.

For the Blender benchmark scenes I only include graphs - to view the actual numbers, please download the spreadsheet: http://tinyurl.com/6ttgkvp
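The post does not say how the frame rates were recorded, so the following is only a minimal sketch of one way to turn frame timestamps from an orbit run into an average fps figure and a -s/+s comparison. The function names and the sample numbers are made up for illustration; they are not taken from the spreadsheet.

```python
def average_fps(timestamps):
    """Average frames per second over a run, given the time (in seconds)
    at which each frame finished drawing."""
    if len(timestamps) < 2:
        raise ValueError("need at least two frame timestamps")
    elapsed = timestamps[-1] - timestamps[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps enclose N - 1 frame intervals.
    return (len(timestamps) - 1) / elapsed


def percent_drop(fps_unselected, fps_selected):
    """Relative fps loss (in percent) when going from -s to +s."""
    return 100.0 * (fps_unselected - fps_selected) / fps_unselected


# Hypothetical run: 5 frames spaced 1/64 s apart.
stamps = [0.0, 0.015625, 0.03125, 0.046875, 0.0625]
print(average_fps(stamps))       # 64.0
print(percent_drop(60.0, 30.0))  # 50.0
```

Averaging over a full 360 degree orbit, rather than reading a single instantaneous value, smooths out the moments where more or less of the model faces the camera.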

Blender Millennium Falcon test

http://www.estructor.biz/benchmarks/pics/falcon.png

This model is courtesy of Andy Crook and Scifi3d.com. There were errors in conversion, which I did not correct. I thought it would be useful to include a model that is partly “broken”.

http://www.estructor.biz/benchmarks/graphs/falcon.png

So, some conclusions:

First, the 7970 driver caps the maximum framerate at about 130fps. The 5870 driver does not. I noticed this in some of the other benchmarks as well. Do not read this as if the 5870 performs better than the 7970. The driver prevents ridiculous framerates to reduce power consumption and heat.
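To make the point about the cap concrete, here is a small sketch of how capped readings could be treated when comparing the two cards. The ~130fps cap comes from the observation above; the tolerance, the function name and the example numbers are assumptions for illustration only.

```python
def compare_fps(fps_capped_card, fps_other_card, cap=130.0, tolerance=0.05):
    """Compare two fps readings when the first card's driver caps the rate.

    Returns the speed-up factor of the capped card relative to the other
    card, or None when the capped card's reading sits at (or within
    `tolerance` of) the cap. In that case the reading is only a lower
    bound on what the card could do, so no ratio is reported.
    """
    if fps_capped_card >= cap * (1.0 - tolerance):
        return None  # limited by the driver, not by the hardware
    return fps_capped_card / fps_other_card


print(compare_fps(130.0, 200.0))  # None: the 130 reading is just the cap
print(compare_fps(60.0, 40.0))    # 1.5
```

In other words, any bar in the graphs that sits at roughly 130fps for the 7970 says "at least this fast", not "exactly this fast".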

Secondly, although the performance in solid and wire view mode is outstanding, textured mode (GLSL) is absolutely dismal. There is no real reason why adding a couple of textures and a single shading light to the scene should cause the framerate to drop like this.

After testing the same scene on a very well equipped Asus laptop (nvidia560m) last Saturday during the Vancouver Blender meetup, the Nvidia results were equally confounding: solid mode about 10fps(!), textured mode about 29fps! The crippled Nvidia drivers were obviously at work in this case. Still, even WITH the crippled drivers textured mode on Nvidia cards offers superior performance in Blender compared to AMD/ATI cards.

Honestly, I feel this is due to the way Blender’s opengl is implemented. The game engine (which is also opengl based) is unaffected, and games developed in Blender run absolutely smoothly with both the 5870 and the 7970.

This should be addressed - textured mode is unusable as it is for ATI/AMD cards. We will re-address this issue when benchmarking the “Goose” model.

Furry Ball

http://www.estructor.biz/benchmarks/pics/furryball.png

This object courtesy of Cornell University
http://graphics.cs.williams.edu/data/meshes.xml

This is a pretty hefty object at almost 3 million faces. The complex structure creates other complexities that 3d apps must cater for. Here are the results:

http://www.estructor.biz/benchmarks/graphs/furrball.png

Finally the 7970 proves it is faster at handling opengl than the older 5870. A full 20fps faster in solid view mode with no selections. Still, underwhelming - the unigine test suggested it should (theoretically) be twice as fast. Though at this point I feel we are actually hitting Blender’s limits (and my machine’s cpu limits), rather than the 7970’s limits.

…however, check the incredible drop in performance when the furry ball object is selected in the viewport when working in solid view mode! A drop of 50%!

Here we encounter another issue with Blender’s opengl viewport implementation. SeanJM offered a very plausible explanation last Saturday when I showed this drop in performance during the Vancouver Blender meetup: it seems Blender is calculating the view/data twice internally.

With structurally simpler objects the drop is not as noticeable, but with objects like these, it is quite bad. Note that in almost all the benchmark scenes a drop in viewport performance is visible/noticeable - except when working in wireframe mode.
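A quick back-of-the-envelope check shows why "doing the work twice" lines up with a 50% drop. This is only a sketch under the assumption that drawing the object dominates the frame time and that every pass costs the same; the 40fps input is a made-up number, not a measured result.

```python
def fps_with_passes(base_fps, passes):
    """Expected fps when the same mesh is processed `passes` times per
    frame, assuming that work dominates the frame and each pass costs
    the same amount of time."""
    if base_fps <= 0 or passes < 1:
        raise ValueError("base_fps must be positive and passes at least 1")
    frame_time = 1.0 / base_fps         # seconds for a single pass
    return 1.0 / (frame_time * passes)  # every extra pass adds one frame_time


print(fps_with_passes(40.0, 2))  # about 20 fps, i.e. a 50% drop
```

With a simple object the rest of the frame (interface, other objects) takes up a larger share of the time, so the relative drop would be smaller, which fits what the graphs show.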

SeanJM also mentioned the scheduling system in Blender is planned to undergo a long overdue overhaul. Let’s hope this selection issue will be addressed as well in the future.

Texture mode performance was not as bad as expected - understandable with no textures applied.

The Orb

http://www.estructor.biz/benchmarks/pics/orb.png
This scene was created by metalliandy to do some nvidia testing.
http://tinyurl.com/8xclnjy

I introduced two additional levels of subdivision. In these first tests I benchmark at three different levels with the sub-d modifier applied in all cases. Since I had access to hp z210 workstations at work, I decided to run this one on a Quadro 2000 as well, for some interesting comparisons. Note that the hp z210 workstation was unable to cope with the third level - it ran out of memory.

http://www.estructor.biz/benchmarks/graphs/orb15870.png

http://www.estructor.biz/benchmarks/graphs/orb12000.png

http://www.estructor.biz/benchmarks/graphs/orb17970.png

Again, the 7970 is capped at ~130fps, until I selected it at level 1 in solid mode. This seems to be a glitch in the 7970 opengl drivers. The Quadro 2000 driver capped at 60fps, in sync with the screen.

Both ATI/AMD cards obliterate the Quadro 2000 in solid mode. In textured mode once again the Achilles heel of AMD/ATI is shown. Again, I am uncertain whether this is a driver issue, or an issue with how opengl is implemented in Blender.

Also note the drop in performance when the orb object is selected in solid and textured mode - no matter which card. The way Blender implements selection seems to be the cause.

As for the Quadro 2000: selection lag was quite noticeable at the second and third level - though instant with the occlusion selection patch active. Even on a professional grade workstation video card the gl_select method used in Blender for selection proved to be laggy.

It also took this card about half a second to ‘get up to speed’. On the other hand, the 2000 handled textured mode better than the amd/ati cards (which I expected).

The Orb - Subdivision modifier troubles

http://www.estructor.biz/benchmarks/graphs/orb_subd_7970.png

Well, I guess the graph speaks for itself: this is a major issue in Blender, no matter the video card (I tested this on other machines, including the Quadro 2000, a 480gtx, a 560m, 5870, etc.):

adding a sub-d modifier to any polygon object destroys Blender’s opengl viewport performance

I have noticed some people writing here that character animations play slow - I noticed this as well, and after some investigation, I came to the conclusion that:
if you want smooth(er) animations, turn off any sub-d modifier.

Even a simple low-poly object with a sub-d modifier enabled can completely crush your viewport performance.

This one makes no sense to me at all - after all, why would it have to re-calculate the data for each frame of the viewport? Of the issues mentioned, this one is a potential production workflow killer, if you ask me. And others agreed with me on this point at the last Vancouver meetup. I work in other 3d apps with sub-d modifiers, and the viewport is hardly affected at all in those.

This must be addressed! Please!

EDIT Zalamander suggested this workaround method: simply add a deform modifier at the bottom of the modifier stack to force VBO based opengl rendering. Not a real fix, but quite workable.
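For scenes with many objects, this workaround could be scripted. The sketch below is untested against any particular Blender version and rests on several assumptions: that the modifier type identifiers are "SUBSURF" and "SIMPLE_DEFORM", that obj.modifiers.new(name=..., type=...) appends to the bottom of the stack, and that the new modifier has a factor property which must be set to zero so that it leaves the mesh unchanged. The function names and the modifier name "ForceVBO" are made up for illustration.

```python
def ends_with_subsurf(modifier_types):
    """True when the bottom (last) modifier of a stack is a Subsurf."""
    return len(modifier_types) > 0 and modifier_types[-1] == "SUBSURF"


def add_dummy_deform(objects):
    """Append a do-nothing Simple Deform below a trailing Subsurf.

    `objects` is any iterable of Blender-like objects exposing `.name`
    and `.modifiers`. Returns the names of the objects that were changed.
    """
    changed = []
    for obj in objects:
        types = [mod.type for mod in obj.modifiers]
        if not ends_with_subsurf(types):
            continue
        dummy = obj.modifiers.new(name="ForceVBO", type="SIMPLE_DEFORM")
        dummy.factor = 0.0  # assumption: a zero factor means no deformation
        changed.append(obj.name)
    return changed


# Inside Blender this would be called roughly like:
#   import bpy
#   add_dummy_deform(bpy.context.scene.objects)
```

Objects whose stack already ends in another modifier are skipped, so running the function twice does not pile up extra dummy modifiers.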

Project London Goose

http://www.estructor.biz/benchmarks/pics/goose.png
Model courtesy of Dolf Veenvliet

A great production model to benchmark in Blender. And again, some surprising results. This model is relatively low-poly: about 300,000 faces. Looking at the previous benchmark scenes, one would expect great viewport performance - after all, not a lot of geometry to deal with, is there?

http://www.estructor.biz/benchmarks/graphs/goose_mods_applied.png

Wrong. It seems that all the rigging and setup of this model is causing Blender to slow down quite a bit. In this test I decided to test with the armature controls turned off and on as well. So +b means the controls are displayed in the viewport.

The 7970 actually performs worse than the older 5870. Perhaps this is due to the beta 7970 drivers - not too sure. I hope so.

Again, textured mode performance is quite horrible - looks nice though. But 10fps points at a bad opengl implementation in Blender once more (my opinion).

Selection causes a big drop in performance in any viewport mode now - including wireframe.

And showing a couple of controls has an additional negative impact on the viewport performance. Even turning off the relationship lines in the display settings will increase the fps.
For myself, I have noticed before that polygonal control objects have less or almost no impact on the fps compared to these ‘flat’ 2d wire objects.

Let’s have a cursory glance at the original model with all modifiers intact:

http://www.estructor.biz/benchmarks/graphs/goose_mods_not%20applied.png

Right from the start we see a drop in viewport performance. Reason: a couple of sub-d modifiers (about 3 or 4?).

Viewport performance is still horrible compared to the other benchmark scenes with much higher poly counts.

Conclusion must be that Blender’s scheduling or other internal workings are affecting the viewport’s performance rather badly. This is not an overly complex model - and this behaviour can affect production workflow in a negative manner.

Dragon

http://www.estructor.biz/benchmarks/pics/dragon.png

No real benchmark, just to prove a point. On my 7970 this 40 million poly scene still orbits quite admirably - there is lag, but I guess it is doing about 12fps.

Please explain to me why this ridiculous number of polys is still workable, while a simple 270,000 poly production model (the Goose) brings Blender’s viewport to its knees.

Final benchmark: the big scene

http://www.estructor.biz/benchmarks/pics/BigScene.png

Potpourri of several production quality models: the Goose, the tower (Pauls Spooner), giant ufo (Nizu), Deichmann ufo (http://tinyurl.com/83s7tju)
Partly animated, and consisting of 10328086 vertices and 9853584 faces.

http://www.estructor.biz/benchmarks/graphs/greatscene.png

This time I included bounding box mode as well. Performance is quite bad - again Blender is probably calculating all sorts of things, which gets in the way of viewport performance. It is quite telling when we examine the bounding box mode framerates. The view is just required to show boxes!

In both Lightwave and Cinema4d this does not happen. When a scene becomes too complex to orbit without lag, Lightwave’s layout will automatically switch to bounding box mode, and this results in a completely smooth viewport experience. So does Cinema4d, if required.

Scheduling system? I just feel this does not make sense. If no changes are made to models, it should not affect the viewport performance. And it does not in other apps (or hardly so) - but it does in Blender.

Final conclusions

7970

  • 7970 works quite well with complex scenes in blender, but it depends on the type of scene. The dragon works well, the great scene does not.

  • some results, compared to the 5870, are erratic.

  • 7970 opengl drivers are still in beta, and not optimized at all.

  • opencl performance is quite good.

  • 7970 runs silent and requires less power

  • raw opengl performance is at least twice that of the 5870, as demonstrated in the unigine Heaven benchmark. Even with the current beta state of the opengl drivers.

  • some benchmark tools just refuse to work on the 7970 - again, opengl drivers need more work

  • the 7970 offers a lot of overclocking potential (based on other reviews and reports on the web), so opengl performance can be improved this way

  • the 7970 and 5870 do not differ too much in Blender performance in practice: except for scenes with high poly static objects (like dragon and furryball) both cards hit Blender’s processing ceiling. The internal processing that occurs in Blender limits the usefulness of a powerful card such as the 7970. Until that is resolved (or cpus become 5 times as fast) it makes no sense to invest in a high-power card, whether a high-end Quadro or the 7970. At least, that is my personal opinion.

In a nutshell, the 7970 is held back by Blender, rather than the other way around! And likewise for the 5870. This was my most surprising realization.

Blender

  • internal workings of Blender have a huge negative impact on viewport performance. Even relatively low-poly production scenes (the Goose) are severely affected. Rigs of characters also affect the viewport in this manner. This is video card brand independent.

  • some add-ons hit viewport performance badly: for example, the enhanced 3d-cursor add-on reduces the viewport fps by 20fps or more when the properties pane (n) is open.

  • complex static objects can run at high frame rates in opengl solid view mode on ati cards (dragon scene)

  • gl_select is slow - on any card. On ati/amd cards the occlusion patch is essential: without it, selection is unworkable on semi-complex scenes. On complex scenes selection might take 20 seconds or more, during which Blender locks up.

  • the sub-d modifier crushes viewport performance if one is not careful. This is video card brand independent.

  • viewport performance (especially solid and textured modes) in Blender is negatively affected by selected objects. In some cases (furry ball) a huge drop in fps results. This is video card brand independent.

  • viewport performance is negatively affected by armature controls. This is video card brand independent.

  • unlike other 3d applications, internal workings of Blender devastate even bounding box viewport mode performance in complex production scenes.

  • solid mode on consumer 4xx/5xx Nvidia cards is severely compromised in performance due to crippled drivers. A work-around for more complex objects (like the falcon) is to work in textured mode, which may result in better viewport performance.

  • textured mode performance with textured objects is dismal with Blender running on ati/amd cards. Nor is it very good on Nvidia cards. The game engine runs smoothly in opengl, with better viewport quality as well. I see no reason why textured opengl mode in Blender could not be improved using similar techniques.

Thanks for reading! Comments are welcomed. The benchmark scenes I can put up, depending on whether the authors of the original models permit.

Lol, when I asked you to post your experience with the 7970 I didn’t know you’d venture to write a book :smiley:

Thanks for the great effort :slight_smile:

Uhm, yeah, I know - wasn’t my intention. But I really feel this is one area that has been neglected by the Blender community and developers, while other commercial 3d apps have added all sorts of snazzy viewport eye candy. Blender’s opengl viewport tech is rather limited in comparison.

I just hope this thread will light some fire on someone’s “behind” parts. Without a well performing viewport (most basic usability requirement) how can one hope to improve Blender in the long run?

Brilliant job on this; let’s hope it has some positive effect with the devs.

#6 +10 , agree. OP brilliant job :smiley:

Very nice Work. Hopefully we’ll see a viewport redesign soon.

Wow! This is a very appreciated thread! Thanks for the work. I do hope some devs see this and get motivated (maybe?).

  • the sub-d modifier crushes viewport performance if one is not careful. This is video card brand independent.

  • viewport performance (especially solid and textured modes) in Blender is negatively affected by selected objects. In some cases (furry ball) a huge drop in fps results. This is video card brand independent.

  • viewport performance is negatively affected by armature controls. This is video card brand independent.

Yep. I had discovered that a long time ago. Makes no sense to me. “Why does my 5000 poly character with an armature and a L2 Sub-D modifier kill Blender’s view port performance?!? Arghhh!!” O well.

Everything else. Very comprehensive… and very sobering to those who think Blender tramples all over the Big Boys in every way.

What a great post, I agree 100%.

…however, check the incredible drop in performance when the furry ball object is selected in the viewport when working in solid view mode! A drop of 50%!
That is because of the outlines, which requires the mesh to be drawn twice. You can disable the drawing of outlines.

adding a sub-d modifier to any polygon object destroys Blender’s opengl viewport performance
If you have a subdivision modifier at the bottom of the stack, it will make it enter its own weird codepath that is interwoven with the subD code, which is all immediate mode drawing, not VBO. If you have a dummy modifier such as the simpledeform at the bottom, it will calculate and use VBO instead.

Btw, I’m not surprised that the new AMD generation isn’t necessarily faster than the previous generation(s) for something like solid mode drawing. Nowadays all the work goes into improving the shader processors, so while 7970 has significantly higher shader performance (as evidenced by Unigine) the fillrate is almost identical to that of a 5870. This just reflects what is relevant for game performance.

I hope the current problems you’re having are only caused by early drivers… as the drivers mature so will the performance, right?
I’m eyeing a HIS 7950 3GB. Thank goodness I saw this thread… I’ll wait for your further testing.
As a noob in all things blender and opengl I can’t say much in any topic discussing these “crippled products”; I can only observe, read and learn. Thanks for taking the trouble to write.

We are aware and working on it :slight_smile:

http://lists.blender.org/pipermail/bf-committers/2011-April/031593.html

When discussed on irc, at that time it was more or less agreed to wait for bmesh since VBO setup will need to take the new system into account. I haven’t checked the bmesh code that uploads to GPU still, there may be some changes in performance for better or worse.

Personally I would like to see (implement?) an on-demand, stream based model for OGL drawing with a unified API that only updates VBOs when necessary, taking material/uv island information into account. A lot of calculations are done on the fly, though Campbell did a number of optimizations there for 2.61 I think. Also, I’ve heard (but not verified) that modifiers such as subsurf are computed every frame, even if unnecessary, making this more problematic. Solving this also requires a better dependency graph.

The dragon scene is a bad example IMO because the VBO has to be built just once, so it’s quite logical that performance will be better.
In other scenes, objects will have to build their own VBO at the time of drawing so your scene becomes CPU, setup and draw call bound.

That would be a very nice GSoC project if you’re still a student :wink:

Thanks a lot for all the work you put into this review :D Very good read and some startling results :O

hahaha, a very good topic again :D:D:D

I use Blender for level editor with big levels and I have no problem with performance

but of course Blender is unusable :D:D:D

To clarify: the 7970 can work better than the 5870 in Blender, but often both cards are hitting a ceiling caused by Blender. Check the Heaven Benchmark to see that the 7970 is actually very fast. I do not think the drivers are crippled like Nvidia’s for opengl.
However, it seems textured mode is currently handled by software emulation rather than in hardware for both cards in Blender.

In this case the hardware is held back by the software.

Also, the 3gb video ram does make a difference for heavier scenes (unless, again, limited by Blender’s internal processing). The 7970 runs silent and much cooler than the previous gen - which is great. With heavier scenes the 5870 would start to sound like a hair dryer.

Yes, I knew the dragon scene is a bad example, which is why I did not really include it as a benchmark scene, but more as an example that the pure opengl performance of the card is quite good. And thank you for the explanation of why that scene works so well: the VBOs are only built once.

Your ideas sound excellent - I think softimage might be using a similar approach to things (seeing its viewport is one of the most optimized).

Yes, but how am I going to distinguish between selected and non-selected objects in a scene then? Not really an option.

THANK YOU! That is quite a workable workaround for now, and an excellent suggestion. This tidbit of info should be part of the sub-d modifier documentation. (Does anyone want to add this to the wiki?)

True, but it can be much faster, depending on the scene (more video ram) - but at this point the software is holding back the hardware. And especially the textured view could benefit from the higher shader performance - which at this point is pretty much useless in Blender on ati/amd cards for certain scenes such as the goose and falcon. Though in the game engine it is not.

:smiley:

Hi Endre! I just love your work on Dead Cyborg (I was actually one of the few people who did donate to your project).

When I test your blend file in textured mode the performance is actually quite good:

But still a bit laggy on my 7970 (12fps) and 5870 even though this level is only comprised of a mere 91k faces, while the game mode is absolutely smooth, and the quality of the shaders obviously much, much better.

solid view mode is absolutely smooth, of course, even though performance is also affected by the internal processing of Blender.

Textured mode on ati/amd cards seems only usable when working with low poly scenes like these. At higher poly rates its viewport performance plummets to the floor.

I think textured mode will run more smoothly on an nvidia card for this project as well. Speaking of which: are you using an nvidia card?