True OpenGL instancing, as in the new LightWave 11 feature? Improved OpenGL architecture?

Check out the new LightWave 11 feature list:

I love that instancing - and after watching the LW 11 video event that showcases some of that instancing for massive crowds (late, way at the end in the video), and loading up a ‘mere’ 1 billion polys in the viewport without any memory issues, I wonder when this sort of true instancing is going to become available in Blender.

Ton mentioned they are going to implement a dependency graph, but will this allow the OpenGL view to operate with virtual clones as well?

And talking about the OpenGL view: does anyone know whether the devs intend to improve the performance at some point? I’m not about to spend $2000 - $4000 on a Quadro for better viewport navigation. The lackluster performance when I animate reasonably simple characters is getting in the way of focusing on animation.

I mean, Softimage is doing a great job with their Gigacore/Gigapolygon architecture, and since projects done in Blender are getting more complex and involve large data sets, my experience is that Blender’s OpenGL viewport is just… struggling. Cycles is awesome, and all, but the bulk of the work is done in the viewport! And sculpt mode is able to handle much larger poly counts than object mode (let’s forget about edit mode, which is (understandably) even slower), so somehow optimization should be possible.

These issues do stand in the way of larger-scale productions, or more elaborate effects, and must be addressed some time in the (near) future.

Thoughts, anyone?

In my experience Blender can handle a lot, and that’s using 2.4x. I had a huge scene with a lot of objects that shared the same mesh data block, and it worked well: 3.5 million faces. I know that’s not a billion, but it’s still a lot.

I also think that with good scene management you can handle any scene with proxy objects. Blender works really well with proxy objects. Just create the proxy and parent the real object to it, then hide the real object in the render layer. Then have a single layer with all the proxies on it, and you can do all your work with the proxies.

With efficient use of proxies and playblasts you can create very complex scenes by simply working on a single element at a time, and then assigning a proxy to it.

Also, setting up your Subsurf modifiers properly helps a lot. You can link the subsurf level to the camera distance, making for much better and faster performance and rendering.
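To make the distance idea concrete, here is a minimal sketch of the kind of mapping such a setup might use. The function name, the `falloff` step and the default maximum level are all assumptions made up for illustration; inside Blender the result would typically be fed to the modifier through a driver or a small script, which is not shown here.

```python
def subsurf_level_for_distance(distance, max_level=3, falloff=10.0):
    """Return a subdivision level that drops by one every `falloff` units.

    Hypothetical helper: `max_level` and `falloff` are illustrative values,
    not Blender defaults.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    steps = int(distance // falloff)      # how many falloff bands away we are
    return max(0, max_level - steps)      # never go below level 0


# Objects close to the camera keep full detail, far ones drop to level 0.
for d in (0.0, 5.0, 12.0, 25.0, 100.0):
    print(d, subsurf_level_for_distance(d))
```

The step function is deliberately crude; a smoother falloff would work just as well, the point is only that far-away objects stop paying for subdivision.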

Having said all that, better viewport performance would be great to have. It’s not a deal breaker, though; I think it’s currently “good enough”.

McHammond, why do you use 2.4?

Just mentioning: if we had more than 20 (God damn twenty!!) layers, we wouldn’t really have many problems with large scenes. With heavy objects spread out over layers, and proxies spread out for easy management, then, yes, it’d be rather good.

Consider that these 20 layers are also shared with the render layers; in some scenes I’ve maxed them out just getting passes out, etcetera.

OpenGL instancing is old technology: it dates from 2008, in the OpenGL 2.1 days. But only nowadays do video cards have enough power to support it properly, and Radeon cards are pretty problematic with it (Radeon 4xxx supports only software instancing; Radeon 5xxx and up supports hardware instancing but is slow as heck: Nvidia cards can handle about 6 times the maximum number of instances per call possible on a Radeon).

Although Blender can benefit from it, the coders probably can’t do much about it: Blender is multiplatform, and although OpenGL is a standard, depending on the OS and hardware used it is treated pretty much as “results may vary” (take VBOs: a feature developed in the days of OpenGL 1.5 that was only recently added to Blender, and not without quite some problems). This is particularly true with Intel and AMD/ATI hardware, and even more so on Linux nowadays.

YES!!! My main gripe with the layer system: a mere 20 layers is just not enough at all. I am working on a complex ship model, and ran out of layers fast. There are other ways to organize objects, but layering in Blender is quite limited.

Basic scene management could be much improved, though. In 2.49 I was using the layer manager plugin, which made more complex scene management quite easy. Simple example: WHY ON EARTH can’t I multiple select objects, and change the display for all of them in one go? On the other hand, I can change the subdivision for multiple objects in one go. It is inconsistent.

Why is there no proper layer manager? Named layers? Drag and drop in the outliner is only now being addressed, and the basic usability could be so easily improved no end. Check out the 2.49 layer manager plugin which showed the way forward, and made the layer system much more flexible, efficient and convenient. I wonder what happened to the developer?

And of course, I do use proxies myself; I like scenes, working with linked objects, etcetera. Very flexible, but I feel that on a basic level Blender’s direct object management is in dire need of some improvements.

I have a lot of custom tools and techniques that are still 2.4x. Also, the newer version of Blender has only recently become stable, and it still changes too often for me to use. 2.49 worked well and still does.

You are now talking about more than viewport speeds and polygon counts :P

It sounds to me like you should be using the built-in asset management features of Blender. Blender can use any .blend as a database for storing all your project assets as a library. You can then link those assets into a single .blend for rendering. Grouping objects on multiple layers in one .blend will make them easy to handle on a single layer when you link them.

Alternatively you can use the scene feature to manage your project:
Create a new scene called “Ship” and use all 20 layers. Then group all objects on all layers into a new group called “Ship”.
Then create a new empty scene called “Ocean” and add the group “Ship”.
The ship will then only use one layer in your Ocean scene but can be edited on all 20 layers in the other scene.
You can use this technique to manage a project with as many layers as you need.

A lot of users don’t know how to use the library functions or the scene feature, so I would recommend you look into them if you are having problems managing your scenes.


Working with library items I already do. And the grouping of objects in a linked blend file, as well as the proxies are very nice. Adding 80 multi-million poly objects is no problem at all, and renders quickly. I am not complaining about general scene management.

My pet peeves (first the workflow related ones):

  • working with complex scenes and/or complex objects often requires me to switch groups of objects to bounds or wireframe display, because of viewport performance issues. Only one object at a time can be switched. Sure, you can use grouped objects and link those in your scene, but I have to re-use the same object tens, and sometimes hundreds of times (deadeyes in rigging being just one example). One single re-used library item this way becomes completely unworkable in the default Blender workflow. Scenes will not help me in these cases at all. I want all the blocks easily switchable. Just imagine working on a 74-gun ship, and I want to switch all cannons to wireframe or bounds display mode. 74 clicks for selection, 74 clicks for the display list, and 74 more clicks to set. Uurghh. Now imagine having to add hundreds of small identical library blocks. It drives a man mad - and that’s what I need to do. Linking does not help me: each cannon must be individually movable. That’s why I currently use a small script that sets the display for multiple selected objects. Still cumbersome, but much better than the maddening alternative.

  • as far as I know, I cannot apply (not talking about adding the modifier or changing subd level) or remove the sub-division modifier for multiple selected objects.

  • groups do not help me either: I would expect to be able to group objects, and change the object settings for the entire group. No go: just one in the group is changed. Stupid, stupid, stupid. Why have groups (yes, I do understand how they work in linked library files) if you cannot set such simple settings in one step? Beggars belief.

  • the outliner is pretty limited, especially in the usability area. For example, in Photoshop and After Effects I can drag over the small icons to turn those on or off. In Blender I am forced to single-click each separately. Not helpful at all.

That is why the layer manager add-on is sorely missed. I have been looking into re-developing one myself. Looks pretty easy - if only I can find the time. But if I want to continue my ship projects, I REQUIRE this (base) functionality. The small scripts I am using now are a stop-gap solution.
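For anyone wondering what such a stop-gap script might look like, here is a minimal sketch (not the actual script mentioned above). It assumes the object property is called `draw_type`, as in the 2.5x-2.7x Python API, and falls back to `display_type`, the name used from 2.8 onwards. The helper name `set_display_mode` is made up for illustration.

```python
def set_display_mode(objects, mode="BOUNDS"):
    """Set the viewport display mode on every object in `objects`.

    `mode` is one of 'BOUNDS', 'WIRE', 'SOLID' or 'TEXTURED'.
    Returns the number of objects that were changed.
    """
    changed = 0
    for obj in objects:
        # Blender 2.5x-2.7x calls the property `draw_type`;
        # Blender 2.8+ renamed it to `display_type`.
        attr = "display_type" if hasattr(obj, "display_type") else "draw_type"
        setattr(obj, attr, mode)
        changed += 1
    return changed


try:
    import bpy  # only importable inside Blender
except ImportError:
    bpy = None

if bpy is not None:
    # Switch everything that is currently selected (all 74 cannons, say).
    set_display_mode(bpy.context.selected_objects, "BOUNDS")
```

Run from Blender's text editor with the cannons selected, this replaces the 74 x 3 clicks with one; outside Blender the last block simply does nothing.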

And the (serious) OpenGL performance issues I encounter on a daily basis (back on topic, sort of ;-):

  • selecting objects in the viewport in any scene with more than 500,000 polys becomes a nightmare, due to a bug that makes selecting take a couple of seconds on my ATI 5870. On larger scenes this may lag for more than 10 seconds. One solution is to use the outliner, which is immediate. But still, I only experience this issue in Blender. I wonder what OpenGL method is used for raycasting to select objects in the viewport? And I still have not heard from other ATI users whether they experience this problem. I’ve switched OS (Vista to Win7), and no change. Even the modded V8800 driver did not help (though I have read somewhere that ATI fixed this in their newest FirePro drivers. I would be most grateful to anyone using a FirePro if they could tell me if this is indeed fixed. If yes, I will run out and get me a V7900)

  • adding a sub-division modifier causes the viewport to slow down a lot, at times orbiting grinds to a halt. A 1 million poly object with modifier: 1 or 2fps. Applying the modifier: smooth sailing! This I do not understand, and I have not seen similar performance problems in other apps.

  • when linking a sub-division modified object, I expect the same viewport performance as with a ‘merged down’ object. However, that is not the case. Performance is marginally improved, but still lagging to about 3fps. WHY???!!! I can deal with slow viewport performance when a modifier slows down the viewport when working with the actual object - just turn it off. But why would it still slow down the viewport that much in a linked scene? It’s a ‘static’ mesh at that point, and should not have ANY impact on the fps. Makes absolutely no sense to me.

  • viewport performance seems erratic at times: one scene with 80 million polys still orbits quite smoothly, another with just 5-10 million lags. No modifiers. Could anyone explain to me how to optimize a scene for optimum OpenGL performance?

Which OpenGL method for instancing do you mean here, stargazer? Can you share some resources please?

I guess many people have no idea that there are many GI methods (geometry instancing, not global illumination :smiley: )

My next guess is Stargeizer means simple instancing with CPU transformation, maybe with GPU transformation, which are indeed old, but slow and not real GI methods. :smiley: (see (1) and (2))

  1. Simple instancing, where the geometry is rendered from one source mesh and the transformation matrix is calculated for each instance on the CPU and OpenGL uses glDrawElements() for each instance. Most basic and most inefficient.

  2. Some instancing which is rather dodgy, where the geometry again is rendered from one source mesh, but this time the transformation matrix is calculated on the GPU and the data per instance is passed via a vec4. The rendering again happens with glDrawElements() per instance. Same bogus as above, but the transformation matrix is applied faster.

  3. The “nvidia way”. Again, one source geometry rendered for each instance. The transformation matrix is calculated on the GPU, but this time the data is passed per instance via persistent vertex attributes. (other vertex attributes are normals, color, UV etc.) There´s a whitepaper for it, and although it´s a “pseudo method” it´s very fast on nvidia hardware : >Whitepaper here<

  4. The “real instancing”, HW GI. One source geometry is rendered in batches of 64 instances per draw call. Nvidia could do 400 per call, AMD can do 64 due to some internal limitations, so OpenGL decided 64 to be good as it works on all cards. The transformation matrix is applied on the GPU and passed as uniform arrays of vec4: one for location, one for rotation. You do require glDrawElementsInstancedARB(), which is in an extension from the ARB, but you only need 1/64th of the draw calls for the same number of instances you’d need for the other methods.

  5. Lastly I know the “uniform buffer method” - it´s the overkill method, still full HW GI, but done in batches of 1000 instances per draw call. AFAIK it only works on GTX480/HD5870 and newer, else it obviously would run into the limitations of method (4). It´s the same old story, one source mesh, transformation on the GPU, but this time a new thing is used, the UBO, a Uniform Buffer Object holding the transformation data. Again it needs an ARB extension (GL_ARB_uniform_buffer_object). So you need 1/1000th of the draw calls for the same number of instances as with the other methods.

So it´s easy to see, instancing != instancing. There is the good, the bad and the ugly method :smiley:
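To put numbers on the draw-call savings quoted in the list above, here is a small back-of-the-envelope sketch. The batch sizes of 64 and 1000 are simply the figures from the list, and the crowd size is an arbitrary example; nothing here talks to OpenGL.

```python
def draw_calls(instances, batch_size=1):
    """Number of draw calls needed to render `instances` copies
    when each call can draw at most `batch_size` of them."""
    if instances < 0 or batch_size < 1:
        raise ValueError("instances must be >= 0 and batch_size >= 1")
    # Integer ceiling division: a partially filled last batch still costs a call.
    return (instances + batch_size - 1) // batch_size


crowd = 100000  # arbitrary example crowd size
for label, batch in (
    ("methods 1-3, one glDrawElements() per instance", 1),
    ("method 4, batches of 64", 64),
    ("method 5, batches of 1000", 1000),
):
    print(label, "->", draw_calls(crowd, batch), "draw calls")
```

Draw-call count is of course only one part of the cost, but it is the part that the batching methods attack.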

OpenGL-wise there are various methods as arexma said:

  1. Load instance matrices as vertex attributes, utilizing the state machine architecture of OpenGL
  2. Instancing with divisors/VBO, where attributes get loaded from a vertex buffer on a per-instance basis (this was a DirectX 9 functionality that got ported to OpenGL) see
  3. Instancing with glDrawElementsInstanced (which is OpenGL 3) which actually generates a per-instance index that can be used to sample a texture buffer for per-instance data. see

I didn’t know about the limitation though. Do you have any reference for that? Can’t seem to find any reference to 64 in the extension specs.

I do not remember where I read it; it was either in the OpenGL forum, or on I am not too sure, though, about what I am saying: I have been speaking OpenGL for a while now, but only recently started to look into GLSL, so don´t take my word for it :wink:
It may be obsolete in the new specs and for new cards anyways and more of a guideline than a specification or limit per se, you can always push the bounds.

However, from what I understand of method (4) as I listed it, you store the transformation in a GLSL uniform for glDrawElements(), and there is a limit on the number of active uniforms:

Each shaderunit should be able to have a set value of available uniforms, so it seems it´s very dependent on the amount of shaders.
If you now instance a crowd as an example, to take instancing a bit further than static meshes, you’d like to shade it, maybe animate it with a per-vertex-deformation and cast shadows, resulting in even less available uniforms per shaderunit, so I guess the 64 is just an arbitrary number, where it works nice without fearing that the shader rejects uniforms because there´s no space left :slight_smile:
And I think it´s ATI´s different approach to shader units that suggests this “limit”, and I have no idea if it is still valid for recent cards.
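The uniform-budget argument can be sketched as simple arithmetic. The component budgets below are assumed example values, not figures taken from any particular card (the real limit would be read with glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS)), and the helper name is made up for illustration.

```python
def instances_per_batch(max_uniform_components,
                        vec4_per_instance=2,
                        reserved_components=0):
    """How many instances fit in the uniform budget of one draw call.

    `vec4_per_instance` is 2 for the location + rotation layout of method (4).
    `reserved_components` stands for uniforms the shader needs for other
    things (lights, animation data, shadow matrices, ...).
    """
    components_per_instance = 4 * vec4_per_instance   # a vec4 is 4 floats
    available = max_uniform_components - reserved_components
    return max(0, available // components_per_instance)


# Assumed example budgets, purely for illustration:
print(instances_per_batch(512))                           # location + rotation only
print(instances_per_batch(512, reserved_components=128))  # shader needs uniforms too
print(instances_per_batch(1024))
```

With an assumed budget of 512 components the plain case happens to come out at 64, and it shrinks as soon as the shader reserves uniforms for shading or deformation, which matches the reasoning above. I am not claiming that this is where the 64 actually comes from.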

But correct me if I am wrong. It´s not like it´s ridiculously easy to understand OpenGL or find good tutoring, up until today I am always thankful if I find someone to talk about OpenGL with, as most “going-to-make-a-game-kids” prefer DX. :smiley:

I can’t really pass on anything done here at work, but I think that this…

…can be quite informative, and although source code is not provided, on page 3 the author explains the methods used.

It’s the same one as arexma quoted, thanks.

OK, now some explanation:

The 4) and 5) examples are IMHO done in a way that is not optimal. You can bypass the uniform buffer limitation of 64 uniform vec4 by using texture buffer objects and using the per-instance index to access these. (In the comments below someone proposes comparing the two.)
These are actually arrays but accessible through a texture unit in the vertex shader. From the spec:
“Buffer textures can be substantially larger than equivalent one-dimensional textures; the maximum texture size supported for buffer textures in the initial implementation of this extension is 2^27 texels, versus 2^13 (8192) texels for otherwise equivalent one-dimensional textures.”

At least that’s the theory…Now I will need to write that instancing demo someday to prove it :slight_smile:
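Until that demo exists, here is a CPU-side sketch of the addressing scheme only: per-instance 4x4 matrices packed into one flat float buffer, four RGBA texels per matrix, and fetched back by instance index. In a real vertex shader the lookup would be something along the lines of texelFetch() on a samplerBuffer indexed with gl_InstanceID; the Python helpers below are made-up names that just emulate that indexing, and no OpenGL is involved.

```python
from array import array

TEXELS_PER_MATRIX = 4   # one RGBA32F texel per matrix row
FLOATS_PER_TEXEL = 4


def pack_instance_matrices(matrices):
    """Flatten a list of 4x4 matrices into one float32 buffer, row by row."""
    buf = array("f")
    for m in matrices:
        if len(m) != 4 or any(len(row) != 4 for row in m):
            raise ValueError("expected 4x4 matrices")
        for row in m:
            buf.extend(float(v) for v in row)
    return buf


def fetch_matrix(buf, instance_id):
    """Emulate the shader side: read 4 consecutive texels for one instance."""
    stride = TEXELS_PER_MATRIX * FLOATS_PER_TEXEL
    base = instance_id * stride
    if instance_id < 0 or base + stride > len(buf):
        raise IndexError("instance_id out of range")
    return [list(buf[base + 4 * r: base + 4 * r + 4]) for r in range(4)]


def max_matrices(texel_limit):
    """How many full 4x4 matrices fit into `texel_limit` RGBA texels."""
    return texel_limit // TEXELS_PER_MATRIX


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


buf = pack_instance_matrices([translation(float(i), 0.0, 0.0) for i in range(3)])
print(fetch_matrix(buf, 2)[0])   # first row of the third instance's matrix
print(max_matrices(64))          # 64 vec4 uniforms
print(max_matrices(2 ** 27))     # buffer texture limit quoted from the spec
```

On paper that is 16 full matrices in 64 vec4 uniforms against tens of millions in a buffer texture, which is the whole argument; whether it is actually faster is exactly what the demo would have to show.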

I hate it when that happens :smiley:
Last time I “tried” something in OpenGL I had to force stop myself because I was on the best way to write a graphics engine lol. And maybe a class for the lights, and I need this and that and… omg!

But for GLSL/OpenGL prototyping I recommend the free version of Geexlab from it´s really sweet and the last version allows you to use the full shader pipeline as well:

That is if you don´t already know it/use it :wink:

Nice, I wasn’t aware of that. I will go for my own demos though, there’s no substitute for that as a learning experience.

I managed to get my first triangle rendering in OpenGL 4 this fall, and indeed the amount of setup required for OpenGL now is too much. But once you have the basic framework set up, it’s easy to add some minimal stuff like a glDrawElementsInstanced call etc. Then it’s a matter of writing the shaders, which is a whole other chapter. Really, this is for brave people :slight_smile:

My difficulties were more theoretical such as reading through the specs and trying to find out how feature X works. I dearly miss a good book on modern OpenGL like the “red book” once used to be.

You’re right, nothing beats the learning experience - it´s one of the best feelings when you come closer to your goal one codeline at a time.

I usually use a combination of OpenGL/GLUT (still the old 3.7)/SDL to quickly get an OpenGL surface going for whatever :D
Do you use GLUT or a free alternative or none at all and write everything yourself? :smiley:

Dunnow if you saw my post about shader prototyping, I can also recommend:
and even better, just released:

The latter doesn´t require compiling the shaders; it runs in realtime and in the browser with WebGL. It´s nice if you´ve got a cellphone with WebGL and feel the sudden urge to try some shader in a doctor´s waiting room :stuck_out_tongue:

Usually I prefer to go as low level as I can get, so for Windows I use the Win32 API and for Linux Xlib. I know it’s asking for trouble, but it actually works :). And so I am using this code as a base (I can share it if you like, though in terms of IO it is a bit lacking). Instancing can be a challenge because it also depends on the vertex data setup, so I expect that no ready-made example will be suitable for such test demos without writing the code that sets up the texture/vertex buffer object. The only library that I feel has become necessary is GLEW, though I hardly use it in my projects.

Back on topic. As far as blender instancing is concerned, I hope someone can optimize it if it’s not good enough. The thing is, any code is a bit…resistant to new features if it was not written with them in mind. So for instance incorporating a good instancing algorithm will most likely be a challenge. Not impossible though.

Thanks for the links, may play with them later :smiley:

If you really do want to support 3 billion objects in the viewport, I think we need to extend the name length of objects from 21 characters (this is what we had in 2.49) to something much longer, like 256 characters per name. I was so disappointed that this value was not increased when we went from 2.49 to 2.5. Much of my Python scripting involves managing object and datablock names. With a 21-character limit I quickly end up with objects named with a prefix, a short identifier and a postfix number. Not very descriptive.

I’m not sure if it’s what you’re talking about but on this page under Blender 2.62 targets at the top it has “Longer names for IDs, bones, vgroups” under UV tools and Game Engine Polish.