Fastest way to make a 3dim numpy array?

I’m generating a 3dim numpy array from a bmesh containing the vertex coords of each vertex’s link edges’ other verts.
I’m relatively new to numpy and I’m wondering if there is a much faster way than what I’m doing.
I need to at least fill the array with new coords every frame so it needs to be fast.
Are there any options using ‘foreach_get()’ other than generating a flat array? Perhaps there is a way to create a zero list with numpy.zeros then fill it with the vertex coords? I can generate a list of indices just once then fill it each frame if there is a fast way to do that.

Thanks for the help in advance.

Here’s what I have:

link_co = np.array([[[ve.co for ve in ed.verts if ve.index != v.index] for ed in v.link_edges] for v in b_mesh.verts])

I think this is plenty fast. Are you running into any problems?

Your code probably doesn’t do what you want it to do, because it creates a flat array of “Vector” objects… But from your post I can’t quite figure out what kind of shape you want.

You may think that allocating a numpy array each time is slow, but it is probably faster than trying to fill it in with loops. In Cython, maybe that would be faster, but pure python loops are usually slower.

I only tested your code on a plain Suzanne, so maybe you are running it on bigger meshes… depends on the use cases, I guess!

On second thought, your main problem is that not every vertex is connected to the same number of vertices. So you can’t create the 3D cube structure you probably want.

I can help you more if you tell me for what you need that data.

I’m making a custom cloth simulator because I need some functionality the current one doesn’t have. It needs to be python because we’re integrating a lot of python tools into the process
Doing this with python is especially difficult because of speed. I’m getting around this with anaconda/numbapro/autojit/accelerate and by doing as much of the math as I can in the form of: list_a * list_b - list_c with numpy arrays. I’m testing the speed of almost every line as I go.

The biggest bottleneck I’m running into is in generating the arrays. The line I posted above is for getting the spring lengths. It needs to contain the distance to each connected vertex for each vertex. because some verts have more connected verts than others, I’m keeping track of the connected verts inside a multidimensional list. I was able to get a big speedup by generating a list of indices inside a multidimensional list. Each element in the list contains the indices of the connected verts for one vert and there is an element for every vert. I only have to generate that list once then I can update it with a list comp on each frame.


ob_verts = bpy.context.object.data.vertices

link_idx = np.array([[[ve.index for ve in ed.verts if ve.index != v.index] for ed in v.link_edges] for v in obm.verts])
    
stretch = [np.linalg.norm(np.array(ob_verts[i].co) - np.array(ob_verts[h].co)) for i, j in
                enumerate(link_idx) for h in j]

When I run the code with 200k triangles it takes around .2 seconds after the first frame. There are ways to generate arrays that are much faster but I’m not sure how to get the dimensions without losing speed.
Any comments are appreciated. If nothing else it’s nice to be able to discuss it with someone.
Thanks!

What I think you don’t get yet is that every element of “link_co” is a “Vector” instance, which is a mathutils type, not a numpy one.

My suggestions:

  • Copy the vertices and all the other geometry information into numpy arrays of dtype float (and the exact type does matter for performance!), and keep them there…
  • At the s
    tart of a simulation, make a list of springs in terms of vertex indices, so for every vertex a row of vertice indices to subtract from it. Not all vertices have the same number of edges, so you would need to fill the rest of the row with the current vertex index (so the difference is 0 later).
  • to create the deltas, index with the indices like verts_array[springs[0,:]] - verts_array[springs[1,:]]

That kind of indexing and multiplication is what Numba is best at. But: Seriously consider Cython if you just can’t stop thinking about loops. Especially if you think your vectorized incantations allocate tons of big intermediate arrays. It may be better to have O(n²) Cython loops than juggling around O(n²) sized memory allocations.

If you are doing these edge lookups using blender’s mesh apis, you are essentially dissecting “linked list/tree” like structures in the worst way possible. Accessing property lists, arrays, and whatever almost always involves copying memory or additional loops and using the CPython API itself.

The code examples are just for illustration… I don’t currently feel like trying it out :wink:

And I forgot to add: Edge indices should be kept in arrays of dtype int … probably int32 or int64.

Thanks for the info Bayesian,

I’ve been able to pull off some pretty complex code but I’m still new at it. I will take your advice to heart.
I do understand mathutils types versus numpy arrays but I’m sure there are several ways I could do the math and it might be faster in some cases to avoid converting to numpy arrays since I already have mathutils types.

I suspected but I wasn’t aware that specifying the dtype could affect performance. Thanks for that tip.

I like your suggestion to add elements to the array that have zero effect for the verts with fewer edges. This will simplify the code significantly and might actually be necessary for sending the functions to the cuda grid anyway.

It’s seems obvious to keep everything in numpy arrays and avoid referencing geometry data every frame but I’m glad you mention it because I had part of my code doing that.

Much appreciated!

The essential part about the Vector class is that numpy/numba need to access the overridden operators of that class, which involves all the overhead of the CPython API.

So if you have two arrays of dtype object, where the objects are Vector instances, you can still do like “a - b”, but numpy then has to call something like a.sub(b) for every element. That is one reason why the dtype matters for performance.

Another reason why it will matter later is that cpus and gpus work faster on certain “sizes” of int or float, and of course, the size of the arrays is also affected.

First I have to say thanks again to Bayesian. I’ve got all my spring calculations running every frame and I’m still at 30 fps even without cuda.

I’m struggling with numpy.linalg.norm on a 3dim array.
my array looks like this:

array([[[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]],


       [[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]]])

It contains four spring locations for each vertex
I want to be able to return the linalg.norm for each element in an array with the same shape.

array([[[3.7416573867739413],
        [3.7416573867739413],
        [3.7416573867739413],
        [3.7416573867739413]],


       [[3.7416573867739413],
        [3.7416573867739413],
        [3.7416573867739413],
        [3.7416573867739413]]])

I haven’t been able to get the right calculation using linal.norm(my_array, axis=number). It works fine on a 2dim array.
For now I was planning on reshaping for the calculation then expanding dimensions again.
As always, thanks in advance for any help.

The solution is to use the axis parameter:

numpy.linalg.norm(my_array, axis=1)

This computes the norm along an axis. This parameter is available in a lot of numpy functions, like sum and mean, too.

Additional side note: I think numpy is underutilized in Blender. Please understand this in the most humble way possible, I am not good enough in C to contribute to Blenders source code and I have the utmost respect for the great work which went into this awesome software.

Now let me rant a bit: Some APIs and datastructures in Blender should just use numpy arrays. They can be used from C almost without any penalty, because if you do it right, their data is just a linear array of floats for example. But they make modifying them in Python so much easier and faster! A case in point is the new BMesh API, which was supposed to make mesh data easier to handle, but it’s still easier, shorter and faster for me to convert the “normal” mesh data to numpy arrays and go from there.

I agree with you about using the regular mesh data converted to numpy arrays. I spent a couple weeks on a script using bmesh data with list comprehension. I’m realizing that keeping everything in numpy arrays is hundreds of times faster in many cases. It’s taking some getting used to though, all this reshaping and this array divided by this other array along whatever axis.

I have a more difficult question.

I’m doing np.mean() on a 3dim array. I want to eliminate the elements in the array that are zero so that the mean will return a larger value. for example: np.mean([0,5,3,7]) is not the same as np.mean([5,3,7])

I want to remove that elements containing only zeros in the code below so they won’t be a factor in the np.mean() I’ve thought of writing a vectorize function or flattening then using np.nonzero() to get an index array but I’m not sure what the best approach is.

array([[[0, 0, 0],
        [0, 0, 0],
        [8, 0, 2],
        [1, 2, 1]],


       [[0, 0, 0],
        [0, 0, 0],
        [8, 0, 2],
        [1, 2, 1]]])

I don’t know exactly what you want, but maybe this does it:

a[a!=0].mean()

Or do you need to consider the structure of the array in any way?

Thanks for the speed checks Kaufenpreis.
Happen to know if there’s any advantage in speed using uint32 over int32 when I don’t need negatives?

No, there shouldn’t be any difference in speed. But there is the difference that uint32 can take “higher” values than int32, which is completely irrelevant for vertex indexes, I hope. So it doesn’t really matter.

So, with the help of all you wonderful people, I’ve got a mesh with 30k triangles running with my simple cloth solver at 60+fps even without multithreading. I added a prototype collision detection just using obj.ray_cast() and it’s still running at 20 fps while giving me reliable collision and cloth behavior. I’ve managed to do everything the numpy way so far except for the raycasting. I will eventually experiment with writing a custom raycast in numpy that can run in parallel but for now the big slowdown is in a single line. It’s the only place in all of my code that isn’t doing things the numpy way and it’s the only place where I’m using an existing function:

hit = np.array([raycast(i[0],i[1])[0] + raycast(i[0],i[1])[1]*.01 for i in start_end])

start_end is currently an ndarray with pairs of vectors. Each pair contains the start and end location of each vertex at the beginning and end of the cloth solver. I’m using the list comp to return a new array with either [0,0,0] or a new vector that is the hit location plus a little bit of normal direction. Next I take the list of vertex coords and replace the hits that are not [0,0,0] with the hit array values using:

np.copyto(dynamic_co_pre, hit, where=np.expand_dims(np.any(hit, axis=1), axis=1))

All that said, I’m basically just looking for a way to run obj.raycast(v.co_start, v.co_end) on each pair of elements in an array containing the start and end location for each point, or two separate arrays or whatever works. I’m sure the is a numpy way to do this without using the list comprehension.

Let me know if the question isn’t clear.
Thanks in advance.

While that would be awesome, you may get a speed bump in the mean time from this.

https://developer.blender.org/D966

Thanks for the link Patmo.
Any chance there is a build floating around somewhere that has mathutils.bvhtree?

Jonathan Williamson tried to build it from source from Lukas’s patch. Drop a comment on that development page, maybe Lukas has a build to try. Apologies for the thread hijack!

-Patrick