[TEST] Bmesh performance tests

@mano-wii, these examples best be reported in our tracker.

This is just wrapping our C code so any errors here would be great to fix in the C code too.

@Bayesian, - it depends on many factors of course, I’d be curious (and surprised) if theres a Python BVH implementation that approaches the performance of a good C/C++ version.

The implementation of BVH in Python is not the main problem in terms of Performance. The bigger problem is how to get the mesh data into a form such that you can do efficient linear algebra with it. The current BVH implementation accesses the memory structures directly, without copying, whereas any Python implementation would have to make copies of every single vertex and edge using the API.

My point is not that Python is as fast as C, but rather that the C code of Blender is making it hard for Python to shine. I’m not a core developer and know too little C to really help out. I just believe that it would be better for Blender overall to implement the C structures around Numpy arrays, maybe even use Cython for API binding and avoid much of the copying and allocating. Again, just my two cents, I know it’s not my place to request this.

I reported the bug: https://developer.blender.org/T45706
I hope its solution reflects positively in the way Blender currently works with BVH.

I decided to do some more tests. I converted the “edges_BVH_overlap” to C++ using Cython.

def edges_BVH_overlap(edges1, edges2, const float epsilon = 0.0001):
    aco = ([v.co for v in e.verts] for e in edges1)
    bco = [[v.co for v in e.verts] for e in edges2]
    overlay = {}
    oget = overlay.get
    cdef float c1, c2, d1, d2
    for i1, (a1, a2) in enumerate(aco):
        for i2, (b1, b2) in enumerate(bco):
            c1 = a1.x
            c2 = a2.x
            d1 = b1.x
            d2 = b2.x
            c1, c2 = (c1 - epsilon, c2) if c1 <= c2 else (c2 - epsilon, c1)
            d1, d2 = (d1 - epsilon, d2) if d1 <= d2 else (d2 - epsilon, d1)
            if c1 <= d2 and c2 >= d1:
                c1 = a1.y
                c2 = a2.y
                d1 = b1.y
                d2 = b2.y
                c1, c2 = (c1 - epsilon, c2) if c1 <= c2 else (c2 - epsilon, c1)
                d1, d2 = (d1 - epsilon, d2) if d1 <= d2 else (d2 - epsilon, d1)
                if c1 <= d2 and c2 >= d1:
                    c1 = a1.z
                    c2 = a2.z
                    d1 = b1.z
                    d2 = b2.z
                    c1, c2 = (c1 - epsilon, c2) if c1 <= c2 else (c2 - epsilon, c1)
                    d1, d2 = (d1 - epsilon, d2) if d1 <= d2 else (d2 - epsilon, d1)
                    if c1 <= d2 and c2 >= d1:
                        e1 = edges1[i1]
                        e2 = edges2[i2]
                        if e1 != e2:
                            overlay[e1] = oget(e1, set()).union({e2})
    return overlay

And in a mesh with 200,000 edges, the results were:
Cython = 0.343764066696167 seconds.
Python = 1.6094510555267334 seconds.

These 1.61 seconds with Python, 0.34 seconds were spent just calling the BMVerts and BMEdges. So the performance difference is really exorbitant.

While we do not have a BVHTree function to edges in the Blender Python API, I think this is the best performance I can achieve.

I wonder if I used data.as_pointer() to get the memory address of some dll (or otherwise) in blend file would have an advantage.

And finally. Here is the result in Cython, referencing the PyObject BMesh as a C object.
And completely avoiding call PyObjects in large loops.


from libc.stdlib cimport malloc, free


cdef extern from "BLI_utildefines.h":
    pass


cdef extern from "DNA_customdata_types.h":
    pass


cdef extern from "DNA_listBase.h":
    pass


cdef extern from "bmesh_class.h":
    ctypedef struct BMHeader:
        int index


    ctypedef struct BMVert:
        float co[3]


    ctypedef struct BMEdge:
        BMHeader head
        BMVert *v1
        BMVert *v2


    ctypedef struct BMesh:
        int totedge
        BMEdge **etable
        void *py_handle


cdef extern from "bmesh_operator_api.h": #!!!!!!!
    pass


cdef extern from "bmesh_iterators.h":
    pass


cdef extern from "bmesh_py_types.h":
    ctypedef struct BPy_BMesh:
        BMesh *bm
        int flag


    ctypedef struct BPy_BMEdge:
        #BMesh *bm
        BMEdge *e
    
def edges_BVH_overlap(pybm, list edges, const float epsilon = 0.0001):
    cdef BPy_BMesh* cpybm = <BPy_BMesh*>pybm
    cdef BMesh *bm = cpybm.bm


    overlay = {}
    oget = overlay.get


    cdef BMEdge *ed1, *ed2
    cdef BMVert *a1, *a2
    cdef BMVert *b1, *b2


    cdef float c1x, c2x, c1y, c2y, c1z, c2z, d1, d2, tm
    cdef int i1 = 0, i2 = 0
    cdef bint by, bz
    cdef int size = bm.totedge
    cdef int size2 = len(edges)
    cdef BMEdge **eds2 = <BMEdge**>malloc(size2 * sizeof(BMEdge*))
    cdef BPy_BMEdge *cpyed
    for pyed in edges:
        cpyed = <BPy_BMEdge*>pyed
        eds2[i1] = cpyed.e
        i1+=1
    del edges


    for i1 in range(size):
        ed1 = bm.etable[i1]
        a1 = ed1.v1
        a2 = ed1.v2
        by = bz = True
        c1x = a1.co[0]
        c2x = a2.co[0]
        if c1x <= c2x:
            c1x -= epsilon
        else:
            tm = c1x
            c1x = c2x - epsilon
            c2x = tm
        for i2 in range(size2):
            ed2 = eds2[i2]
            b1 = ed2.v1
            b2 = ed2.v2
            d1 = b1.co[0]
            d2 = b2.co[0]
            if d1 <= d2:
                d1 -= epsilon
            else:
                tm = d1
                d1 = d2 - epsilon
                d2 = tm
            if c1x <= d2 and c2x >= d1:
                if by:
                    by = False
                    c1y = a1.co[1]
                    c2y = a2.co[1]
                    if c1y <= c2y:
                        c1y -= epsilon
                    else:
                        tm = c1y
                        c1y = c2y - epsilon
                        c2y = tm
                d1 = b1.co[1]
                d2 = b2.co[1]
                if d1 <= d2:
                    d1 -= epsilon
                else:
                    tm = d1
                    d1 = d2 - epsilon
                    d2 = tm
                if c1y <= d2 and c2y >= d1:
                    if bz:
                        bz = False
                        c1z = a1.co[2]
                        c2z = a2.co[2]
                        if c1z <= c2z:
                            c1z -= epsilon
                        else:
                            tm = c1z
                            c1z = c2z - epsilon
                            c2z = tm
                    d1 = b1.co[2]
                    d2 = b2.co[2]
                    if d1 <= d2:
                        d1 -= epsilon
                    else:
                        tm = d1
                        d1 = d2 - epsilon
                        d2 = tm
                    if c1z <= d2 and c2z >= d1:
                        if i1 != ed2.head.index:
                            e1 = pybm.edges[i1]
                            e2 = pybm.edges[ed2.head.index]
                            overlay[e1] = oget(e1, set()).union({e2})
    free(eds2)
    return overlay

The test result in a mesh with 200,000 edges was 0.011622138977050781 seconds!

Results:
Python = 1.609451055526733 seconds.
Cython with PyObjects = 0.343764066696167 seconds.
Cython calling C blender structs = 0.011622138977051 seconds.

But the disadvantage of having to compile this module for all the different platforms prevents me from using it in an addon.
Perhaps there is a pure python solution

There is no pure python solution for getting this performance.
If you do bind the interal C-API please use the RNA API as that is better suited to do these kinds of things.
In the future it could maybe be possible to make the rna generator also generate cython files to make this a little less unsafe

I do not have much experience with Cython, and I’m still trying to understand the Blender API.

I’m trying to do what you recommended, for so I am reading this article:
http://wiki.blender.org/index.php/Dev:2.5/Source/Architecture/RNA

But I’m having difficulty. It seems that doesn’t have an RNA type for Bmesh (I don’t know).

@Juicyfruit, could you, please, show an example of how to use the RNA API in Cython?