How to detect duplicate objects in a scene imported from a STEP file?

Hello!

I am a student researching whether Python scripts can be used to automate the optimization of STEP files in Blender. Unfortunately, my knowledge of scripting and of Python is quite limited.

I am currently trying to write a script that detects duplicate objects in the scene. The scene contains a lot of objects (around 5000) and the objects’ names are very complex. This makes it difficult to compare their names, and in my experience so far I also have to be careful not to make the script too slow.

My idea is to compare the objects by their location, maybe via the bounding box, and then finally by the first few characters of their names. However, I am quite lost on how to approach this.

    bpy.ops.object.select_all(action='SELECT') 
    
    ob = bpy.context.active_object
    selected = bpy.context.selected_objects
    
   
      
    #First check to filter out the most basic duplicates
    for obj1 in selected:
        for obj2 in selected:
            if obj1.type == 'MESH' and obj2.type == 'MESH':
                if obj1 == obj2:
                    print (obj1.name, obj2.name)
                    continue
                
                 
                #Center of bounding box
                center1 = sum((Vector(b) for b in obj1.bound_box), Vector())
                center1 /= 8
                    
                center2 = sum((Vector(b) for b in obj2.bound_box), Vector())
                center2 /= 8 
                 
                
                #filter to compare the vertices/polycount                                            
                if len(obj1.data.vertices) == len(obj2.data.vertices):
                    print ("Same count: ", obj1.name, ", ", obj2.name)
                    continue
                
             
                
                if center1 == center2:
                    bb_1 = [bbox_co[:] for bbox_co in obj1.bound_box[:]]
                    bb_2 = [bbox_co[:] for bbox_co in obj2.bound_box[:]]
                    
                    print (bb_1, obj1.name, bb_2, obj2.name)
                    
                    if bb_1 == bb_2:
                        print ("TEST TEST TEST")
                        continue
                    
                    if obj1.name == obj2.name:
                        print ("Duplicate found!")
                        print (obj1.name, obj2.name)

The code above is a rough attempt and sadly it does not work. It also feels wrong to me, but I am unsure where to improve it and which steps are needed to detect duplicate objects accurately.

Do you have any tips or recommendations?
Thanks in advance!
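
Two things in the attempt above likely keep it from finding anything. First, `bound_box` holds the eight corners in the object's local space, so instanced parts placed at different positions report identical boxes. Second, the `continue` after the vertex-count match skips exactly the pairs that are worth inspecting further. Below is a minimal sketch of the location test alone, assuming the corners have already been converted to world space (in Blender, `obj.matrix_world @ Vector(corner)`). The names `bbox_center` and `same_center` and the tolerance value are illustrative assumptions, not Blender API.

```python
def bbox_center(corners):
    """Average of the bounding-box corner points (each an (x, y, z) tuple)."""
    n = len(corners)
    return tuple(sum(c[i] for c in corners) / n for i in range(3))


def same_center(corners_a, corners_b, tol=1e-5):
    """True if the two box centres differ by at most `tol` on every axis."""
    center_a = bbox_center(corners_a)
    center_b = bbox_center(corners_b)
    return all(abs(a - b) <= tol for a, b in zip(center_a, center_b))


# Hypothetical world-space corners: a unit cube, a barely shifted copy, a far copy
cube = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
shifted = [(x + 1e-7, y, z) for x, y, z in cube]
far = [(x + 2.0, y, z) for x, y, z in cube]

print(same_center(cube, shifted))
print(same_center(cube, far))
```

Comparing with a tolerance instead of `==` avoids missing matches because of tiny floating-point differences that CAD imports can introduce.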

Hey, could you please add a screenshot of what it looks like in your viewport and the outliner?

Hey, I’m sorry but due to NDAs I cannot share the model and its name. Sorry for that, I know it’s a bit unclear this way!

You could try to show it with simpler objects.

Okay, so I tried to replicate it as an example. Imagine the shapes are a complex machine consisting of around 5000 loose objects, like screws, plates, bolts, etc. It will not be visible, but some of these objects can be duplicates, which would need to be deleted for optimization reasons.
All objects are sorted into the Inside and Outside collections based on whether they are visible from the outside or not. Imagine the names in the outliner to be far more complex, since the STEP file is imported from CAD software.

Now I think that the problem might be that all objects like screws etc. are instanced/linked. This might make it difficult to detect real duplicate objects? Not sure.

Why not select by area or volume? If you have 100 screw objects with random names, but they’re the same size, they’ll have the same area, making them much simpler to select

How could this be done?
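
One way this could look, as a rough sketch rather than a tested add-on: sum the polygon areas of each mesh (`poly.area` on `obj.data.polygons`), round the result so tiny float differences do not split a group, and bucket the object names by that value. The helper names, the rounding precision, and the sample data below are assumptions. The area is measured in the mesh's local space, so scaled instances would need their scale factored in.

```python
from collections import defaultdict


def group_by_area(areas, digits=4):
    """Bucket names by rounded surface area; keep only buckets with 2+ members.

    `areas` maps object name -> total surface area.
    """
    buckets = defaultdict(list)
    for name, area in areas.items():
        buckets[round(area, digits)].append(name)
    return {area: names for area, names in buckets.items() if len(names) > 1}


def mesh_areas_from_scene():
    """Blender-only helper: total polygon area of every mesh object in the scene."""
    import bpy  # imported here so the grouping logic also runs outside Blender
    return {
        obj.name: sum(poly.area for poly in obj.data.polygons)
        for obj in bpy.context.scene.objects
        if obj.type == 'MESH'
    }


# Made-up names and areas; inside Blender, pass mesh_areas_from_scene() instead
sample = {"screw.001": 12.34567, "screw.002": 12.34568, "plate": 250.0}
groups = group_by_area(sample)
print(groups)
```

Objects in the same group are only candidates: two different parts can share an area, so a second check (vertex count, location) is still needed before deleting anything.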

Thanks! I will give this a try for sure. I could imagine it being an issue that there are a lot of different small objects, but I’ll see!

Any script that has to check the attributes of 5000 objects is going to lag, be it something as simple as the name or something more complicated like the area. Python is not a fast language. The way I see it, if you’re going to lag either way, you might as well make your life easier and use something like surface area instead of trying to figure out common naming patterns between random objects :)

Yes that’s true, thanks for the tip! I will try it out for sure.

Actually, I also just found an alternative solution. It seems that comparing the mesh data and the world transform of the objects does the job. In my scene, this worked perfectly:

    import bpy

    # Select everything so that every object in the scene gets checked
    bpy.ops.object.select_all(action='SELECT')

    # NEW METHOD - COMPARE THE MESHES
    selected = bpy.context.selected_objects

    # First check to filter out the most basic copies
    for obj1 in selected:
        for obj2 in selected:
            if obj1.type == 'MESH' and obj2.type == 'MESH':
                # An object is not a duplicate of itself: deselect it
                if obj1 == obj2:
                    obj1.select_set(False)
                    continue

                # Only objects sharing the same mesh datablock are candidates
                if obj1.data != obj2.data:
                    continue

                # Same mesh data and same world transform: treat as a duplicate
                if obj1.matrix_world == obj2.matrix_world:
                    print("Duplicate found! ", obj1.name, obj2.name)
                    obj2.select_set(True)
            else:
                # At least one of the two is not a mesh, so the pair is not compared
                print("No duplicates", obj1.name, obj2.name)
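
With around 5000 objects, the nested loop above makes roughly 25 million comparisons. A possible alternative, sketched here under assumptions, is to build one key per object from its mesh datablock name and its rounded world matrix, then collect the objects that share a key, which needs only a single pass. The function names, the `digits` precision, and the sample data are illustrative. It also assumes mesh names are unique in the file. Like the script above, it only catches objects that share the same mesh data.

```python
from collections import defaultdict


def transform_key(matrix, digits=5):
    """Flatten a 4x4 matrix (rows of floats) into a hashable, rounded tuple."""
    return tuple(round(value, digits) for row in matrix for value in row)


def find_duplicates(records, digits=5):
    """Return lists of names that share mesh data and (rounded) world matrix.

    `records` is an iterable of (name, mesh_name, matrix) tuples.
    """
    buckets = defaultdict(list)
    for name, mesh_name, matrix in records:
        buckets[(mesh_name, transform_key(matrix, digits))].append(name)
    return [names for names in buckets.values() if len(names) > 1]


def records_from_scene():
    """Blender-only helper: yield (name, mesh name, world matrix) per mesh object."""
    import bpy  # imported lazily so the functions above also run outside Blender
    for obj in bpy.context.scene.objects:
        if obj.type == 'MESH':
            yield obj.name, obj.data.name, [list(row) for row in obj.matrix_world]


# Made-up sample: two screws sharing mesh and transform, plus one moved copy
identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
moved = [[1, 0, 0, 2.5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
sample = [("screw.001", "screw", identity), ("screw.002", "screw", identity),
          ("screw.003", "screw", moved)]
print(find_duplicates(sample))
```

Inside Blender, `find_duplicates(records_from_scene())` would return the groups. Keeping the first name of each group and selecting the rest gives a result comparable to the selection-based script.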