Alright, I’m back from holidays and have just had a chance to check this out and run some tests! This might end up being a long explanation (it was)!
TL;DR: The simulator chooses between two different ways to multithread the calculations for handling obstacle objects. For this scene, the simulator chose a less-than-ideal scheme.
The slowdown is caused by the large number of objects rather than by whether the objects have the Export Animated Mesh option enabled. The animated mesh option only affects the time it takes to export the mesh from Blender for use in the fluid engine. The simulator only sees a sequence of meshes and does not know whether each object was exported as a keyframed mesh or as an animated mesh.
How are animated/dynamic obstacles handled in the simulator?
For obstacles to interact with the fluid and to be used in the physics calculations, their geometry needs to be converted to another form. The objects are converted to volumetric data called a Signed Distance Field (SDF for short). An SDF is basically a grid of data that tells us whether a point is inside or outside of the object volume, and also stores the distance to the nearest point on the volume surface. You may have heard about SDFs as one of the data structures in OpenVDB.
Updating the SDF can be quite an expensive operation within the engine. Every time an object moves in the simulator, the SDF must be updated. Even during substeps, we need to update the SDF, because the meshes are interpolated between substeps within the frame. So if you have 38 substeps in a frame and there are moving objects, the SDF will be updated 38 times, and all that time really adds up.
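To make the idea concrete, here is a minimal Python sketch of an SDF sampled on a small grid. It is not the engine's code: the sphere shape, the grid layout, and the "negative inside, positive outside" sign convention are assumptions chosen for illustration, and `sphere_sdf` and `build_sdf_grid` are hypothetical names.

```python
import math

def sphere_sdf(x, y, z, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from a point to a sphere surface.
    Assumed convention: negative inside, positive outside."""
    dx, dy, dz = x - center[0], y - center[1], z - center[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz) - radius

def build_sdf_grid(n, cell_size, sdf):
    """Sample sdf at the cell centers of an n x n x n grid centered on the origin."""
    half = n * cell_size / 2.0
    grid = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = (i + 0.5) * cell_size - half
                y = (j + 0.5) * cell_size - half
                z = (k + 0.5) * cell_size - half
                grid[(i, j, k)] = sdf(x, y, z)
    return grid

grid = build_sdf_grid(8, 0.5, sphere_sdf)
cells_inside = sum(1 for d in grid.values() if d < 0.0)
print(f"{cells_inside} of {len(grid)} cells are inside the sphere")
```

The sign of each cell answers the inside/outside question, and the magnitude is the distance to the surface. A general mesh has no closed-form distance like the sphere does, which is why the real computation is so much more expensive.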
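As a rough sketch of what interpolating a mesh between substeps could look like, the snippet below blends vertex positions between two frames. Linear interpolation and equally sized substeps are my assumptions for the example (the engine may do this differently), and the function names are hypothetical.

```python
def interpolate_vertices(verts_start, verts_end, t):
    """Linearly blend two vertex lists; t=0 gives the start mesh, t=1 the end mesh."""
    return [
        tuple(a + (b - a) * t for a, b in zip(v0, v1))
        for v0, v1 in zip(verts_start, verts_end)
    ]

def substep_meshes(verts_start, verts_end, num_substeps):
    """Yield the vertex positions at the end of each (equally sized) substep."""
    for s in range(1, num_substeps + 1):
        yield interpolate_vertices(verts_start, verts_end, s / num_substeps)

frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
meshes = list(substep_meshes(frame_a, frame_b, 4))
```

Each of the 4 interpolated meshes is a different shape as far as the SDF is concerned, so each one triggers its own SDF update.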
Since the SDF update is so expensive, we have some optimizations to reduce the amount of data that needs to be updated. If objects are not moving, they only need to be computed once. The engine uses separate grids to store the static obstacles and dynamic obstacles, and also a master grid that merges and stores all of the SDF information. So if you have a few static obstacles and a few dynamic obstacles, the static and dynamic data will be stored on the separate static and dynamic grids. This avoids needing to re-compute the static data.
Animated obstacles can also start/stop their motion. Another optimization is that the engine automatically detects if a dynamic object is actually moving/changing during the frame. If a dynamic object is static during part of the animation, it will be moved and stored on the static grid until it starts moving and needs to be updated again. This is another way that we avoid re-computing SDF.
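Here is a simplified Python sketch of the static/dynamic split described above. It assumes that SDFs are merged by keeping the smaller distance per cell (the usual way to union two SDFs) and that paused objects are already baked into the static grid. All names are hypothetical, and the toy "meshes" are single points on a 1D grid.

```python
import math

def is_moving(verts_prev, verts_next, tolerance=1e-9):
    """True if any vertex coordinate changed by more than tolerance."""
    return any(
        abs(a - b) > tolerance
        for v0, v1 in zip(verts_prev, verts_next)
        for a, b in zip(v0, v1)
    )

def merge_sdf(grid_a, grid_b):
    """Union of two SDF grids on the same cells: keep the smaller distance."""
    return [min(a, b) for a, b in zip(grid_a, grid_b)]

def update_master_grid(static_grid, objects, compute_sdf, num_cells):
    """Recompute SDFs only for objects that moved, then merge with the cached
    static grid. objects is a list of (verts_prev, verts_next) pairs.
    Returns (master_grid, number_of_sdf_computations)."""
    dynamic_grid = [math.inf] * num_cells
    recomputed = 0
    for verts_prev, verts_next in objects:
        if is_moving(verts_prev, verts_next):
            dynamic_grid = merge_sdf(dynamic_grid, compute_sdf(verts_next))
            recomputed += 1
    return merge_sdf(static_grid, dynamic_grid), recomputed

# Toy 1D example: each "mesh" is a single point and cells sit at x = 0..4.
def point_sdf(verts):
    return [abs(cell - verts[0][0]) for cell in range(5)]

static_grid = point_sdf([(0.0,)])  # baked once, never recomputed
objects = [
    ([(4.0,)], [(3.0,)]),  # moved during this substep
    ([(0.0,)], [(0.0,)]),  # paused: already covered by static_grid
]
master, recomputed = update_master_grid(static_grid, objects, point_sdf, 5)
```

Only one of the two objects costs an SDF computation here; the paused one is served from the cached static grid.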
Additional optimizations for dynamic objects
And then there are more optimizations within the simulator specifically for handling dynamic objects. Due to the previous optimizations of separating static/dynamic objects, the majority of the time spent updating the SDF concerns only the dynamic objects in the simulation. There is some further optimization for how we multithread dynamic SDF calculations.
Inside the engine, the meshes may not be represented exactly as they are in the Blender scene. The engine splits the mesh geometry into separate pieces wherever it can (we'll call each of these pieces a mesh island) and treats each island as a separate object. For example, if you have a single Blender object that is made up of 60 cylinders, the engine will separate this object into 60 mesh islands that are each handled separately. The point of this optimization is to create smaller object volumes when possible, since several small volumes are quicker to compute than a single large volume containing many pieces.
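One common way to find mesh islands is to group faces that are connected through shared vertices. The sketch below does this with a union-find structure. It illustrates the general technique only; I am not claiming this is how the engine splits meshes, and `find_mesh_islands` is a hypothetical name.

```python
def find_mesh_islands(num_vertices, faces):
    """Group faces into islands. Faces that share a vertex, directly or
    through a chain of other faces, end up in the same island."""
    parent = list(range(num_vertices))

    def find(v):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Union all vertices of each face into one set.
    for face in faces:
        root = find(face[0])
        for v in face[1:]:
            parent[find(v)] = root

    # Collect faces by the root of their first vertex.
    islands = {}
    for face in faces:
        islands.setdefault(find(face[0]), []).append(face)
    return list(islands.values())

# Two triangles sharing an edge, plus one disconnected triangle.
faces = [(0, 1, 2), (1, 2, 3), (4, 5, 6)]
islands = find_mesh_islands(7, faces)
print(len(islands), "islands")
```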
How are the SDF calculations for dynamic mesh islands multithreaded?
There are two different multithreading schemes in the engine for converting the mesh islands into an SDF. For the rest of the explanation, I'll just assume there are 8 threads available.
Scheme 1: Calculate each mesh island one by one and assign 8 threads to work on each island
Here is some general pseudocode for how this works:
for each mesh island:
    1. Create a tight-fitting grid around the island.
    2. Divide the grid into subgrids (typically smaller chunks of 8 x 8 x 8 voxels).
    3. Put each subgrid into a queue for processing.
    4. Launch 8 threads. Each thread will:
        4.1. pull a subgrid off the queue
        4.2. calculate the SDF for this small region
        4.3. merge the subgrid SDF into the mesh island grid
        4.4. repeat until the processing queue is empty
    5. Finally, merge the mesh island grid with the dynamic SDF grid.
- Memory efficient. Memory only needs to be reserved to compute a single mesh island at a time.
- Fast for large obstacles that cover a lot of grid space
- Can be slow for small obstacles that do not cover much grid space. For example, for thin narrow cylinders, having many threads work on a single object can be overkill and also add a lot of overhead.
This multithreading scheme is what is being used in your scene and is the reason why the SDF update is running slowly.
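The structure of scheme 1 can be sketched in Python as follows. This only illustrates the control flow: Python threads will not give a real speedup for this kind of work, the merge is done on the calling thread rather than inside each worker to keep the sketch free of locks, and `toy_sdf` is a hypothetical stand-in for the real mesh-to-SDF computation.

```python
from concurrent.futures import ThreadPoolExecutor
import math

CHUNK = 8  # subgrid edge length in voxels

def split_into_subgrids(dims):
    """Return (start, end) index bounds of CHUNK^3 blocks covering the grid."""
    nx, ny, nz = dims
    return [
        ((i, j, k), (min(i + CHUNK, nx), min(j + CHUNK, ny), min(k + CHUNK, nz)))
        for i in range(0, nx, CHUNK)
        for j in range(0, ny, CHUNK)
        for k in range(0, nz, CHUNK)
    ]

def compute_subgrid(bounds, sdf):
    """Calculate the SDF for every voxel inside one subgrid."""
    (i0, j0, k0), (i1, j1, k1) = bounds
    return {
        (i, j, k): sdf(i, j, k)
        for i in range(i0, i1)
        for j in range(j0, j1)
        for k in range(k0, k1)
    }

def scheme1_island_sdf(dims, sdf, num_threads=8):
    """All threads cooperate on a single island, one subgrid at a time."""
    island_grid = {}
    subgrids = split_into_subgrids(dims)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map plays the role of the processing queue.
        for result in pool.map(lambda b: compute_subgrid(b, sdf), subgrids):
            island_grid.update(result)  # merge the subgrid into the island grid
    return island_grid

def toy_sdf(i, j, k):
    # Hypothetical stand-in: a sphere of radius 3 voxels centered at (10, 5, 4).
    return math.sqrt((i - 10) ** 2 + (j - 5) ** 2 + (k - 4) ** 2) - 3.0

island_grid = scheme1_island_sdf((20, 10, 9), toy_sdf)
```

For a thin cylinder, the tight-fitting grid only splits into a handful of subgrids, so most of the 8 threads have little to do while the setup cost is paid again for every island.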
Scheme 2: Calculate 8 mesh islands at a time and assign 1 thread to work on each island
Here is some general pseudocode for how this works:
1. Put each mesh island into a queue for processing
2. Launch 8 threads. Each thread will:
    2.1. pull a mesh island off the queue
    2.2. create a tight-fitting grid around the island
    2.3. calculate the SDF for the entire island grid in the single thread
    2.4. merge the mesh island grid with the dynamic SDF grid
    2.5. repeat until the processing queue is empty
- Very fast if the mesh islands are small and do not cover much grid space
- Uses more memory than scheme 1 since now 8 mesh island grids need to be reserved at once. If all mesh islands are large and cover a lot of grid space, this scheme has the potential to use a very large amount of memory. If all mesh islands are small however, the increase in memory can be negligible.
- If there is only a small number of objects and the objects vary in size, there may be wasted computing power. For example, if there are 7 small objects and 1 large object, 7 of the threads will finish quickly and the simulator will be left waiting on a single thread to complete the large object.
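The load-balancing trade-off of scheme 2 can be shown with a small scheduling simulation. This is a toy model, not engine code: island costs are made-up time units, and each island simply goes to whichever thread becomes free first.

```python
import heapq

def scheme2_makespan(island_costs, num_threads=8):
    """Simulate threads pulling islands off a queue in order.
    Returns (total wall time, busy time per thread)."""
    # Each heap entry is (time at which the thread becomes free, thread index).
    free_at = [(0.0, t) for t in range(num_threads)]
    heapq.heapify(free_at)
    busy = [0.0] * num_threads
    for cost in island_costs:
        start, t = heapq.heappop(free_at)
        busy[t] += cost
        heapq.heappush(free_at, (start + cost, t))
    makespan = max(time for time, _ in free_at)
    return makespan, busy

# 7 small islands and 1 large island on 8 threads.
makespan, busy = scheme2_makespan([1.0] * 7 + [20.0])
utilization = sum(busy) / (makespan * len(busy))
print(f"uneven sizes: wall time {makespan}, utilization {utilization:.0%}")

# 60 equally sized islands keep all threads busy almost the whole time.
even_makespan, even_busy = scheme2_makespan([1.0] * 60)
even_utilization = sum(even_busy) / (even_makespan * len(even_busy))
print(f"60 equal islands: wall time {even_makespan}, utilization {even_utilization:.0%}")
```

In the uneven case the wall time is set entirely by the one large island, while many similar-sized islands (like the cylinders in this scene) spread evenly across the threads.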
When is each multithreading scheme used in the simulator?
In general, scheme 1 is used most of the time due to the assumption that most use cases for the simulator would involve a small number of obstacles that are on the larger side.
Scheme 2 is mostly used for simulations that involve the Blender Fracture Modifier branch. Early on in development we decided that we wanted to support the Fracture Modifier within the FLIP Fluids simulator. Fracture modifier simulations could contain hundreds or thousands of small obstacle pieces, which would be too slow to process under scheme 1. Scheme 2 was designed around adding fracture modifier support.
There is one thing to know about the fracture modifier that determines how the scheme 2 optimization is triggered: Fracture modifier simulations are entirely contained in a single Blender object. Hundreds of small individually moving pieces are contained in the object. This fact is what determines whether to use Scheme 1 or 2.
Whether to use scheme 1 or 2 depends on the number of mesh islands contained in a single Blender object. The engine is not limited to using a single scheme in an SDF update and will choose which one to use on a per-Blender-object basis.
Scheme 1 - Will be used if the number of mesh islands in a single Blender object is less than or equal to 25.
Scheme 2 - Will be used if the number of mesh islands in a single Blender object is greater than 25.
Since each of the 60 cylinders is a separate Blender object in your scene, each object contains only a single mesh island, so scheme 1 will be chosen by the engine for every one of them. Scheme 2 would actually be optimal for this scene.
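Written as code, the selection rule is a simple threshold check per Blender object. The function and constant names below are hypothetical; only the threshold of 25 comes from the rule above.

```python
ISLAND_THRESHOLD = 25  # scheme 2 is used above this many islands per object

def choose_scheme(num_islands_in_object):
    """Return 1 or 2 depending on the island count of one Blender object."""
    return 1 if num_islands_in_object <= ISLAND_THRESHOLD else 2

# 60 separate cylinder objects, each containing a single mesh island.
separate_objects = [choose_scheme(1) for _ in range(60)]

# The same 60 cylinders merged into one Blender object.
merged_object = choose_scheme(60)

print(set(separate_objects), merged_object)
```

The geometry is identical in both cases; only the way it is grouped into Blender objects changes which scheme is picked.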
Testing and timing results
I have recreated a scene with a setup similar to the one you described. Since this optimization testing mainly concerns the SDF update, I removed the surface tension and viscosity features to leave out irrelevant computations. The time to update the SDF was measured for single substeps over a few frames on an Intel i7-7700 @ 3.60 GHz CPU.
Scheme 1: 48 seconds per substep
Scheme 2: 8 seconds per substep
This shows that scheme 2, at about 6 times faster per substep, would be a much better choice for multithreading this type of scene.
Forcing your scene to use scheme 2
A way to force the simulator to choose scheme 2 would be to merge all of the cylinder objects into a single Blender object. As far as I know, there is no simple way to do this in Blender when each mesh island needs to move independently of the others. Scripting is the only way I could think of doing this.
This example script will merge all objects whose name starts with Cylinder. The mesh will be merged into a single object named MeshCache, which will update its geometry every frame:
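import bpy

# This script will merge all objects that contain
# the prefix below into a single procedural object
OBJECT_PREFIX = "Cylinder"

# Objects will be merged and stored in an object
# named below. This object can be set as a FLIP Fluid
# object for use in the simulator.
MERGED_OBJECT_NAME = "MeshCache"


def update_mesh_cache(scene, *args):
    # Create the merged object the first time the script runs
    if bpy.data.objects.get(MERGED_OBJECT_NAME) is None:
        mesh_cache_data = bpy.data.meshes.new("mesh_cache_data")
        mesh_cache_data.from_pydata([], [], [])
        mesh_cache_object = bpy.data.objects.new(MERGED_OBJECT_NAME, mesh_cache_data)
        scene.collection.objects.link(mesh_cache_object)

    # Collect the source objects by name prefix
    object_list = []
    for obj in bpy.data.objects:
        if obj.name.startswith(OBJECT_PREFIX):
            object_list.append(obj)

    vertex_tuples = []
    triangle_tuples = []
    index_offset = 0
    depsgraph = bpy.context.evaluated_depsgraph_get()
    for obj in object_list:
        # Evaluate modifiers and animation for the current frame
        obj_eval = obj.evaluated_get(depsgraph)
        new_mesh = obj_eval.to_mesh()
        for mv in new_mesh.vertices:
            v = obj.matrix_world @ mv.co
            vertex_tuples.append((v.x, v.y, v.z))
        for t in new_mesh.polygons:
            triangle_batch = []
            for idx in t.vertices:
                triangle_batch.append(idx + index_offset)
            triangle_tuples.append(tuple(triangle_batch))
        index_offset += len(new_mesh.vertices)
        obj_eval.to_mesh_clear()

    # Replace the merged object's geometry with the current frame's mesh
    mesh_cache_object = bpy.data.objects.get(MERGED_OBJECT_NAME)
    mesh_cache_object.data.clear_geometry()
    mesh_cache_object.data.from_pydata(vertex_tuples, [], triangle_tuples)


# Rebuild the merged mesh on every frame change, and once right away
bpy.app.handlers.frame_change_post.append(update_mesh_cache)
update_mesh_cache(bpy.context.scene)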
The MeshCache object can then be set as a FLIP Fluid obstacle with the Export Animated Mesh option enabled. Since the simulator will detect this as an object with more than 25 mesh islands, scheme 2 will be used.
An example .blend file including the script in case you would like to take a look at the setup: many_objects_fracture_optimization.blend (2.2 MB)
You will need to press the Run Script button in the text editor window to activate the script. This script is only supported in Blender 2.81 or later.
How can we optimize the SDF update in the future to better handle this type of scene?
- Smarter triggers for choosing between scheme 1 and 2. Maybe base this choice on grid coverage of mesh islands?
- Maybe we could use a mixture of scheme 1 and 2. For example, compute 4 mesh islands at a time and assign 2 threads to each mesh island.
- Does the SDF really need to be updated on every substep? Maybe we can get away with updating the SDF less often when there are many substeps during a frame?
- Do we need to calculate the entire SDF grid? Maybe we could only compute parts of the SDF that are in close proximity of the fluid?
- Add options to select an object primitive rather than a general mesh. The SDF is expensive to compute because it handles general meshes of any shape. Primitives with known shapes (Cubes, Spheres, Cylinders, Cones, etc.) are very quick to compute.
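To show why primitives would be so cheap, here is the standard closed-form signed distance for a capped cylinder as a Python sketch. The function name, the z-aligned orientation, and the negative-inside sign convention are assumptions for the example; nothing like this exists in the simulator yet.

```python
import math

def cylinder_sdf(x, y, z, radius, half_height):
    """Signed distance to a z-aligned capped cylinder centered at the origin.
    Assumed convention: negative inside, positive outside."""
    d_radial = math.hypot(x, y) - radius  # distance past the curved side
    d_axial = abs(z) - half_height        # distance past the flat caps
    outside = math.hypot(max(d_radial, 0.0), max(d_axial, 0.0))
    inside = min(max(d_radial, d_axial), 0.0)
    return outside + inside

print(cylinder_sdf(0.0, 0.0, 0.0, 1.0, 2.0))  # center of the cylinder
print(cylinder_sdf(3.0, 0.0, 0.0, 1.0, 2.0))  # outside the curved side
```

Each grid cell costs only a few arithmetic operations, with no need to search the mesh for the nearest triangle.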
There are a lot of ways we can make the simulator faster. Optimization is one of my favourite parts of development. It just takes time to develop! Some optimizations can take a lot longer to develop than others.
Anyway, this explanation ended up being a lot longer than I had expected. I did not plan to go into this much detail about how the internals of the engine work!