Why not code a Machine Learning Remesher?

DeepBlender · July 2, 2019, 10:36am

TLDR: Machine learning on topologies like meshes is not yet researched well enough that we could take solutions from comparable problems and expect meaningful results. With the growing amount of research on similar problems (graph neural/convolutional networks), it may become viable within the next few years.

There are two main difficulties from my point of view:

For every machine learning problem, we have to find a data representation that a machine learning model can use. Images can usually be used quite directly. For a mesh, we first need to investigate how the information is best represented, so that we can deal with an arbitrary number of vertices, edges and faces. An unfortunate consequence is that a “simple” neural network would most likely no longer be sufficient; a recurrent neural network would probably be required, and those are a lot more complicated to deal with.
The second difficulty is the problem which @m9105826 already described. There is no absolute best answer and we don’t know how to reliably detect reasonable topologies from an artistic point of view. Nevertheless, we can compute how closely two meshes match each other. In fact, there are plenty of ways to do that and we would need to figure out which ones are (most) suitable. Certainly, none of them is simple. Such a comparison would clearly be a starting point.
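To give an idea of what "how closely two meshes match" can look like, here is a minimal sketch of one common candidate, the symmetric Chamfer distance. It is only an illustration under assumptions: it compares vertex positions rather than the actual surfaces, it uses brute-force nearest-neighbour search, and the function names are made up for this example. A serious attempt would probably sample points on the faces and use a vectorized library.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_sq_dist(p: Point, points: Sequence[Point]) -> float:
    """Squared distance from p to its closest point in `points` (brute force)."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in points)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between two non-empty point sets.

    For every point in `a`, take the squared distance to the nearest point
    in `b`, average those, then do the same from `b` to `a` and add both.
    The two sets may contain different numbers of points.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    a_to_b = sum(nearest_sq_dist(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_sq_dist(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


# Hypothetical example: a unit quad and the same quad reduced to a triangle.
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
triangle = quad[:3]
print(chamfer_distance(quad, quad))      # identical sets give 0.0
print(chamfer_distance(quad, triangle))  # the dropped corner adds an error
```

Note that this says nothing about whether the topology is artistically reasonable. It only measures how far the shapes are apart, which is exactly the limitation described above.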

This topic is quite far away from being solved and practically usable (with machine learning), even for the simplest case. We don’t even know whether we could come up with a machine learning approach which is better than conventional solutions.

How I would try to tackle this problem
Nevertheless, it is very interesting and I believe it is becoming viable to have a look at the static mesh case. As a starting point, we would need something that produces somewhat reasonable results. Once that is achieved, it could be iteratively improved and it could be made more general.
As training data, I would likely take a few hundred objects, subdivide them, add a little noise and apply a decimate modifier. That’s just to ensure we are not only dealing with perfectly clean mesh data.
As an initial task, I would train it to predict a topology with just half the number of vertices. That’s just because we need a starting point. It would be naive to start with more difficult tasks, because first, we need something that somewhat works. We need some sort of prototype or proof of concept, just to know whether the problem is solvable at all.
In the past one or two years, there has been more research around graph neural networks, which are likely the best way to represent meshes in neural networks. I am confident that there are a few variants which might fit this problem.
Next, we would need to figure out what the output of the neural network should be. Does it need to directly predict vertex coordinates, or does it learn better when it can predict locations on triangles of the input mesh? Edges and faces are likely not that complicated (but I may well be underestimating it!).
Now we need a way to calculate the difference between the input mesh and the predicted mesh, such that the neural network has something to learn.
With all that in place, we could finally start to train the neural network. It would likely take a few weeks or months of playing around with ideas before anything reasonable is produced (with toy examples!). That would then be the starting point to improve every single aspect, such that we could slowly approach a solution which might be practically viable.
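To make the graph idea from the steps above a bit more concrete, here is a rough sketch of how a mesh can be turned into a graph and how a single aggregation step over it works for any number of vertices. This is not a trained network and not a specific published architecture. The function names are hypothetical, the vertex features are simply the coordinates, and a real graph neural network layer would additionally apply learned weights and a non-linearity after the averaging.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Feature = Tuple[float, ...]


def build_adjacency(faces: Sequence[Sequence[int]]) -> Dict[int, Set[int]]:
    """Map each vertex index to the vertices it shares an edge with.

    Faces are index tuples of any size (triangles, quads, n-gons); the
    edges are the consecutive index pairs around each face.
    """
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for face in faces:
        n = len(face)
        for i in range(n):
            a, b = face[i], face[(i + 1) % n]
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def mean_aggregate(
    features: Sequence[Feature], neighbors: Dict[int, Set[int]]
) -> List[Feature]:
    """One untrained message-passing step.

    Each vertex gets the average of its own feature and the features of
    its neighbours. The same code runs for 4 or 4 million vertices, which
    is why this kind of layer can handle meshes of arbitrary size.
    """
    result = []
    for v, own in enumerate(features):
        group = [own] + [features[u] for u in sorted(neighbors.get(v, ()))]
        result.append(tuple(sum(c) / len(group) for c in zip(*group)))
    return result


# Hypothetical toy mesh: a unit quad split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
adjacency = build_adjacency(faces)
print(mean_aggregate(vertices, adjacency))
```

Stacking several such steps lets information travel further across the mesh. Deciding what the final layer should output is the open question from the step about the network output.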

This process may sound incredibly frustrating and demotivating. In this case, it is mostly because a solution first needs to be found which can be used as a starting point. But even if someone figured that out, getting those kinds of neural networks to work is quite a challenge. Usually, you hear about the deep learning/machine learning hype and how computers can do almost everything. What you basically never hear is the amount of frustration and hard work related to it.