I really hope I can prove wrong the claim that Blender’s cloth simulation is inferior to MD’s with these:
Depending on how big the cloth is, with 1cm divisions, I get simulation speeds of between 0.2 and 8 fps. For me that’s more than fast enough to get satisfactory results. Even if it’s not, we can always bake the simulation so that it plays back in real time.
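To show what baking buys you, here’s a rough Python sketch. It is not Blender’s actual API: `simulate_frame` and `bake` are names I made up, and the "solver" is just a placeholder that nudges every point down a little. The idea is only that the slow part runs once, in order, and playback afterwards is a plain lookup.

```python
def simulate_frame(prev_positions):
    """Hypothetical stand-in for one (slow) solver step."""
    return [p - 0.01 for p in prev_positions]

def bake(start_positions, last_frame):
    """Run the simulation once, in frame order, and store every frame's result."""
    cache = {0: start_positions}
    for frame in range(1, last_frame + 1):
        cache[frame] = simulate_frame(cache[frame - 1])
    return cache

cache = bake([1.0, 1.0, 1.0], last_frame=50)

# After baking, frames can be read back in any order without re-simulating.
for frame in (44, 9, 1):
    positions = cache[frame]
```

Once the cache exists, jumping around the timeline is cheap, because nothing has to be recomputed.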
As for plans to make cloth simulation GPU-based, unfortunately, I don’t think there are any. As far as I know, the reason cloth simulation uses the CPU in the first place, and keeps doing so, is that the GPU is not a good fit for such a task.
The thing is, as with other physics-based simulations, cloth simulation is a continuous process from the start frame to the end frame. E.g. if you run a cloth simulation from frame 1 to 10, frame 1 depends on frame 0’s cloth state, frame 2 depends on frame 1’s cloth state, and so on. That’s why, if you’re running a cloth sim and then click on a random frame in the Timeline, the cloth sim will freak out.
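Here’s a minimal Python sketch of that frame-to-frame dependency. It is a toy, not Blender’s solver: a single falling point stands in for the cloth, and the names `State`, `step` and `simulate`, the 24 fps frame rate, and the simple Euler update are all my own assumptions for illustration.

```python
from dataclasses import dataclass

GRAVITY = -9.81   # m/s^2
DT = 1.0 / 24.0   # one frame at an assumed 24 fps

@dataclass(frozen=True)
class State:
    height: float    # metres
    velocity: float  # m/s

def step(prev: State) -> State:
    """Advance one frame; the result depends only on the previous frame's state."""
    velocity = prev.velocity + GRAVITY * DT
    height = prev.height + velocity * DT
    return State(height, velocity)

def simulate(start: State, last_frame: int) -> list[State]:
    """Return the states for frames 0..last_frame, computed strictly in order."""
    states = [start]
    for _ in range(last_frame):
        states.append(step(states[-1]))
    return states

states = simulate(State(height=2.0, velocity=0.0), last_frame=10)
```

Notice there is no way to ask `step` for frame 10 directly. You only get there by going through frames 1 to 9 first.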
The CPU is great at this particular task because it’s super quick and powerful, but it does one thing at a time. It does frame 1, finishes it, and only then moves on to frame 2, finishes that, and only then moves on to the next frame.
The GPU, on the other hand, is actually not that quick and powerful per core, but it has something like thousands of cores, and each core can work on its own piece in any order.
Rendering is an ideal task for the GPU, because each of those thousands of cores only has to do a very small and simple task: rendering one pixel at a time. And because the cores can work in any order, they can jump from one part of the rendered picture to another, wherever it’s needed most.
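A toy Python sketch of why the order doesn’t matter for rendering. This is nothing like what Cycles really does (real rendering traces many light samples per pixel); `shade` and `render` are made-up names, and the "shader" is just a gradient. The point is only that each pixel’s value depends on that pixel alone.

```python
import random

WIDTH, HEIGHT = 8, 4

def shade(x: int, y: int) -> float:
    """Toy 'pixel shader': the value depends only on the pixel's own coordinates."""
    return (x / (WIDTH - 1)) * (y / (HEIGHT - 1))

def render(pixel_order):
    """Fill in the image by visiting pixels in whatever order is given."""
    image = {}
    for x, y in pixel_order:
        image[(x, y)] = shade(x, y)
    return image

pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
shuffled = pixels[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the shuffle is repeatable

in_order = render(pixels)
out_of_order = render(shuffled)
```

Both `in_order` and `out_of_order` end up as the same image, which is why the work can be handed out to lots of cores at once.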
You’d notice this effect when you turn on viewport shading in Cycles: some spots render quicker than others. Usually the darkest and brightest spots take the longest to render.
So I guess, imagine running a cloth simulation on the GPU. One core would process frame 1, another would process frame 9, and another would process frame 44. By the end of the simulation, you wouldn’t get a nice continuous simulation, but a random mess.