I kind of don’t see all the grandeur in those results, I’m sorry, I must be some kind of dumb-dumb. What I see is a lot of foot sliding and generally poorly resolved foot movement, especially in faster motions like running.
These kinds of movements (including the aforementioned problems) I can get for free from the CMU mocap library right now. What do I need machine learning for?
This is pretty basic research to find out whether it works at all.
The novelty of their work is using a diffusion model to show the flexibility of the approach. They show it is overall quite stable and allows unconditional generation as well as various conditional generation tasks, like text-to-motion and action-to-motion (they also discuss motion in-betweening).
There are other projects closer to production-ready that usually incorporate better, more physically plausible data; better, more consistent rigs; and special treatment of contacts (like ground contact) to prevent sliding.
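To make the "special treatment of contacts" concrete: a common cleanup pass detects frames where a foot is near the ground and barely moving, then pins its horizontal position until the foot lifts again. This is a minimal toy sketch of that idea (not the method of any particular project; the thresholds `GROUND_EPS` and `VEL_EPS` are assumed tuning parameters):

```python
GROUND_EPS = 0.02   # assumed: max height (m) to count as ground contact
VEL_EPS = 0.01      # assumed: max per-frame horizontal travel while planted

def pin_foot_contacts(traj):
    """traj: list of (x, y, z) foot positions per frame, y = height.
    Returns a new trajectory with horizontal sliding removed during contact."""
    out = []
    anchor = None   # (x, z) where the current contact started
    prev_in = None  # previous *input* frame, for velocity estimation
    for x, y, z in traj:
        moving = (prev_in is not None and
                  abs(x - prev_in[0]) + abs(z - prev_in[2]) > VEL_EPS)
        px, pz = x, z
        if y <= GROUND_EPS and not moving:
            if anchor is None:
                anchor = (x, z)   # touchdown: remember the spot
            px, pz = anchor       # pin horizontal position while planted
        else:
            anchor = None         # foot lifted (or fast): release the pin
        out.append((px, y, pz))
        prev_in = (x, y, z)
    return out
```

For example, a foot that drifts a few millimetres per frame while "planted" gets snapped back to its touchdown position, which is exactly the sliding artifact visible in the results above. Real pipelines do this per foot joint and blend the correction back through IK rather than hard-snapping.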
Naturally, this paper doesn’t produce the best-quality results, but that wasn’t its goal either.
So all of this is image-type AI stuff, it seems. Are there any good/free audio applications?
Basically, what I’d like is a text-to-speech system that creates a natural-sounding voice track, ideally with various accents and inflections, etc., so it sounds like a normal person talking/acting rather than a robot or Stephen Hawking.
I guess that shows some potential, but yeah, nothing really usable for the average person yet.
Putting aside the bad lip-sync, etc., the actual voice isn’t bad, assuming that was text-to-speech and not uploaded audio. The free trial works for some basic audio tests, but there’s no real control over the speech (what to emphasize, pause on, or stretch out), and it’s more geared towards video output. The pricing could also be an issue over time.
Sorry, but if you feed it some renders of basic geometry with enough perspective and visual cues, you can not only art-direct the AI, but also use the result together with the original scene as a modeling reference.
Unfortunately I don’t have that much time to bring a project from start to finish (maybe in the coming months),
but for me this is, imho, a biggie. I’m really surprised how well Stable Diffusion keeps to the perspective input.
Not as real-time 3D graphics, if that was the case. But because they used a motion-control camera and car platform, it would be possible to take the pre-planned camera and platform motions into Blender and render the background animation as a video for a virtual-studio LED screen. Anyway, in this particular case the action is planned and timed precisely beforehand, so it doesn’t require an Unreal Engine setup that tracks a manually operated camera in real time during the actual shoot. Unreal Engine or other software could then be used just for video playback.
So the same result is doable in Blender for the background video.
Edit: Based on the behind-the-scenes part of the video, they tracked and recorded handheld camera motion beforehand for the animatic. The camera motion for the actual shoot is most likely not exactly the same as the handheld version, but cleaned up and modified afterwards, because the motion-control camera needed to move very precisely into the car through the windows, and the windows weren’t there when the handheld camera work for the animatic was recorded.