I am looking for some good material for speeding up lipsync work in Blender. I am using a rig, not shape keys, so something that allows IPOs to be extracted and applied to the individual ‘bones’. Of course, most things I can improvise my way out of, but something to shorten my path from “audio file with no lip movement” to “talking heads” is greatly sought after. Any software automating parts of the process is ESPECIALLY welcome! At the moment I am testing Papagayo, but something more advanced / efficient / integrated would be nice.
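To make it concrete, this is the kind of curve reuse I mean - a rough Python sketch using the newer bpy animation-data API rather than the old IPO module (object names are just placeholders), linking one rig’s keyed mouth action onto a second rig with the same bone names:

```python
import bpy

# Rough sketch: reuse one rig's keyed lip-sync curves on another rig.
# "TalkingHead_A" / "TalkingHead_B" are placeholder object names.
src_rig = bpy.data.objects["TalkingHead_A"]
dst_rig = bpy.data.objects["TalkingHead_B"]

src_anim = src_rig.animation_data
if src_anim and src_anim.action:
    if dst_rig.animation_data is None:
        dst_rig.animation_data_create()
    # Pose-bone curves live on the armature object's action and are addressed
    # by bone name (e.g. 'pose.bones["Jaw"].location'), so any rig whose bones
    # share those names can simply link the same action.
    dst_rig.animation_data.action = src_anim.action
```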
As part of my quest for autorigging, I was thinking that you could define a face armature with a bone that moves for each phoneme. Once you have that, you can define words and whole speeches with Papagayo. Then all you have to do is take HeadMeshA, parent it to the armature, weight paint, and now you have a talking head. Take that same armature with all its actions to another head mesh, and bingo, it talks too! The only part I do not think is automatable is weight-painting the bones to the mesh (although Blender does try), because each head is going to be different. BUT we all have the same jaw and muscle sets, so why not define that as something reusable? Just a thought - see the sketch below for roughly what I mean.
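Something along these lines is what I have in mind - a rough, untested Python sketch (file path, rig name and axis are all placeholders) that reads Papagayo’s Moho-style export (one “frame phoneme” pair per line) and keys a pose bone of the same name on and off at each switch:

```python
import bpy

# Placeholder paths/names -- a sketch, not a finished tool.
DAT_PATH = "/path/to/dialogue.dat"  # Papagayo "Moho" export: one "frame phoneme" pair per line
RIG_NAME = "FaceRig"                # armature with a pose bone named after each phoneme (AI, O, E, rest, ...)
AXIS = 2                            # local axis the phoneme bones slide along (here Z location)

rig = bpy.data.objects[RIG_NAME]

def key_bone(name, frame, value):
    """Key how far a phoneme bone is pushed 'on' at a given frame."""
    bone = rig.pose.bones.get(name)
    if bone is None:
        return
    bone.location[AXIS] = value
    bone.keyframe_insert(data_path="location", index=AXIS, frame=frame)

prev = None
with open(DAT_PATH) as f:
    for line in f:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the "MohoSwitch1" header and blank lines
        frame, phoneme = int(parts[0]), parts[1]
        if phoneme == prev:
            continue
        if prev is not None:
            key_bone(prev, frame - 1, 1.0)  # hold the old shape...
            key_bone(prev, frame, 0.0)      # ...then drop it over one frame
        key_bone(phoneme, frame - 1, 0.0)   # ease the new shape in
        key_bone(phoneme, frame, 1.0)
        prev = phoneme
```

The weight painting would still be by hand, but as long as the phoneme bones are named consistently, the same script and the same keys should drop onto any head.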
Software automation sucks, no questions asked. It’s rigid, it’s stuttery, it’s emotionless, there’s no way of adding character, there’s no way to personalize it. If you want it to look good at all, do it yourself.
I’m with BlackBoe on this one. Not that I do any great amount of lip-sync, and no serious work at all, but unless you only want speed and not quality, I doubt automation will deliver a result that doesn’t require a lot of tweaking.
Think about it - imagine Julie Andrews, Clint Eastwood, Sylvester Stallone, Will Smith and Jay Leno all saying the exact same thing - would they all move their mouths the same? Of course not. And it’s not just different phoneme shapes, it’s the fact that they wouldn’t even make the same sounds for the same words.
Take, for example, Stallone’s famous “I didn’t do anything” from First Blood - what he actually says is something like “I d’n do - un’th’…n”. It’s like one long word with some subtle intonations between the things we recognise as “words”. Getting your auto-sync software to deliver a decent result from a piece of text that says “I didn’t do anything” would take some effort. I could lip-sync it with shapes in a minute or two (maybe less).
Add to this the fact that the shape for any given word can be different each time it’s spoken - for the same character - and manual syncing starts to look like a good choice… unless speed is more important than quality - or you’re animating simple robots.
I am not asking the software to interpret the sound (though I would love something that could do that NICELY… 3ds Max’s Voice-O-Matic is the best I’ve seen, and I agree that it is still very far from perfect). What RogerWickes mentions seems to be somewhere in my line of thought, but there are still too many cases of “click here, then there, then there” - routine operations that really need no human decisions but just take up time.
As for style of speech, it is already a part of the product planning. But it is important to get all the possible tools on the table before moving on to that sort of ‘advanced’ topic.