GSoC 2014 Lip Sync

Hi guys,

I’m applying to GSoC 2014 with my lip sync project. You can find the proposal here:

I want a lot of artist feedback, so for the proposal I’d like to know from you how you think a lip sync workflow in Blender would work best!


For me it would just be amazing if Rigify could get a companion, Facify, so you could add a meta-rig for facial rigs.

Once that’s done, we’d have an armature for a standard face. Then I guess something like Face Robot becomes plausible:

a) enter the script in a text editor view: “Welcome to Blender Foundation”
b) the VSE displays the wave spectrum
c) the script file is parsed; individual words are shown as blocks or labels along the wave spectrum
d) this drives the face rig
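A minimal sketch of how steps a) to d) might hang together (all names here are hypothetical, not an existing Blender API): the transcript is split into word blocks and spread over the clip’s frame range, giving each word a strip that could be drawn in the VSE and later fine-tuned by the artist.

```python
# Hypothetical sketch: split a transcript into word blocks and assign
# each a frame range over the audio clip, as a naive first pass before
# an editor view lets the artist fine-tune the timing.

def word_blocks(transcript, clip_frames, fps=24):
    """Naively spread the words of a transcript over clip_frames frames.

    Returns a list of (word, start_frame, end_frame) tuples. A real tool
    would align words to the wave spectrum instead of spacing them evenly.
    """
    words = transcript.split()
    frames_per_word = clip_frames / len(words)
    blocks = []
    for i, word in enumerate(words):
        start = round(i * frames_per_word)
        end = round((i + 1) * frames_per_word) - 1
        blocks.append((word, start, end))
    return blocks

blocks = word_blocks("Welcome to Blender Foundation", clip_frames=48)
# Each block could then be shown as a labelled strip next to the wave
# spectrum and drive the face rig for its frame range.
```

Each word block would then be draggable against the waveform, with the face rig reading whichever block is active at the current frame.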

If we got something like Face Robot in Blender I would be over the moon. There simply isn’t another off-the-shelf solution like it out there.

Btw, there already is a Rigify face rig, facify :smiley:

NeXyon, this is really exciting. Thank you for working on a GSoC project for this.
I’ve been practicing with the older, “non-realistic” method of importing Papagayo DAT files into Blender, with shape keys on the face mesh for the standard phonemes. This works okay, but it’s by no means very convincing without putting a great deal of work into getting your phoneme shape key library just so. Also, as far as I can tell, Papagayo only allows for about 300 frames in a DAT per WAV file. That’s fine if you only have one or two lines of speech.
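For anyone curious about that workflow, here is a hedged sketch of reading a Papagayo-exported MOHO switch-data (.dat) file: a `MohoSwitch1` header followed by `frame phoneme` lines, each saying which phoneme becomes active at that frame. The parser is illustrative; an importer would then keyframe the matching shape keys.

```python
# Sketch of parsing Papagayo's MOHO switch-data export: a "MohoSwitch1"
# header line, then one "frame phoneme" pair per line.

def read_papagayo_dat(lines):
    """Parse MOHO switch data into a list of (frame, phoneme) pairs."""
    keys = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("MohoSwitch"):
            continue  # skip the header and blank lines
        frame_str, phoneme = line.split()
        keys.append((int(frame_str), phoneme))
    return keys

example = ["MohoSwitch1", "1 rest", "5 E", "9 etc", "14 rest"]
keys = read_papagayo_dat(example)
# In Blender, each pair would become shape-key keyframes: set the
# matching phoneme key to 1.0 (and the others to 0.0) at that frame.
```

The 300-frame limit mentioned above would show up here as simply running out of `frame phoneme` lines per WAV file.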
But the lip sync animation in the video Blurymind posted (thank you, Blurymind, that was really cool!) is a lot more convincing.

I’d love to see you make something really great and really easy to use.
Also, MakeHuman just came out with a new MakeHuman 1 upgrade. Do you think there could be ways of supporting that with what you’re doing?

At any rate, thank you for taking up this project. It’s very exciting, and great news for character animation in Blender. :slight_smile:
I hope you’ll keep us updated. Good luck.

Hi. I’ve got a linguistics degree and this project sounds great. :slight_smile:

I know the usual way riggers like to do things in lipsync revolves around visemes/phonemes but I reckon you can go one better: how about an engine which goes one layer deeper than phonemes (technically phones - phonemes are related collections of phones) and drives shapes/poses based on distinctive features instead?

So instead of saying “this is shape K/T/S/D”, we say “this is K, therefore the driver controlling how far the lips are drawn back is slightly active” - obviously you can still derive the distinctive feature information from phonemes, but instead of setting up visemes for each sound you give it drivers to control the amount of:

  • lip rounding/pucker
  • lip spreading (you activate rounding + spreading at once to visualise frontal U sounds like in French tu and Scottish accented oo – Brave’s lipsync really ticked me off…)
  • lip retraction (showing teeth)
  • bottom lip curl (for F + V)
  • jaw open/shut
  • tongue “poke” (how far forward the tip is)
  • tongue curl (how far bent back/forward the tip is)
  • tongue laterality
  • tongue arch

Driving those nine things will let you lipsync pretty much every single language on the planet, not just American English. (Yes, that includes click languages.) If you do it right you should be able to dial emotions over the top as well - smiles, snarls, frowns, pouts, etc. It could be a whole mouth performance solution.
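A rough sketch of that idea (all names and numbers here are made up for illustration): instead of one viseme shape per phoneme, each phone maps to a partial setting of the nine drivers, and features a phone doesn’t care about are left untouched, so an emotion layer can keep controlling them.

```python
# Illustrative sketch: map phones to partial settings of the nine
# distinctive-feature drivers. Values are invented for illustration;
# features a phone doesn't specify are simply left alone, so an emotion
# layer (smile, snarl, ...) can keep driving them on top.

FEATURES = ["rounding", "spreading", "retraction", "lip_curl",
            "jaw_open", "tongue_poke", "tongue_curl",
            "tongue_lateral", "tongue_arch"]

PHONE_FEATURES = {
    "k": {"jaw_open": 0.3, "retraction": 0.2, "tongue_arch": 1.0},
    "f": {"lip_curl": 1.0, "jaw_open": 0.1},   # bottom lip to teeth
    "u": {"rounding": 1.0, "jaw_open": 0.2},   # back rounded vowel
    "y": {"rounding": 1.0, "spreading": 1.0},  # fronted U, as in French "tu"
}

def apply_phone(pose, phone):
    """Return a new pose dict with the phone's features overridden."""
    new_pose = dict(pose)
    new_pose.update(PHONE_FEATURES.get(phone, {}))
    return new_pose

rest = {name: 0.0 for name in FEATURES}
pose = apply_phone(rest, "f")  # only lip_curl and jaw_open change
```

The point of the design is that `apply_phone` never resets unrelated drivers, which is what lets smiles and snarls coexist with speech.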

Face Bone Tool is a mess, and last time I looked it wasn’t even compatible with newer versions of Blender. Pointing someone who had used Face Robot at it and saying “we have something similar” would just get laughs.

I didn’t say we have something similar. I just said that a face rig for Rigify has been done before.

Since it is a mess, it is perhaps worth noting that part of the reason for that is that it is a third-party plugin and not something officially supported by the Blender Foundation.
The reason Face Robot is so strong is that it has been officially developed and is supported by the people who made the software.

Face Bone Tool is a mess,
Yes, it is a mess, and if you are not willing to work with this level of messiness, you won’t like Face Robot either.

Face Robot ‘looks good’ but is overrated; it needs too much work to get right. I’d rather lip sync manually.

Anyway, as for who would use it:
if you don’t have the time and budget -> shape keys are easy and fast with good results, e.g. Sintel.
if you have the time and budget -> use complex rigs for facials, like the big companies do, e.g. Pixar, DreamWorks, etc.

I’d rather use the approach of ‘Papagayo, Yolo, …’ to lip sync:
-it doesn’t need to analyze the phonemes you wrote, just word blocks.
-you tune them in a good editor; maybe a new mode in the VSE would already be fine.
-the code is already there, since they are open source, so it’s easy to implement.
-no new libs added to Blender, and no added processing time for analysis.

It’s quite different to see it demoed than to use it in real production work. Simple is good.

BTW, isn’t there already a good integration like that in Blender? HERE.

I’ve used Face Robot since its release and would gladly use it a million more times before trying to struggle with Face Bone Tool again. Not sure what you mean when talking about it being harder than it looks; rigging, skinning, and lip sync are incredibly easy with it.

I tend to take a hybrid approach. I use a relatively complex bone-based facial rig, but then I use a pose library to store phoneme shapes. This way, I can rough-in my lip sync as if I were using shape keys, but retain the ability to sweeten the animation with flexible face controls on subsequent passes.
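That rough-in pass could be sketched like this (names are hypothetical): a phoneme track, such as one imported from Papagayo, is turned into a schedule of pose-library poses. In Blender, applying each pose would go through the pose library (e.g. `bpy.ops.poselib.apply_pose`) followed by keyframing the bones; here we only compute the schedule.

```python
# Hypothetical sketch of the rough-in pass: turn a (frame, phoneme)
# track into pose-library applications. In Blender each entry would be
# applied from the pose library and keyframed; this just builds the plan.

PHONEME_TO_POSE = {"rest": "mouth_rest", "E": "mouth_E", "O": "mouth_O"}

def rough_in(track, default_pose="mouth_rest"):
    """Map (frame, phoneme) pairs to (frame, pose_name) pairs."""
    return [(frame, PHONEME_TO_POSE.get(phoneme, default_pose))
            for frame, phoneme in track]

schedule = rough_in([(1, "rest"), (5, "E"), (9, "O")])
# A later polish pass then animates the individual face controls on top,
# which is the advantage over baked shape keys.
```

Because the poses live on ordinary face-rig bones rather than shape keys, the sweetening pass can freely offset or layer on top of the roughed-in keys.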

I’ve used Face Robot since its release and would gladly use it a million more times before trying to struggle with Face Bone Tool
Good for you, and I do agree; that’s why I use shape keys, since we don’t have the resources. But major studios would go with 100-200 facial controls over many bones.

and lip sync are incredibly easy with it.
I meant the setup process, especially if you do a foreign language, which needs more preparation; but also the lip syncing itself, where you need to tweak a lot.

I use a relatively complex bone-based facial rig, but then I use a pose library to store phoneme
Harder than shape keys, but still easier than Face Robot.

The thing is, it depends on how much you’ll get paid. I heard in one of the podcasts that animators normally do 5 seconds in 8 months or so (I don’t remember exactly). I sometimes do 30 min/month of animation, with 5 min of lip sync in 2 h, and I believe most Blender users who get paid for working in Blender don’t have the time for complex rigs or time-consuming setups; simple is enough.