Automatic lip synch.

September 28, 2008 at 5:33 pm 1 comment

I wasn’t originally going to talk about this until I had something to show, but my next AIR project was going to be something like my old e2animate application that I talked about in my previous post.

I was going to aim it at non-technical users.  Make it fun.  Easy to use.  Allow them to access an online library of clip-art.  Plus I was going to incorporate a very powerful and special feature.  Automatic lip synchronisation.  Just drop a mouth shape into an animation, and it would automatically synchronise to a sound track.

However, someone has beaten me to it with an AS3 lip-synch implementation.  Samir has created a Flash component that he’s talking about licensing for free.

I am pleased that someone has possibly saved me all the hard work.  But I was quite looking forward to the technical challenge of doing this myself.

I don’t know the details of Samir’s algorithm, but this is how I was going to do it…

The first thing is to determine the shape of the mouth from each segment of speech.  I was going to look at two possible ways of doing this: prediction gain, and frequency bin matching.

Prediction gain is the power of the signal coming out of a filter, divided by the power of the signal going in.  If the filter matches the characteristics of the signal, then we get a high value (because we don’t lose much power), but if it doesn’t match, we get a low value, because the filter blocks out more of the signal.
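Here’s a minimal sketch of that power-ratio idea in Python (illustrative only – the one-pole filter, the coefficient, and the test tones are my own stand-ins, not part of the original design):

```python
import math

def power(signal):
    """Mean power of a signal."""
    return sum(x * x for x in signal) / len(signal)

def one_pole_lowpass(signal, a=0.9):
    """Simple one-pole low-pass filter: y[n] = (1-a)*x[n] + a*y[n-1]."""
    y, prev = [], 0.0
    for x in signal:
        prev = (1 - a) * x + a * prev
        y.append(prev)
    return y

def prediction_gain(signal, filt):
    """Power of the filter output divided by power of the input."""
    return power(filt(signal)) / power(signal)

# A low-frequency tone passes the low-pass filter almost unchanged
# (gain near 1); a high-frequency tone is attenuated (gain near 0).
n = 4096
low  = [math.sin(2 * math.pi * 0.005 * i) for i in range(n)]
high = [math.sin(2 * math.pi * 0.45  * i) for i in range(n)]
print(prediction_gain(low,  one_pole_lowpass))   # close to 1
print(prediction_gain(high, one_pole_lowpass))   # close to 0
```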

So, imagine a bank of fixed filters, each of which detects a particular sound.  (A,I/E/U/L/W,Q/M, fricatives, etc. – possibly also taking different kinds of voices into account).  Depending on which filter gives us the best match, we select a particular mouth shape.
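A toy version of that filter bank, again as an illustrative Python sketch (a real bank would need properly tuned band-pass filters per phoneme group – the two filters and the shape names here are hypothetical):

```python
import math

def power(sig):
    """Mean power of a signal."""
    return sum(x * x for x in sig) / len(sig)

def lowpass(sig, a=0.9):
    """One-pole low-pass: passes vowel-like low-frequency energy."""
    y, prev = [], 0.0
    for x in sig:
        prev = (1 - a) * x + a * prev
        y.append(prev)
    return y

def highpass(sig):
    """First difference: passes fricative-like high-frequency energy."""
    return [sig[i] - sig[i - 1] for i in range(1, len(sig))]

# Hypothetical mapping from filter to mouth shape.
FILTER_BANK = {"open (A)": lowpass, "fricative (S/F)": highpass}

def classify_frame(frame):
    """Pick the mouth shape whose filter gives the best prediction gain."""
    gains = {shape: power(f(frame)) / power(frame)
             for shape, f in FILTER_BANK.items()}
    return max(gains, key=gains.get)

n = 2048
vowel_like     = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
fricative_like = [math.sin(2 * math.pi * 0.45 * i) for i in range(n)]
print(classify_frame(vowel_like))      # open (A)
print(classify_frame(fricative_like))  # fricative (S/F)
```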

Frequency bin matching is like matching the power in parts of the frequency spectrum to known sets of values, again corresponding to different kinds of sound.

MP3 is a sub-band coder, so an MP3 file actually contains these frequency bin values.  Alternatively, they can be derived from an FFT (SoundMixer.computeSpectrum()).
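To make the bin-matching idea concrete, here’s a small Python sketch – the DFT bins stand in for MP3 sub-band values or computeSpectrum() output, and the two templates and bin choices are entirely made up for illustration:

```python
import cmath, math

def bin_powers(frame, bins):
    """Power at selected DFT bins (a stand-in for sub-band values)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in bins]

def normalise(p):
    """Normalise powers so templates are level-independent."""
    total = sum(p) or 1.0
    return [v / total for v in p]

def match(frame, templates, bins):
    """Nearest template by Euclidean distance on normalised bin powers."""
    spectrum = normalise(bin_powers(frame, bins))
    def dist(name):
        return sum((a - b) ** 2 for a, b in zip(spectrum, templates[name]))
    return min(templates, key=dist)

# Hypothetical templates: energy mostly in the low bin vs the high bin.
BINS = [2, 30]
TEMPLATES = {"vowel": [0.95, 0.05], "fricative": [0.05, 0.95]}

n = 64
low_frame  = [math.sin(2 * math.pi * 2  * t / n) for t in range(n)]
high_frame = [math.sin(2 * math.pi * 30 * t / n) for t in range(n)]
print(match(low_frame, TEMPLATES, BINS))   # vowel
print(match(high_frame, TEMPLATES, BINS))  # fricative
```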

I also heard that Pixel Bender could be used to process sound, and I was going to investigate this, and anything else Cosmo had.

Anyway, it was probably going to take a little fiddling around, possibly some time-based signal processing too, but I’m sure something based on these methods could automatically generate a mouth shape.  Then, the size of the mouth is just proportional to the amplitude/power of the speech signal.

And that’s how I was going to do dynamic lip synch in ActionScript.


Entry filed under: Adobe AIR.


1 Comment

  • 1. suat  |  May 20, 2009 at 8:10 am


Nice work.
    I read about lipsync.  Please, can you send me the lipsync component that you are using?  I can’t find it on Samir’s web site.

    thank you.


