Department of Computer and Information Science

 

Computer Science Seminar Series

Acoustically-Driven Talking Face Animations Using Dynamic Bayesian Networks


3:00 p.m. Wednesday, November 5, 2008

Weir Hall, Room 235

Jianxia Xue
Adjunct Professor
Computer and Information Science
University of Mississippi


Abstract:

Visual speech information on a speaker's face is important for improving the robustness and naturalness of both human and machine speech comprehension. Natural and intelligible talking face animations can benefit a broad range of applications such as digital effects, computer animations, computer games, computer-based tutoring, and scientific studies of human speech perception. This study focuses on developing an acoustically-driven talking face animation system. Acoustical speech signals are found to be highly correlated with visual speech signals, and thus can be used effectively to drive facial animations.

The acoustically-driven talking face animation system is developed using an audio-visual speech database with simultaneous recordings of audio and motion capture data. Dynamic Bayesian networks (DBNs) are applied to the acoustic-to-optical speech signal mapping. Different DBN structures and model selection parameters are studied. Experimental results show that the state-dependent structures in the DBN models yield high correlation between reconstructed and recorded facial motions. More interestingly, the maximum inter-chain state asynchrony parameter of the DBN configurations has a greater effect on synthesis accuracy than the number of hidden states in the audio and visual Markov chains. Synthesized and recorded optical data are both used to generate animations for system evaluation. A lexicon distinction identification test is conducted with 16 human subjects. Perceptual test results on animations driven by the original optical data show that the radial basis function algorithm provides highly natural rendering of talking faces. Perceptual test results on animations driven by synthesized optical data show that, for some words, the synthesized results yield lexicon distinction identification scores similar to those obtained with animations driven by recorded data. The formal perceptual test provides a quantitative evaluation of the entire acoustically-driven talking face animation system, which can be very useful for future system tuning and improvement.


Biography:

Dr. Jianxia Xue joined the Department of Computer and Information Science at the University of Mississippi as an instructor and adjunct assistant professor in August 2008. She was a software engineer at Sony Pictures Imageworks Inc. from 2006 to 2008. She received her Ph.D. and M.S. degrees in Electrical Engineering from the University of California, Los Angeles in 2008 and 2001, respectively. She received her bachelor's degree in Electrical Engineering from Tsinghua University in 1999. Her research interests include speech signal processing applied to talking face animations, and image processing and computer vision applied to 3D reconstruction from 2D videos.

