Computer Science Seminar Series
Acoustically-Driven Talking Face Animations Using Dynamic Bayesian Networks
3:00 p.m. Wednesday, November 5, 2008
Weir Hall, Room 235
Jianxia Xue
Adjunct Professor
Computer and Information Science
University of Mississippi
Abstract:
Visual speech information on a speaker's face is important for improving the robustness and naturalness of both human
and machine speech comprehension. Natural and intelligible talking face animations can benefit a broad range of
applications such as digital effects, computer animations, computer games, computer-based tutoring, and scientific
studies of human speech perception. This study focuses on developing an acoustically-driven talking face
animation system. Acoustic speech signals are highly correlated with visual speech signals and can therefore
be used effectively to drive facial animations.
The system is developed using an audio-visual speech database with simultaneous recordings of audio and
motion capture data. Dynamic Bayesian networks (DBNs) are applied to the acoustic-to-optical speech signal
mapping. Different DBN
structures and model selection parameters are studied. Experimental results show that the state-dependent structures
in the DBN models yield high correlation between reconstructed and recorded facial motions. More interestingly, the
maximum inter-chain state asynchrony parameter of the DBN configurations has a greater effect on synthesis accuracy
than the number of hidden states in the audio and visual Markov chains.
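The abstract does not say how the correlation between reconstructed and recorded facial motions is computed. As a minimal sketch, assuming a per-channel Pearson correlation averaged over all motion-capture channels, the comparison could look like the following. The function names and the data layout (one list of per-frame values per channel) are hypothetical and chosen for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def mean_channel_correlation(recorded, reconstructed):
    """Average the per-channel correlation over all motion channels.

    Assumed layout: each argument is a list of channels, and each channel
    is a list of per-frame values (for example, one marker coordinate).
    """
    scores = [pearson(r, s) for r, s in zip(recorded, reconstructed)]
    return sum(scores) / len(scores)
```

A score near 1 on a channel means the reconstructed trajectory rises and falls with the recorded one. It does not capture differences in absolute scale or offset, so it is best read alongside an error measure.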
Synthesized and recorded optical data are both used to generate animations for system evaluation. A lexicon
distinction identification test is conducted with 16 human subjects. Perceptual test results on original optical
data-driven animations show that the radial basis function algorithm provides highly natural rendering of talking
faces. Perceptual test results on synthesized optical data-driven animations show that for some words the synthesized
results yield similar lexicon distinction identification scores to the results using recorded data-driven animations.
The formal perceptual test provides quantitative evaluation of the entire acoustically-driven talking face animation
system, which can be very useful for future system tuning and improvement.
Biography:
Dr. Jianxia Xue joined the Department of Computer and Information Science at the University of Mississippi as an
instructor and adjunct assistant professor in August 2008. She was a software engineer at Sony Pictures Imageworks Inc.
from 2006 to 2008. She received her Ph.D. and M.S. degrees in Electrical Engineering from the University of California,
Los Angeles in 2008 and 2001, respectively. She received her Bachelor's degree in Electrical Engineering from Tsinghua
University in 1999. Her research interests include speech signal processing applied to talking face animations, and
image processing and computer vision applied to 3D reconstruction from 2D video.