Multifactor Analysis and Synthesis of Dynamic Facial Expressions
Shira Mitchell, Ward Melville High School, East Setauket, NY; Dr. Dimitris Samaras, Yang Wang, Mohit Gupta, Computer Science Department, Stony Brook University; Dr. Ahmed Elgammal, Chan-Su Lee, Computer Science Department, Rutgers University.
Synthesis of facial expression is central to facial animation. Models have been obtained
that synthesize limb motion based on the individual's style and the action performed.
Analysis of static facial images has produced models capable of synthesizing images
based on a number of factors including illumination, viewpoint, person, and expression.
Previous work has been successful in constructing a model that accommodates various
individuals performing the same dynamic expression. The goal of our project was
to develop a model that synthesizes facial expression motion based on an individual's
style for a variety of expressions. Facial expressions are more complex than limb
motion, for which a small number of markers is sufficient to acquire motion data.
We capture high-speed, high-accuracy data of moving faces using a phase-shift-based
structured-light ranging technique. To analyze such data, correspondences
must be established among different frames of the same person as well as among
different people. Our method uses a multi-resolution 3D deformable mesh as a face
model that tracks global motion on a coarse level to recover changes in regions
with intuitive parameters, captures the local deformations using implicit shape
representation and Free Form Deformation, and preserves the correspondences at
critical features such as the corners of the mouth and eyes. Human facial expressions
are the result of multiple elements, including the expression performed and a
personal style that captures the distinctive pattern of movement for a particular
individual. Our algorithm analyzes motion data spanning multiple subjects performing
different expressions. Multi-linear subspace analysis offers a technique called
n-mode singular value decomposition to disentangle motion data, represented by
a tensor, into factors such as person, expression, and 3D point position through
time. This analysis produces a generative model that can synthesize new motions
in the distinctive style of new individuals. We are processing 3D data of two
mouth-region expressions (blowing and smiling) from five different people. We plan
to organize these data into a tensor, which we will decompose using our 4-mode SVD
code. A sixth person will also perform both expressions, and we will use one of that
person's expression sequences to synthesize the other in his or her own style. We
will then compare the synthesized results with the actual results. We
expect a problem to arise because the expression sequences of different people are
not in correspondence through time. To achieve temporal alignment, we plan to use
a standardized embedding on a one-dimensional unit-circle manifold in
two-dimensional space.

This work was supported by the Simons Foundation.
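
The n-mode singular value decomposition mentioned above can be illustrated with a minimal NumPy sketch. It is not the project's 4-mode SVD code: the function names (`unfold`, `mode_product`, `n_mode_svd`, `reconstruct`), the mode order (person, expression, time, point coordinates), the tensor sizes, and the random stand-in data are all assumptions chosen for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the mode-n product)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def n_mode_svd(tensor):
    """Return a core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the data tensor from the core and the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

# Hypothetical layout: 5 people x 2 expressions x 30 frames x 12 coordinates.
# Random numbers stand in for real motion data.
rng = np.random.default_rng(0)
data = rng.standard_normal((5, 2, 30, 12))
core, factors = n_mode_svd(data)
person_factor, expression_factor = factors[0], factors[1]
```

In this sketch each row of `person_factor` can be read as a style vector for one subject and each row of `expression_factor` as a vector for one expression. Synthesizing a motion for a new person would additionally require estimating that person's style vector from an observed sequence, which is not shown here.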
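
The planned temporal alignment can be sketched in the same hedged way. The abstract does not specify how the unit-circle embedding is built, so the code below assumes that the frames of a sequence are spread evenly around the circle, that the motion is periodic (it returns to its starting pose), and that sequences are resampled by linear interpolation at shared phase values. The names `circle_embedding` and `resample_to_common_phase` are hypothetical.

```python
import numpy as np

def circle_embedding(num_frames):
    """Place frames at equally spaced points on the unit circle in 2D."""
    theta = 2.0 * np.pi * np.arange(num_frames) / num_frames
    return np.column_stack([np.cos(theta), np.sin(theta)])

def resample_to_common_phase(sequence, num_samples):
    """Interpolate a (frames, features) sequence at `num_samples` shared phases.

    Assumes the sequence is periodic, so the last frame wraps around to the first.
    """
    sequence = np.asarray(sequence, dtype=float)
    num_frames = sequence.shape[0]
    source_phase = np.arange(num_frames) / num_frames
    target_phase = np.arange(num_samples) / num_samples
    return np.column_stack([
        np.interp(target_phase, source_phase, sequence[:, j], period=1.0)
        for j in range(sequence.shape[1])
    ])

# Two toy one-feature sequences of different lengths, aligned to 8 phases each.
short_sequence = np.array([[0.0], [2.0], [4.0], [2.0]])
long_sequence = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [3.0], [2.0], [1.0]])
aligned_short = resample_to_common_phase(short_sequence, 8)
aligned_long = resample_to_common_phase(long_sequence, 8)
points_on_circle = circle_embedding(8)
```

After resampling, every sequence has the same number of frames at the same phases, so sequences from different people can be stacked along the time mode of a data tensor.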