Multifactor Analysis and Synthesis of Dynamic Facial Expressions
Shira Mitchell, Ward Melville High School, East Setauket, NY, Dr. Dimitris Samaras, Yang Wang, Mohit Gupta, Computer Science Department, Stony Brook University, Dr. Ahmed Elgammal, Chan-Su Lee, Computer Science Department, Rutgers University.

Synthesis of facial expression is central to facial animation. Models have been obtained that synthesize limb motion based on the individual's style and the action performed. Analysis of static facial images has produced models capable of synthesizing images based on a number of factors, including illumination, viewpoint, person, and expression. Previous work has succeeded in constructing a model that accommodates various individuals performing the same dynamic expression. The goal of our project was to develop a model that synthesizes facial expression motion based on an individual's style for a variety of expressions. Facial expressions are more complex than limb motion, for which a small number of markers is sufficient to acquire motion data. We capture high-speed, high-accuracy data of moving faces using a phase-shift-based structured light ranging technique. To analyze such data, correspondences must be established among different frames of the same person as well as among different people. Our method uses a multi-resolution 3D deformable mesh as a face model. The mesh tracks global motion at a coarse level to recover changes in regions with intuitive parameters, captures local deformations using an implicit shape representation and Free Form Deformation, and preserves correspondences at critical features such as the corners of the mouth and eyes. Human facial expressions result from multiple factors, including the expression performed and a personal style that captures the distinctive pattern of movement of a particular individual. Our algorithm analyzes motion data spanning multiple subjects performing different expressions. Multi-linear subspace analysis offers a technique called n-mode singular value decomposition (SVD) that disentangles motion data, represented as a tensor, into factors such as person, expression, and 3D point position through time.
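The n-mode SVD step described above can be sketched roughly as follows. This is a minimal illustration in NumPy, not the project's own code: the tensor sizes are hypothetical, random numbers stand in for real motion data, and the helper names (unfold, mode_multiply, n_mode_svd, reconstruct) are chosen here for illustration only.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Mode-n product: multiply the tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def n_mode_svd(tensor):
    """Return a core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_multiply(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the data tensor from the core tensor and factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_multiply(out, u, mode)
    return out

# Hypothetical sizes: 5 people, 2 expressions, 30 point coordinates, 12 frames.
rng = np.random.default_rng(0)
data = rng.standard_normal((5, 2, 30, 12))
core, factors = n_mode_svd(data)
```

In this sketch, each row of factors[0] is a coefficient vector for one person and each row of factors[1] is a coefficient vector for one expression; multiplying the core tensor by all factor matrices recovers the original data.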
This analysis produces a generative model that can synthesize new motions in the distinctive style of new individuals. We are processing 3D data for two expressions of the mouth region (blowing and smiling) from five different people. We plan to organize these data into a tensor, which we will decompose using our 4-mode SVD code. A sixth person will also perform both expressions, and we will use one of that person's expression sequences to synthesize the other in the same personal style. We will then compare the synthesized results to the actual recorded sequence. We suspect that a problem will arise because our expression sequences for different people have no correspondence through time. To achieve temporal alignment, we plan to use a standardized embedding on a one-dimensional unit circle manifold in two-dimensional space. This work was supported by the Simons Foundation.
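One possible reading of the unit circle embedding for temporal alignment is sketched below. The abstract does not say how frames are mapped onto the circle, so this sketch assumes uniform spacing of frames in phase, treats each sequence as one closed cycle, and uses linear interpolation; the function names and sequence sizes are hypothetical.

```python
import numpy as np

def circle_embedding(num_frames):
    """Map frame indices to phases and to points on the unit circle in 2D."""
    phase = 2.0 * np.pi * np.arange(num_frames) / num_frames
    points = np.column_stack((np.cos(phase), np.sin(phase)))
    return phase, points

def resample_on_circle(sequence, num_samples):
    """Resample a (num_frames, num_features) sequence at common phases.

    Assumption: frames are evenly spaced in phase and the motion is cyclic.
    """
    phase, _ = circle_embedding(sequence.shape[0])
    target, _ = circle_embedding(num_samples)
    columns = [
        np.interp(target, phase, sequence[:, j], period=2.0 * np.pi)
        for j in range(sequence.shape[1])
    ]
    return np.column_stack(columns)

# Two hypothetical sequences of different lengths, 6 features per frame.
rng = np.random.default_rng(1)
short_seq = rng.standard_normal((40, 6))
long_seq = rng.standard_normal((65, 6))
aligned = [resample_on_circle(s, 50) for s in (short_seq, long_seq)]
```

After resampling, both sequences have the same number of frames at matching phases, so they could be stacked along a shared time mode of a data tensor.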
