While a substantial amount of work has been done on developing
human face avatars, we have yet to see avatars that are highly realistic
in terms of animation as well as appearance. The goal of this
work is to create speech-enabled avatars of faces that provide realistic
facial motion from text or speech inputs. Such speech-enabled
avatars can significantly enhance user experience in a variety of
applications including mobile messaging, information kiosks, advertising,
news reporting and videoconferencing.
In this project, we present a complete framework for creating speech-enabled
2D and 3D avatars from a single image of a person. Our
approach uses a generic facial motion model which represents deformations
of the prototype face during speech. We have developed
an HMM-based facial animation algorithm which takes into
account both lexical stress and coarticulation. This algorithm produces
realistic animations of the prototype facial surface from either
text or speech. The generic facial motion model is transformed to
a novel face geometry using a set of corresponding points between
the generic mesh and the novel face. In the case of a 2D avatar,
a single photograph of the person is used as input. We manually
select a small number of features on the photograph, and these are
used to deform the prototype surface. The deformed surface is then
used to animate the photograph. In the case of a 3D avatar, we use
a single stereo image of the person as input. The sparse geometry
of the face is computed from this image and used to warp the
prototype surface to obtain the complete 3D surface of the person’s
face. This surface is etched into a glass cube using sub-surface
laser engraving (SSLE) technology. Synthesized facial animation
videos are then projected onto the etched glass cube. Even though
the etched surface is static, the projection of facial animation onto it
results in a compelling experience for the viewer. We show several
examples of 2D and 3D avatars that are driven by text and speech
inputs.
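
The personalization step described above, warping the prototype surface to a novel face from a small set of corresponding points, could be realized with a radial basis function interpolant. The following is a minimal sketch of that idea, not the implementation used in this work; the biharmonic kernel, the function name rbf_warp, and the regularization constant are our own illustrative choices.

```python
import numpy as np

def rbf_warp(proto_vertices, proto_points, novel_points, reg=1e-8):
    """Warp every prototype vertex so that the prototype's corresponding
    points land on the novel face's points (RBF interpolation, phi(r) = r)."""
    disp = novel_points - proto_points                          # (k, 3) sparse displacements
    K = np.linalg.norm(proto_points[:, None] - proto_points[None, :], axis=-1)
    weights = np.linalg.solve(K + reg * np.eye(len(K)), disp)   # (k, 3) RBF weights
    Kv = np.linalg.norm(proto_vertices[:, None] - proto_points[None, :], axis=-1)
    return proto_vertices + Kv @ weights                        # (n, 3) warped vertices
```

The same interpolant could also be applied to the per-frame deformations of the generic motion model, so that the transferred motion follows the novel face geometry.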
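The animation itself is produced by the HMM-based algorithm, which accounts for lexical stress and coarticulation. The stand-in below only illustrates the final blending stage, assuming a timed viseme schedule (as such a model might output) that drives per-vertex displacements of the personalized surface; the raised-cosine envelope and all names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def render_frames(neutral, viseme_shapes, schedule, fps=30):
    """neutral: (n, 3) rest vertices; viseme_shapes: {viseme: (n, 3) displacement};
    schedule: [(viseme, start_s, end_s), ...]. Yields one vertex array per frame."""
    duration = max(end for _, _, end in schedule)
    for f in range(int(np.ceil(duration * fps))):
        t = f / fps
        frame = neutral.copy()
        for viseme, start, end in schedule:
            if start <= t < end:
                # Raised-cosine weight gives a smooth onset/offset, a crude
                # substitute for the coarticulation handled by the HMM model.
                w = 0.5 - 0.5 * np.cos(2.0 * np.pi * (t - start) / (end - start))
                frame += w * viseme_shapes[viseme]
        yield frame
```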