Optimized Photorealistic Audiovisual Speech Synthesis Using Active Appearance Modeling

Wesley Mattheyses, Lukas Latacz, Werner Verhelst

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

9 Citations (Scopus)

Abstract

Active appearance models can represent image information in terms of shape and texture parameters. This paper explains why this property makes them highly suitable for data-based 2D audiovisual text-to-speech synthesis. We elaborate on how the differentiation between shape and texture information can be fully exploited to create appropriate unit-selection costs and to enhance the video concatenations. The latter is particularly important, since synthetic visual speech requires a careful balance between signal smoothness and articulation strength. Several optimization strategies to enhance the quality of the synthetic visual speech are proposed. By measuring the properties of each model parameter, an effective normalization of the visual speech database becomes feasible. In addition, the visual joins can be optimized by a parameter-specific concatenation smoothing. To further enhance the naturalness of the synthetic speech, a spectrum-based smoothing approach is introduced.
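The parameter-specific concatenation smoothing mentioned in the abstract can be illustrated with a minimal sketch. The function below is not the authors' implementation; it merely shows one common way to realize the idea: distribute the discontinuity at a join over a few frames on each side, with a per-parameter strength so that articulation-critical AAM parameters can be smoothed less aggressively than others (all names and the weighting scheme are illustrative assumptions).

```python
import numpy as np

def smooth_join(left, right, n, strengths):
    """Spread the join discontinuity over n frames on each side.

    left, right : (frames, params) arrays of AAM parameter trajectories
                  for the two concatenated units.
    strengths   : per-parameter factor in [0, 1]; 1 removes the jump for
                  that parameter, 0 leaves it untouched (illustrative
                  stand-in for a parameter-specific smoothing setting).
    """
    left = left.astype(float).copy()
    right = right.astype(float).copy()
    # Discontinuity at the join, scaled per parameter.
    d = (right[0] - left[-1]) * np.asarray(strengths, dtype=float)
    for i in range(n):
        # Ramp the last n frames of the left unit up towards the join...
        left[-n + i] += d * (i + 1) / (2.0 * n)
        # ...and the first n frames of the right unit down towards it.
        right[i] -= d * (n - i) / (2.0 * n)
    return left, right
```

With `strengths` all equal to 1, the boundary frames meet exactly; with 0, the raw concatenation is preserved, so smoothness can be traded against articulation strength per parameter.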
Original language: English
Title of host publication: International Conference on Auditory-visual Speech Processing 2010, Hakone, Japan
Pages: 148-153
Number of pages: 6
Publication status: Published - 1 Oct 2010

Keywords

  • audiovisual speech synthesis
  • AAM modeling
