Computational modeling of acquisition of vocal imitation using interaction between a virtual infant and a human participant

Research output: Chapter in Book/Report/Conference proceeding > Meeting abstract (Book)

Abstract

I present a model that investigates, in a bottom-up way, how infants learn to imitate the speech sounds of a language. Infants' aptitude for imitation is not self-evident: it has been shown that in infant-parent interactions, infants between 2 and 6 months of age, at least, rarely imitated their parents' vocalizations, whereas the parents imitated the children's vocalizations significantly more often [1]. One problem is that the differences in vocal tract size and shape between infants and adults make it impossible for an infant to produce the acoustic characteristics that define, for example, a native vowel for an adult. In order to succeed in imitation, infants have to learn to map acoustic features of their caregivers' speech onto their own articulatory and acoustic domains. Infants also have to somehow explore their articulatory domain and converge onto speech sounds characteristic of their native language. Parents' feedback on infants' babbling has been shown to affect the quality of babbling [2], but the details of the learning process are still largely unknown.
Computational speech acquisition simulations offer a way to study potential mechanisms for learning speech: different models can easily be tried out, since virtual parent-child interactions can be run extremely quickly compared to real-life situations. The aim is to find out what kinds of rules or cognitive abilities are needed to make infants' speech learning possible. The feasibility of the mechanisms found is demonstrated by creating a virtual infant, equipped with an articulatory synthesizer, that is able to imitate human vowels after a number of interactions with a human speaker. In contrast to using a virtual caregiver (see [3]), the current work uses a real human participant to teach the virtual infant.
In the experiments, the infant babbles open vowel sounds pseudo-randomly. Based on the observed behavior of caregivers and real infants, the participant is set to answer the babbles with real Finnish words with partially matching phonemic content. The infant creates auditory perceptual categories based on its own babbles and associates acoustic features of the heard answers with these categories. The infant's imitation ability is tested using a new set of test words spoken by the participant, whose vowels the infant is set to imitate based on the learned associations and the stored articulations related to the perceptual categories. Important factors enabling efficient learning include: 1) increasing articulatory accuracy due to babbling experience in certain articulatory regions, 2) a bias to either explore novel articulatory regions or repeat already learned perceptual categories, 3) active listening to the caregiver's speech and weighting babbling toward those perceptual categories that are activated based on the associations learned thus far, 4) clustering the infant's learned phonemic categories based on the similarity of their associations with the caregiver's speech (which frees resources to explore more interesting categories when it seems likely that more than one of the infant's categories is associated with a single caregiver category). The pilot experiment shows that the current methods lead to over 90% imitation accuracy for Finnish vowels in around 1000 babble-imitation pairs.
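The babble-answer-associate loop described above can be illustrated with a deliberately simplified toy sketch. It is not the model used in this work: the one-dimensional articulatory space, the three-vowel inventory, the fixed quantization into categories, the simulated caregiver with its `error_rate`, and all function names are assumptions made purely for illustration. The articulatory synthesizer, the acoustic feature extraction, the exploration bias and the category clustering are all omitted.

```python
import random
from collections import defaultdict

# Hypothetical toy inventory; the actual study uses Finnish vowels.
CAREGIVER_VOWELS = ["a", "i", "u"]


def infant_category(articulation, n_categories=3):
    """Quantize a 1-D articulation in [0, 1) into a perceptual category index."""
    return min(int(articulation * n_categories), n_categories - 1)


def caregiver_reply(category, rng, error_rate=0.1):
    """Simulated stand-in for the human participant (an assumption).

    Usually answers with the vowel matching the babble's category,
    but sometimes answers with a randomly chosen vowel.
    """
    if rng.random() < error_rate:
        return rng.choice(CAREGIVER_VOWELS)
    return CAREGIVER_VOWELS[category]


def train(n_interactions=1000, seed=0):
    """Run babble-answer pairs and accumulate association counts."""
    rng = random.Random(seed)
    # association[heard_vowel][infant_category] = co-occurrence count
    association = defaultdict(lambda: defaultdict(int))
    stored_articulation = {}
    for _ in range(n_interactions):
        articulation = rng.random()            # pseudo-random babble
        category = infant_category(articulation)
        stored_articulation[category] = articulation  # keep latest exemplar
        heard = caregiver_reply(category, rng)
        association[heard][category] += 1
    return association, stored_articulation


def imitate(heard_vowel, association, stored_articulation):
    """Pick the infant category most strongly associated with the heard vowel."""
    counts = association[heard_vowel]
    best = max(counts, key=counts.get)
    return best, stored_articulation[best]


if __name__ == "__main__":
    association, stored = train()
    for vowel in CAREGIVER_VOWELS:
        print(vowel, imitate(vowel, association, stored))
```

In this sketch, imitation reduces to a lookup: the heard vowel activates the infant category with the highest co-occurrence count, and the articulation stored for that category is replayed.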
[1] Kokkinaki, T. and Kugiumutzakis, G. (2000). Basic aspects of vocal imitation in infant-parent interaction during the first 6 months, Journal of Reproductive and Infant Psychology, 18(3), 173-187.
[2] Goldstein, M.H., King, A.P. and West, M.J. (2003). Social interaction shapes babbling: testing parallels between birdsong and speech, Proceedings of the National Academy of Sciences, 100, 8030-8035.
[3] Rasilo, H., Räsänen, O. and Laine, U. (2013). Feedback and imitation by a caregiver guides a virtual infant to learn native phonemes and the skill of speech inversion, Speech Communication, 55(9), 909-931.
Original language: English
Title of host publication: AMLaP XX, Edinburgh
Publication status: Published - 2014
Event: Architectures and Mechanisms for Language Processing Conference, AMLaP XX - Edinburgh, United Kingdom
Duration: 3 Sep 2014 to 6 Sep 2014

Conference

Conference: Architectures and Mechanisms for Language Processing Conference, AMLaP XX
Country/Territory: United Kingdom
City: Edinburgh
Period: 3/09/14 to 6/09/14

Keywords

  • vocal imitation
  • speech learning
  • speech acquisition
  • computational modeling
