Modelling Language Acquisition through Syntactico-Semantic Pattern Finding on Semantically Annotated Corpora

Research output: Unpublished abstract

Abstract

The constructivist acquisition of language has been extensively documented by researchers in psycholinguistics and cognitive science (Pine & Lieven 1997; Tomasello 2003). However, the syntactico-semantic pattern finding mechanisms through which children learn constructions and grammatical categories have so far been translated into computational models only to a limited extent, and no faithful operationalisations of these mechanisms exist to date. The research on which we report here aims to fill this void by introducing a mechanistic model of language acquisition through syntactico-semantic pattern finding, which models the co-emergence of constructions and grammatical categories based on semantically annotated corpora. Concretely, we present a methodology for learning computational construction grammars and a network of emergent grammatical categories by generalising over similarities and differences in the form and meaning of linguistic observations alone. The resulting grammars consist of bidirectional form-meaning mappings (i.e. constructions) of varying degrees of abstraction, which can be used for language comprehension (mapping from form to meaning) and production (mapping from meaning to form). We have implemented our methodology as a set of learning operators for Fluid Construction Grammar (Steels 2011; see www.fcg-net.org) and have validated it on the CLEVR benchmark dataset (Johnson et al. 2017). The model achieves 100% accuracy on mapping between CLEVR questions and their meaning representations after having processed 1000 observations, and already achieves 90% accuracy after having processed 500 observations. The results show that our approach allows for online, incremental, data-efficient, transparent, and effective learning.
The research that we present here has both theoretical and practical implications. From a theoretical perspective, we provide computational evidence for the cognitive plausibility of usage-based constructionist theories of language acquisition by means of a precise mechanistic model of how a fully operational construction grammar, consisting of constructions of varying degrees of abstraction, can be bootstrapped from raw observations. From a practical point of view, the techniques that we introduce here pave the way for learning computationally tractable, usage-based construction grammars that can be used for both language comprehension and production. Such systems are valuable for a wide range of application domains, including intelligent conversational agents, the semantic analysis of discourse, intelligent tutoring systems and question answering systems.
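
The idea of generalising over similarities and differences in form and meaning can be illustrated with a small toy sketch. It is not the authors' implementation and does not use the Fluid Construction Grammar learning operators; the names `Observation`, `generalise`, the slot marker `?X`, and the flat predicate strings used as meanings are all hypothetical simplifications chosen for illustration. Given two observations, the sketch keeps what they share as a pattern with one open slot and records the differing parts as candidate fillers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    form: tuple[str, ...]      # tokenised utterance
    meaning: frozenset[str]    # toy meaning: a flat set of predicate names


def generalise(a: Observation, b: Observation):
    """Return (pattern_form, shared_meaning, fillers), or None if no pattern is found."""
    shortest = min(len(a.form), len(b.form))

    # Length of the longest common prefix of the two forms.
    p = 0
    while p < shortest and a.form[p] == b.form[p]:
        p += 1

    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < shortest - p and a.form[-1 - s] == b.form[-1 - s]:
        s += 1

    diff_a = a.form[p:len(a.form) - s]
    diff_b = b.form[p:len(b.form) - s]
    shared_meaning = a.meaning & b.meaning

    # Require a non-empty difference on both sides and some shared meaning.
    if not diff_a or not diff_b or not shared_meaning:
        return None

    # Shared form with a single open slot where the observations differ.
    pattern_form = a.form[:p] + ("?X",) + a.form[len(a.form) - s:]
    # Differing form spans paired with the meaning unique to each observation.
    fillers = {
        diff_a: a.meaning - shared_meaning,
        diff_b: b.meaning - shared_meaning,
    }
    return pattern_form, shared_meaning, fillers


# Hypothetical CLEVR-style observations (predicate names are made up).
cubes = Observation(
    ("how", "many", "cubes", "are", "there"),
    frozenset({"count", "get-context", "filter-cube"}),
)
spheres = Observation(
    ("how", "many", "spheres", "are", "there"),
    frozenset({"count", "get-context", "filter-sphere"}),
)
result = generalise(cubes, spheres)
```

For the two example observations, `result` holds the slot pattern `how many ?X are there` paired with the shared predicates, plus the fillers `cubes` and `spheres` paired with the predicates unique to each. A learner in the spirit of the abstract would additionally group such fillers into emergent categories and apply the resulting mappings in both directions, which this sketch leaves out.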
Original language: English
Status: Published - 17 Jun 2022
Event: Computational Linguistics in The Netherlands 32 - Willem II Stadium, Tilburg, Netherlands
Duration: 17 Jun 2022 - 17 Jun 2022
Conference number: 32
https://clin2022.uvt.nl

Conference

Conference: Computational Linguistics in The Netherlands 32
Abbreviated title: CLIN
Country/Region: Netherlands
City: Tilburg
Period: 17/06/22 - 17/06/22
