Computational construction grammar and procedural semantics for visual dialogue

Research output: Chapter in Book/Report/Conference proceeding, Meeting abstract (Book)

Abstract

The task of visual dialogue consists in holding a meaningful and coherent conversation about an image or other visual content, spanning multiple turns. The ability to hold such a conversation relies on a multiplicity of underlying cognitive capabilities, including (i) perception (for making sense of visual data), (ii) language processing and reasoning (for composing and understanding non-trivial questions and answers), and (iii) memory (for keeping track of what was previously mentioned in the conversation). Current conversational agents are typically good at perception and at either numerical or symbolic reasoning (Mascharka et al. 2018; Nevens et al. 2019), but fall short when they need to take part in multi-turn conversations, where the information conveyed during each dialogue turn cannot be interpreted independently of earlier turns (Kottur et al. 2018). Here, we aim to overcome this problem by extending the language processing system with mechanisms for keeping track of the dialogue history and for retrieving information from it.

The methodology that we present builds further on the computational construction grammar approach to visual question answering introduced by Nevens et al. (2019). In that work, a computational construction grammar was used to map between natural language questions and a representation of their meaning. This meaning representation was expressed in terms of procedural semantics, which means that the meaning of a question takes the form of a program consisting of the mental operations that need to be executed to find the answer (Winograd 1971; Johnson-Laird 1977). We have extended this semantic representation with operations that can read from and write to a conversation memory. This way, agents can keep track of information that was conveyed earlier in the dialogue and consult it later in the conversation.
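To make this idea concrete, the following Python sketch illustrates what a procedural semantic program extended with a conversation memory could look like. It is an illustrative assumption, not the authors' implementation: the operation names (filter_by, count, memory_write, memory_read) and the memory data structure are hypothetical and merely mimic the read/write behaviour described above.

# Minimal illustrative sketch (hypothetical names, not the actual system):
# a procedural semantic program is a sequence of operations executed over a
# scene, and two added operations, memory_write and memory_read, let later
# dialogue turns reuse what earlier turns established.
from dataclasses import dataclass, field

@dataclass
class DialogueAgent:
    scene: list                                  # objects in the image, e.g. {"shape": "cube", "color": "red"}
    memory: dict = field(default_factory=dict)   # conversation memory shared across turns

    # --- ordinary question-answering operations ---
    def filter_by(self, objects, attribute, value):
        # Keep only the objects whose attribute has the given value.
        return [o for o in objects if o.get(attribute) == value]

    def count(self, objects):
        return len(objects)

    # --- memory operations added for visual dialogue ---
    def memory_write(self, key, value):
        # Store an entity or fact introduced in the current turn.
        self.memory[key] = value
        return value

    def memory_read(self, key):
        # Retrieve an entity or fact introduced in an earlier turn.
        return self.memory[key]

# Turn 1: "How many red cubes are there?"
agent = DialogueAgent(scene=[
    {"shape": "cube", "color": "red"},
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "blue"},
])
red_cubes = agent.filter_by(agent.scene, "color", "red")
agent.memory_write("last_referent", red_cubes)   # remember the discussed objects
print(agent.count(red_cubes))                    # -> 2

# Turn 2: "What shape are they?"  The pronoun "they" is resolved by reading
# the conversation memory instead of interpreting the turn in isolation.
referent = agent.memory_read("last_referent")
print({o["shape"] for o in referent})            # -> {'cube'}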

We have evaluated our novel methodology on the CLEVR-dialog dataset (Kottur et al. 2018), where it achieves a near-perfect accuracy of 99.99%. These results are a major improvement over the previous state of the art of 68% (Kottur et al. 2018).

In sum, we have shown for the first time how computational construction grammar and procedural semantics can be applied to visual dialogue, where utterances are not isolated, but embedded in a conversational context.
Original language: English
Title of host publication: 11th International Conference on Construction Grammar
Publication status: Accepted/In press - Aug 2021
Event: 11th International Conference on Construction Grammar - Antwerp, Belgium
Duration: 18 Aug 2021 - 20 Aug 2021
https://www.uantwerpen.be/en/conferences/construction-grammars/

Conference

Conference: 11th International Conference on Construction Grammar
Abbreviated title: ICCG11
Country: Belgium
City: Antwerp
Period: 18/08/21 - 20/08/21
Internet address: https://www.uantwerpen.be/en/conferences/construction-grammars/
