Computational construction grammar and procedural semantics for visual dialogue

Research output: Unpublished abstract


The task of visual dialogue consists in holding a meaningful and coherent conversation spanning multiple turns, discussing an image or other visual content. The ability to hold such a conversation relies on a multiplicity of underlying cognitive capabilities, including (i) perception (for making sense of visual data), (ii) language processing and reasoning (for composing and understanding non-trivial questions and answers), and (iii) memory (for keeping track of what was previously mentioned in the conversation). Current conversational agents are typically good at perception and either numerical or symbolic reasoning (Mascharka et al. 2018; Nevens et al. 2019), but fall short when they need to take part in multi-turn conversations, where the information conveyed during each dialogue turn cannot be interpreted independently of earlier turns (Kottur et al. 2018). Here, we aim to overcome this problem by extending the language processing system with mechanisms for keeping track of the dialogue history and for retrieving information from it.

The methodology that we present builds on the computational construction grammar approach to visual question answering that was introduced by Nevens et al. (2019). In that work, a computational construction grammar was used to map between natural language questions and a representation of their meaning. This meaning representation was expressed in terms of procedural semantics: the meaning of a question takes the form of a program consisting of the mental operations that need to be executed to find the answer (Winograd 1971; Johnson-Laird 1977). We have extended this semantic representation with operations that can read from and write to a conversation memory. This way, agents can keep track of information that was conveyed earlier in the dialogue and can consult this information later in the conversation.
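The idea of a procedural-semantic program that reads from and writes to a conversation memory can be sketched as follows. This is a minimal illustration, not the actual implementation: the operation names (`filter_shape`, `query_color`, `read_memory`, `write_memory`), the scene representation, and the example questions are all hypothetical stand-ins for the primitives used by the grammar.

```python
# A toy scene: two objects with symbolic attributes.
scene = [
    {"id": 1, "shape": "cube", "color": "red"},
    {"id": 2, "shape": "sphere", "color": "blue"},
]

# Conversation memory, shared across dialogue turns.
memory = {}

def filter_shape(objects, shape):
    """Primitive operation: keep only objects of a given shape."""
    return [o for o in objects if o["shape"] == shape]

def query_color(objects):
    """Primitive operation: return the color of the (single) object in focus."""
    return objects[0]["color"]

def write_memory(key, value):
    """Store a referent in the conversation memory for later turns."""
    memory[key] = value

def read_memory(key):
    """Retrieve a previously mentioned referent from the conversation memory."""
    return memory[key]

# Turn 1: "What color is the sphere?"
# Program: filter the scene, query the attribute, and store the referent.
spheres = filter_shape(scene, "sphere")
answer_1 = query_color(spheres)
write_memory("last_referent", spheres)

# Turn 2: "Is it the same color as the cube?"
# The pronoun "it" is resolved by reading from memory rather than the scene.
referent = read_memory("last_referent")
answer_2 = query_color(referent) == query_color(filter_shape(scene, "cube"))
```

The crucial point is the second turn: without the `read_memory` operation, the program for "Is it the same color as the cube?" could not be grounded, because the referent of "it" is not recoverable from the current utterance alone.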

We have evaluated our novel methodology on the CLEVR-dialog dataset (Kottur et al. 2018), where it achieves a near-perfect accuracy of 99.99%. These results are a major improvement over the state-of-the-art, which was previously at 68% (Kottur et al. 2018).

In sum, we have shown for the first time how computational construction grammar and procedural semantics can be applied to visual dialogue, where utterances are not isolated, but embedded in a conversational context.
Original language: English
Status: Unpublished - Aug 2021
Event: 11th International Conference on Construction Grammar - Antwerp, Belgium
Duration: 18 Aug 2021 - 20 Aug 2021


Conference: 11th International Conference on Construction Grammar
Abbreviated title: ICCG11

