When social meaning meets NLP: How can NLP models inform sociolinguistic research and vice versa?

Dong Nguyen, Laura Rosseel

Research output: Unpublished abstract


Research in Natural Language Processing (NLP) has been marked by substantial developments in the area of deep learning. These approaches automatically learn to represent words, sentences, and documents as dense, continuous representations (i.e., embeddings). So far, studies analyzing deep neural network models and their resulting representations in NLP have primarily focused on semantic and syntactic aspects of language. Social meaning, unfortunately, has hitherto been largely overlooked. Considering this type of meaning can enrich NLP models and offer new possibilities for sociolinguistic research. In this paper, we illustrate the potential of NLP for sociolinguistics and vice versa by focusing on the social meanings of spelling variation (Sebba 2007).
First, we reflect on current NLP developments and why social meaning is important to consider. Second, we draw on methods to analyze societal biases in NLP models (e.g., Caliskan et al. 2017; May et al. 2019) in order to investigate whether popular NLP models encode social meanings associated with different types of spelling variation. We consider both static word embedding models, e.g., the skipgram model (Mikolov et al. 2013), and popular pre-trained models, e.g., BERT (Devlin et al. 2019).
For example, does a skipgram model associate forms with g-dropping (e.g., doin) or lengthening (e.g., coooool) more strongly with a particular gender or social attribute? Or, what does a pre-trained model like BERT predict about an author based on a tweet with or without a specific type of spelling variation?
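To make the first question concrete, the following is a minimal sketch of an association score in the spirit of the bias tests cited above (Caliskan et al. 2017). It is not the authors' implementation. The three-dimensional vectors are invented toy values rather than output from a trained skipgram model, and the attribute sets `attr_a` and `attr_b` and the function names are hypothetical placeholders for whatever social attribute words a study would choose.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attr_a, attr_b):
    """Mean cosine similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attr_a) / len(attr_a)
    mean_b = sum(cosine(word_vec, b) for b in attr_b) / len(attr_b)
    return mean_a - mean_b


# Toy vectors, invented for illustration only (assumption: a real study
# would look these up in a trained embedding model).
embeddings = {
    "doing": [0.9, 0.1, 0.3],    # standard spelling
    "doin": [0.7, 0.6, 0.3],     # g-dropped variant
    "attr_a1": [0.1, 1.0, 0.0],  # hypothetical words for social attribute A
    "attr_a2": [0.2, 0.9, 0.1],
    "attr_b1": [1.0, 0.1, 0.0],  # hypothetical words for social attribute B
    "attr_b2": [0.9, 0.0, 0.1],
}

attr_a = [embeddings["attr_a1"], embeddings["attr_a2"]]
attr_b = [embeddings["attr_b1"], embeddings["attr_b2"]]

score_standard = association(embeddings["doing"], attr_a, attr_b)
score_variant = association(embeddings["doin"], attr_a, attr_b)

# A positive shift means the variant spelling sits closer to attribute A
# (relative to B) than its standard counterpart does.
shift = score_variant - score_standard
print(f"standard: {score_standard:.3f}  variant: {score_variant:.3f}  shift: {shift:.3f}")
```

Comparing a variant against its standard spelling, rather than scoring the variant alone, isolates the contribution of the spelling from the meaning of the word itself. A full analysis would aggregate over many word pairs and test for significance.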
Original language: English
Status: Published - 2022
Event: Sociolinguistics Symposium 24 - Universiteit Gent, Gent, Belgium
Duration: 13 Jul 2022 to 16 Jul 2022

