Abstract
Research in Natural Language Processing (NLP) has been marked by substantial developments in the area of deep learning. These approaches automatically learn to represent words, sentences, and documents as dense, continuous representations (i.e. embeddings). So far, studies analyzing deep neural network models and their resulting representations in NLP have primarily focused on semantic and syntactic aspects of language. Social meaning, unfortunately, has hitherto been largely overlooked. Considering this type of meaning can enrich NLP models and offer new possibilities for sociolinguistic research. In this paper, we illustrate the potential of NLP for sociolinguistics and vice versa by focusing on the social meanings of spelling variation (Sebba 2007).
First, we reflect on current NLP developments and why social meaning is important to consider. Second, we draw on methods for analyzing societal biases in NLP models (e.g., Caliskan et al. 2017; May et al. 2019) to investigate whether popular NLP models encode social meanings associated with different types of spelling variation. We consider both static word embedding models, e.g., the skipgram model (Mikolov et al. 2013), and popular pre-trained models, e.g., BERT (Devlin et al. 2019).
For example, does a skipgram model associate forms with g-dropping (e.g., doin) or lengthening (e.g., coooool) more strongly with a particular gender or social attribute? Or, what does a pre-trained model like BERT predict about an author based on a tweet with or without a specific type of spelling variation?
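To make the first question concrete, the following is a minimal sketch of a WEAT-style association score in the spirit of Caliskan et al. (2017): the mean cosine similarity of a word to one attribute set minus its mean similarity to another. It is not the authors' implementation. The function names, the attribute word lists, and the three-dimensional vectors are all hypothetical and invented for illustration; in practice the vectors would come from a trained skipgram model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attr_a, attr_b):
    """WEAT-style score: mean similarity to set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attr_a) / len(attr_a)
    mean_b = sum(cosine(word_vec, b) for b in attr_b) / len(attr_b)
    return mean_a - mean_b


# Hypothetical toy vectors (assumption): invented numbers, not real embeddings.
toy_embeddings = {
    "doing": [0.9, 0.1, 0.2],
    "doin": [0.7, 0.6, 0.1],
    "he": [0.2, 0.9, 0.1],
    "him": [0.1, 0.8, 0.2],
    "she": [0.2, 0.1, 0.9],
    "her": [0.1, 0.2, 0.8],
}

# Hypothetical attribute word sets (assumption): any social attribute could be used.
set_a = [toy_embeddings[w] for w in ("he", "him")]
set_b = [toy_embeddings[w] for w in ("she", "her")]

standard_score = association(toy_embeddings["doing"], set_a, set_b)
variant_score = association(toy_embeddings["doin"], set_a, set_b)

# A positive shift means the variant spelling sits closer to set A than the
# standard spelling does; a negative shift means it sits closer to set B.
shift = variant_score - standard_score
print(f"standard: {standard_score:.3f}, variant: {variant_score:.3f}, shift: {shift:.3f}")
```

Comparing the variant against its standard spelling, rather than reading the raw score, isolates what the spelling itself contributes, since both forms share the same lexical meaning. With the toy numbers above the result says nothing about real models.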
Original language | English |
---|---|
Publication status | Published - 2022 |
Event | Sociolinguistics Symposium 24 - Universiteit Gent, Gent, Belgium; Duration: 13 Jul 2022 → 16 Jul 2022; https://ss24ghent.be |
Conference
Conference | Sociolinguistics Symposium 24 |
---|---|
Country/Territory | Belgium |
City | Gent |
Period | 13/07/22 → 16/07/22 |
Internet address | https://ss24ghent.be |
Keywords
- spelling variation
- social meaning of language variation
- NLP
- sociolinguistics
- language attitudes
- English variation
- Sociolinguistics Symposium 24
  Laura Rosseel (Participant)
  13 Jul 2022 → 16 Jul 2022
  Activity: Participating in or organising an event › Participation in conference
- When social meaning meets NLP: How can NLP models inform sociolinguistic research and vice versa?
  Dong Nguyen (Speaker) & Laura Rosseel (Speaker)
  13 Jul 2022 → 16 Jul 2022
  Activity: Talk or presentation › Talk or presentation at a conference