Abstract
Research in Natural Language Processing (NLP) has been marked by substantial developments in deep learning. These approaches automatically learn to represent words, sentences, and documents as dense, continuous representations (i.e., embeddings). So far, studies analyzing deep neural network models and their resulting representations in NLP have focused primarily on semantic and syntactic aspects of language. Social meaning, however, has so far been largely overlooked. Considering this type of meaning can enrich NLP models and offer new possibilities for sociolinguistic research. In this paper, we illustrate the potential of NLP for sociolinguistics and vice versa by focusing on the social meanings of spelling variation (Sebba, 2007).
First, we reflect on current NLP developments and why social meaning is important to consider. Second, we draw on methods for analyzing societal biases in NLP models (e.g., Caliskan et al., 2017; May et al., 2019) to investigate whether popular NLP models encode social meanings associated with different types of spelling variation. We consider both static word embedding models, e.g., the skipgram model (Mikolov et al., 2013), and popular pre-trained models, e.g., BERT (Devlin et al., 2019).
For example, does a skipgram model associate forms with g-dropping (e.g., doin) or lengthening (e.g., coooool) more strongly with a particular gender or social attribute? Or, what does a pre-trained model like BERT predict about an author based on a tweet with or without a specific type of spelling variation?
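To make the first question concrete, a WEAT-style association test (Caliskan et al., 2017) can be run directly on static embeddings. The sketch below is a minimal illustration rather than the actual experimental setup: it assumes `emb` is a word-to-vector lookup from a trained skipgram model, and the word lists in the commented-out usage are hypothetical placeholders, not the stimuli used in this work.

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B, emb):
    """s(w, A, B): mean similarity of w to attribute words in A minus in B."""
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    """WEAT-style effect size comparing target sets X and Y (e.g. non-standard
    vs. standard spellings) against attribute sets A and B (e.g. gendered words)."""
    s_X = [association(x, A, B, emb) for x in X]
    s_Y = [association(y, A, B, emb) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Hypothetical usage; `emb` would hold skipgram vectors (e.g. loaded via gensim),
# and these word lists are illustrative placeholders only:
# X = ["doin", "goin", "nothin"]       # g-dropping variants
# Y = ["doing", "going", "nothing"]    # standard spellings
# A = ["she", "her", "woman"]          # attribute set A
# B = ["he", "him", "man"]             # attribute set B
# print(weat_effect_size(X, Y, A, B, emb))
```

A positive effect size would indicate that the non-standard spellings are, on average, closer to attribute set A than the standard spellings are.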
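The second question can be probed with a masked-language-model query: given a tweet with or without a particular spelling variant, which words does BERT rank highest for a masked slot describing the author? The sketch below assumes the HuggingFace `transformers` library and `bert-base-uncased`; the prompt template and example tweets are illustrative assumptions, not materials from this study.

```python
from transformers import pipeline

# Assumes the HuggingFace `transformers` library; downloads bert-base-uncased.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Illustrative prompt template and tweets (not the study's materials).
template = 'Tweet: "{tweet}" The person who wrote this tweet is a [MASK].'
variants = {
    "g-dropping": "just chillin at home doin nothin",
    "standard": "just chilling at home doing nothing",
}

for label, tweet in variants.items():
    print(label)
    # The top-ranked fillers hint at which author attributes the model
    # associates with each spelling variant.
    for pred in unmasker(template.format(tweet=tweet), top_k=5):
        print(f"  {pred['token_str']:>10}  {pred['score']:.3f}")
```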
| Original language | English |
|---|---|
| Status | Published - 2022 |
| Event | Sociolinguistics Symposium 24, Universiteit Gent, Gent, Belgium |
| Duration | 13 Jul 2022 → 16 Jul 2022 |
| Event website | https://ss24ghent.be |
Conference
| Conference | Sociolinguistics Symposium 24 |
|---|---|
| Country/Region | Belgium |
| City | Gent |
| Period | 13/07/22 → 16/07/22 |
| Internet address | https://ss24ghent.be |
Activities per year

When social meaning meets NLP: How can NLP models inform sociolinguistic research and vice versa?
Dong Nguyen (Speaker) & Laura Rosseel (Speaker)
13 Jul 2022 → 16 Jul 2022
Activity: Talk or presentation at a conference

Sociolinguistics Symposium 24
Laura Rosseel (Participant)
13 Jul 2022 → 16 Jul 2022
Activity: Participation in conference