When small data meets big data: A study on socially meaningful spelling variation in English

Laura Rosseel, Dong Nguyen

Research output: Unpublished abstract


In social media, spelling variation is abundant. Crucially, many non-conventional spellings are not misspellings: by deviating from conventional spelling norms, writers create social meaning (Sebba, 2007). For example, a certain spelling may be used to evoke intimacy or to index a certain region. Various studies have analyzed the patterns and functions of spelling variation, e.g., on Twitter (Tatman, 2015). Yet, there has been little quantitative research on the social meanings of spelling variants. This study aims to help address this descriptive lacuna in sociolinguistic research on language variation. We do so by comparing the social meanings of spelling variants, elicited through human experiments, to data-driven meaning representations, automatically learned from large corpora. As such, our study supplements its descriptive research aim with a methodological one: to what extent can traditional sociolinguistic ‘small data’ and recent NLP-based ‘big data’ approaches complement each other?
We focus on spelling variation on the popular online platform Twitter. We look at two types of spelling variation phenomena in English: (1) spelling variation representing phonetic variation (e.g. workin vs. working) and (2) spelling variation restricted to the orthographic level (e.g. swapping of characters).
First, the social meaning of the linguistic variants is measured in a written version of the speaker evaluation paradigm (cf. Leigh, 2018). Using crowdsourcing, we collect various social traits that are potentially associated with the social meanings of the targeted linguistic variation. Second, we compare our measurements of social meanings with word embeddings, i.e. automatically learned mappings from words to high-dimensional vectors based on co-occurrences in the corpora.
Our study brings novel insights into the social meaning of spelling variants and also draws attention to limitations and opportunities of data-driven meaning representations for sociolinguistic research on language variation.
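As a rough illustration of the embedding side of the comparison described above, the following minimal sketch scores a spelling variant against a pair of trait words using cosine similarity. Everything in it is an assumption made for illustration: the three-dimensional vectors are invented rather than learned from a corpus, the trait words "formal" and "casual" are hypothetical stand-ins for the crowdsourced social traits, and the difference-of-similarities score is one simple option, not necessarily the measure used in the study.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real word embeddings would be learned from a large corpus and
# have far more dimensions.
toy_embeddings = {
    "working": [0.9, 0.1, 0.2],
    "workin": [0.7, 0.6, 0.1],
    "formal": [0.8, 0.0, 0.3],   # hypothetical trait word
    "casual": [0.2, 0.9, 0.1],   # hypothetical trait word
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def trait_score(word, trait_pos, trait_neg, emb):
    """Similarity of `word` to `trait_pos` minus its similarity to `trait_neg`.

    A positive score means the word sits closer to `trait_pos`
    in the embedding space, a negative score closer to `trait_neg`.
    """
    return cosine(emb[word], emb[trait_pos]) - cosine(emb[word], emb[trait_neg])


if __name__ == "__main__":
    for variant in ("working", "workin"):
        score = trait_score(variant, "formal", "casual", toy_embeddings)
        print(f"{variant}: formal-vs-casual score = {score:.3f}")
```

With these toy vectors the conventional spelling scores closer to "formal" than the variant does. In an actual analysis, such embedding-based scores could be set against the human ratings of the same variants, for example with a rank correlation.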
1. Leigh, D. (2018). Expecting a Performance: Listener expectations of social meaning in social media. Paper presented at NWAV43, New York, 20 October 2018.
2. Tatman, R. (2015). #go awn: Sociophonetic Variation in Variant Spellings on Twitter. Working Papers of the Linguistics Circle of the University of Victoria 25(2).
3. Sebba, M. (2007). Spelling and Society: The Culture and Politics of Orthography around the World. Cambridge: CUP.
Original language: English
Status: Published - 2022
Event: Sociolinguistics Symposium 24 - Universiteit Gent, Gent, Belgium
Duration: 13 Jul 2022 to 16 Jul 2022

