My dissertation research

Disclaimer! Work in progress!

I would love your comments: email!

My dissertation investigates the processes of sociolinguistic inference which enable people to deal with all this complexity: how listeners use small pronunciation differences to infer aspects of a talker’s social identity. It explores questions like these:

These questions are important to understanding the processes underpinning the social-semiotic function of language: how speech communicates information about who is talking, as well as what they are saying. In my dissertation I explore the above questions by combining experimental methods (which capture listeners’ intuitions about the social meaning of language variation) with phonetic analysis (which allows me to understand how their intuitions relate to the social distribution of language variation in production).

My dissertation research focuses on a pair of vowels in the speech of York, Northern England: the vowel in goose (/u/) and the vowel in goat (/o/). These vowels are interesting because they can vary in a lot of ways. In particular, they can be produced with different tongue positions (more front or back), or with more or less movement as the vowel is produced (either monophthongal or diphthongal).

How do /u/ and /o/ vary in production?

The first question a sociolinguist might ask about these vowels is how these different pronunciation possibilities are distributed in social space. Do older people produce them in a different way to younger people? Do people from different social backgrounds pronounce them in different ways?

To answer these questions, I made recordings of a sample of people of different ages and social backgrounds in York. They were recorded reading a list of words containing the vowels of interest, and doing a 'map task', which aims to elicit more casual speech. From these recordings, I've extracted measurements of the first and second formants, which allow us to quantify differences in pronunciation. The first formant (F1) is related to tongue height, or vowel openness: contrast the high/close vowel in 'hid' with the low/open vowel in 'had' (higher F1 values correspond to more open vowels). The second formant (F2) is related to how far forward the tongue is, and partially to the protrusion of the lips: contrast the back vowel in 'fool' with the front vowel in 'feel' (higher F2 values correspond to fronter vowels).

As I mentioned earlier, capturing all the possible pronunciations of /u/ and /o/ means taking the dynamic properties of the vowel into account. We don't just want to compare a single number representing a point or average value of F1 and F2; rather, we want to analyze how these change over time: the formant trajectories. To do this, I've used a statistical technique called generalized additive modeling (GAM), which lets me model the effect of a range of independent variables on the formant trajectories. The variables I'm interested in are:

These are all social factors which previous sociolinguistic research has identified as relevant to the way people speak. The three social indices might sound a bit wishy-washy at this stage: they come from an analysis of the interviews I carried out with each talker, and are derived from an exploratory factor analysis (I'll hopefully write about the details of this some other time). As always with variables reflecting people's social backgrounds, we should treat them with care, and recognize that the labels we use for them may not directly reflect the latent variables which underlie our indices.
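GAMs of this kind are commonly fit in R (for example with the mgcv package); the post doesn't show the model code. As a rough illustration of what a single smooth term does, here is a minimal Python sketch of a penalized regression spline, the basic building block of a GAM smooth, fitted to one formant trajectory over normalized time. The truncated linear basis, the knot count and the penalty weight `lam` are illustrative assumptions, not the settings used in this analysis, and the trajectory is made up.

```python
import numpy as np

def spline_basis(t, knots):
    """Truncated linear basis: intercept, slope, and a hinge (t - k)_+ per knot."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.clip(t - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_smooth(t, y, n_knots=8, lam=1.0):
    """Penalized least squares. Only the hinge coefficients are penalized,
    so a larger lam pulls the fitted curve toward a straight line."""
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots on [0, 1]
    X = spline_basis(t, knots)
    penalty = np.diag([0.0, 0.0] + [1.0] * n_knots)
    beta = np.linalg.solve(X.T @ X + lam * penalty, X.T @ np.asarray(y, dtype=float))
    return knots, beta

def predict(t, knots, beta):
    return spline_basis(t, knots) @ beta

# Hypothetical F2 trajectory (Hz) sampled at 11 points of normalized vowel time
t = np.linspace(0.0, 1.0, 11)
f2 = 1200 + 600 * t - 300 * t**2  # made-up curved contour, not real data
knots, beta = fit_smooth(t, f2, lam=0.1)
fitted = predict(t, knots, beta)
```

Raising `lam` flattens the curve toward a straight line. A real GAM also estimates that penalty from the data and adds further terms for the other predictors, such as a separate smooth for each age group.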

By comparing a set of statistical models and evaluating how well each one fits the data, I can identify which factors influence variability in the vowels we are interested in. The tables below show the formulae for the best-fitting model for each vowel, and I've visualized the significant effects below.
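The post doesn't say which criterion was used to compare models. As one common option, here is a minimal sketch of comparing two nested models by AIC (Akaike information criterion), assuming Gaussian errors and a single predictor; the speaker values are made up. A lower AIC indicates a better balance between goodness of fit and number of parameters.

```python
import math

def ols_rss(x, y):
    """Residual sum of squares for a straight-line least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian model up to a shared constant: n * ln(RSS / n) + 2k.
    k counts mean-structure coefficients; the error variance is omitted
    because every model has one, so it cancels in comparisons."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical midpoint F2 values (Hz) by speaker year of birth
yob = [1940, 1950, 1960, 1970, 1980, 1990, 2000]
f2 = [1250, 1320, 1300, 1450, 1500, 1580, 1620]

n = len(f2)
mean_f2 = sum(f2) / n
rss_null = sum((v - mean_f2) ** 2 for v in f2)  # intercept-only model
rss_yob = ols_rss(yob, f2)                      # intercept + year of birth

aic_null = gaussian_aic(rss_null, n, k=1)
aic_yob = gaussian_aic(rss_yob, n, k=2)
best = "year of birth" if aic_yob < aic_null else "intercept only"
```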

Evidence of sound change

Firstly, there's evidence of change in both vowels. We can see that in the plots below, which show the predicted F2 contours for three age groups.

Predicted F2 contours for /u/ and /o/ by age group.

It looks like both vowels are fronting. For /u/, there's fronting both at the onset of the vowel and the offglide, and the trajectory has become shorter – this means that people used to say a word like 'food' with their tongue very far back in the mouth, but now they pronounce it with their tongue further forward, so it sounds more like 'feed'. The same is true for /o/: it looks like the vowel in 'goat' used to be pronounced with a lot of movement of the tongue and lips ('gowt'), but now it's becoming more fronted ('geyt').

Another way of visualizing these changes is to take a point on the trajectory (say, the midpoint), and look at how the average value at that point has changed over time:

Midpoint F2 over time for /u/ (left) and /o/ (right).

It looks like change in /u/ has been more rapid and regular than change in /o/ – the prediction line in the left-hand panel looks like it fits slightly better than the one on the right. In fact, speaker year of birth explains about 54% of the variance in /u/, and 22% of the variance in /o/.
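'Variance explained' here is presumably the R² from regressing each speaker's midpoint F2 on year of birth; the post doesn't give the exact model. A minimal sketch of that calculation: take the midpoint sample of each trajectory, then compute R² (1 − RSS/TSS, which for a straight-line fit equals the squared correlation). The trajectories below are made up, not the York data.

```python
def midpoint(trajectory):
    """Value at the temporal midpoint of an odd-length, equally spaced trajectory."""
    return trajectory[len(trajectory) // 2]

def r_squared(x, y):
    """Proportion of variance in y explained by a straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)  # equals 1 - RSS/TSS for simple regression

# Hypothetical F2 trajectories (Hz), five equally spaced samples per speaker
speakers = {
    1945: [1000, 1050, 1100, 1150, 1200],
    1965: [1250, 1300, 1350, 1380, 1400],
    1985: [1500, 1550, 1600, 1620, 1640],
    1995: [1550, 1650, 1700, 1720, 1730],
}
yob = sorted(speakers)
mid_f2 = [midpoint(speakers[y]) for y in yob]
r2 = r_squared(yob, mid_f2)
```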

Dynamic change in /o/

The figures above suggest that there are two kinds of change going on in /u/ and /o/: the tendency for the vowels to be pronounced further forward in the mouth, and changes in the vowel dynamics, i.e. how much the jaw, lips and tongue move during the vowel. We can quantify these dynamics using the Euclidean distance in F1-F2 space: pick a point at the start and end of the vowel trajectory, and measure how far apart those points are. I've shown this below for /o/: the x-axis shows the average degree of fronting for each speaker, and the y-axis shows the Euclidean distance. The letters represent the age group of the speaker: Older (born 1935-1960), Middle (born 1961-1980) and Younger (born 1981-2000).

Fronting of /o/ (x-axis) against Euclidean distance (y-axis) for each speaker, labeled by age group.

The above plot demonstrates how change in /o/ involves both fronting and diphthongization: older speakers have very back pronunciations of /o/, and varying degrees of diphthongization. So some older speakers say 'goat' like 'gowt' and others more like 'gort' (n.b. with a British accent!). Younger speakers either have a very fronted diphthong ('gewt' or 'geyt'), or a back monophthong ('gort'). Interestingly, we don't seem to get speakers with the remaining logical possibility, a fronted monophthong ('gert').
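The Euclidean distance measure plotted above can be sketched in a few lines, assuming F1 and F2 have been sampled at the vowel onset and offset and are in the same units (Hz here; normalized values would work the same way). The formant values are hypothetical.

```python
import math

def trajectory_length(onset, offset):
    """Euclidean distance between (F1, F2) points at vowel onset and offset."""
    (f1_on, f2_on), (f1_off, f2_off) = onset, offset
    return math.hypot(f1_off - f1_on, f2_off - f2_on)

# Hypothetical /o/ tokens: (F1, F2) in Hz at onset and offset
diphthong = trajectory_length((550, 1100), (430, 850))   # lots of movement
monophthong = trajectory_length((500, 900), (490, 880))  # little movement
```

A larger distance means more movement, i.e. a more diphthongal vowel. Because F2 spans a wider range than F1 in Hz, raw distances tend to be dominated by F2 movement, which is one reason to consider normalizing formants first.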

Fronting and diphthongization of /o/ by speaker mobility.

One reason for these patterns seems to be that diphthongization of /o/ is socially stratified. The plot below shows how more mobile speakers tend toward more diphthongal vowels (note that the diphthong is the form used in Southern Standard British English). Both groups have undergone fronting, but it's a bit more extreme for the more mobile speakers. There's also been a change in which aspects of /o/ differ across the groups: it looks like the primary distinction used to be the degree of fronting (left-hand panel), but the important difference today is in the degree of diphthongization (right-hand panel).


The analyses presented in this section demonstrate that /u/ and /o/ are undergoing change in York speech. Both vowels are fronting, and /o/ is subject to social stratification: more mobile speakers tend to pronounce it with a diphthong, and less mobile speakers tend to pronounce it with a monophthong. In the next section, I'll explore the extent to which listeners can use these differences as a social cue. Does producing a back /u/ make a speaker sound older than producing a fronted variant? Do people who produce /o/ with a diphthong sound socially different from those who produce a monophthong?

What kind of sociolinguistic inference can people do with /u/ and /o/ variation?

In the previous section, we established that variation in /u/ and /o/ patterns in interesting ways in York – in particular:

Having observed these patterns, it's reasonable to ask whether or not individual speakers are aware of variation in /o/ and /u/ on some level. For example, can they use variation in /o/ to make inferences about the social identity of a speaker? To what extent do those inferences match up with the patterns we observe in production?

To explore this question, I created an experiment where listeners heard a set of words containing the target vowels, and were asked to choose between pairs of images representing three social dimensions: age, socioeconomic status, and urban/rural identity. These dimensions were chosen based on previous work on these vowels. The images are shown below:

Visual stimuli.

In order to capture the full range of variation in /u/ and /o/, I created a set of auditory stimuli that included various combinations of fronting and diphthongization of both vowels. I created the stimuli using a technique called vocoding, which allowed me to take a recording of a speaker from York reading a list of words and manipulate aspects of his vowel pronunciations. You can read more about this technique on my GitHub page: For this experiment, I was interested in manipulating the first and second formants to generate the combinations of fronting and diphthongization represented in the symbols above. Here are some examples of what the formant contours looked like:

Formant contours of the /u/ stimuli.

For the /u/ stimuli, you can see that the stimuli on the bottom row are very similar to the top row – there's a small difference in the first formant right at the start of the vowel, which makes them sound more diphthongal. As we'll see in the results, even a small difference in formant structure like this can make quite a big difference in the social perception of a vowel!

Formant contours of the /o/ stimuli.

For the /o/ stimuli, the difference between the top row and the second row should be clear: the formant contours of the top row are much flatter than the ones below. The second and bottom rows show two different ways of fronting the vowel. Moving from left to right across the second row, the vowels front primarily at the onset (so they sound more like 'gewt' than 'gowt'); on the bottom row, they front primarily at the offglide (so more like 'geyt').
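The actual stimuli were made by vocoding a real recording. Purely as an illustration of the design, here is a hypothetical sketch of how target F2 contours with onset versus offglide fronting might be specified before resynthesis: a linear onset-to-offset contour, with fronting (a raised F2) added at one end. The function names, Hz values and shift sizes are all made up.

```python
def f2_contour(onset_hz, offset_hz, n_points=10):
    """Linear F2 trajectory from onset to offset, sampled at n_points."""
    return [onset_hz + (offset_hz - onset_hz) * i / (n_points - 1)
            for i in range(n_points)]

def fronted(onset_hz, offset_hz, shift_hz, where, n_points=10):
    """Raise F2 (i.e. front the vowel) at the 'onset' or at the 'offglide'."""
    if where == "onset":
        return f2_contour(onset_hz + shift_hz, offset_hz, n_points)
    if where == "offglide":
        return f2_contour(onset_hz, offset_hz + shift_hz, n_points)
    raise ValueError("where must be 'onset' or 'offglide'")

# Hypothetical back diphthong /o/ ('gowt'-like): F2 falls from onset to offglide
base = f2_contour(1100, 800)
onset_fronted = fronted(1100, 800, 300, "onset")        # more 'gewt'-like
offglide_fronted = fronted(1100, 800, 700, "offglide")  # more 'geyt'-like
monophthong = f2_contour(900, 900)                      # flat contour
```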

The experiment had two parts. First, listeners saw the visual stimuli and classified them in response to questions such as 'Which character is older?', 'Which character is middle-class?' and 'Which character is from rural Yorkshire?'. This made sure that they interpreted the social meanings conveyed by the images as intended. They then moved on to the main experiment. On each trial, the participant saw two characters which differed on one dimension: for example, a younger, urban, middle-class character alongside an older, urban, middle-class character. They then heard a word containing one of the target vowels, and chose which person they thought was more likely to speak in that way. The idea was that if the difference between front and back /u/ was available as a cue to age (for example), listeners would be more likely to select the older character after hearing back /u/ than after hearing front /u/.
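The trial structure can be sketched as pairing characters that differ on exactly one social dimension. This sketch assumes, hypothetically, a fully crossed set of characters with two levels on each of the three dimensions; the post doesn't say exactly which characters or pairings were used.

```python
from itertools import combinations, product

# Hypothetical two-level coding of the three social dimensions
DIMENSIONS = {
    "age": ("younger", "older"),
    "class": ("middle-class", "working-class"),
    "place": ("urban", "rural"),
}

def characters():
    """Every combination of one level per dimension (a fully crossed set)."""
    names = list(DIMENSIONS)
    return [dict(zip(names, levels)) for levels in product(*DIMENSIONS.values())]

def minimal_pairs():
    """Pairs of characters differing on exactly one dimension, tagged by it."""
    pairs = []
    for a, b in combinations(characters(), 2):
        differing = [d for d in DIMENSIONS if a[d] != b[d]]
        if len(differing) == 1:
            pairs.append((differing[0], a, b))
    return pairs

pairs = minimal_pairs()
```

With three two-level dimensions this gives eight characters and twelve one-dimension pairs, four per dimension; the pair in the example above (younger vs older, both urban and middle-class) is one of the four age pairs.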

Predictions for the social perception of /u/ and /o/

Before looking at the data, let's formulate some clear predictions. As a simple (and probably false) starting point, we might assume that listeners have direct access to the social distribution of variation in /u/ and /o/. If that were the case…

Modeling sociolinguistic perception

The way that the experiment was set up allows listeners' perceptual intuitions to be treated as a binary choice: on a given trial, they hear a speech token, and choose between image a (e.g. a 'working-class' character) and image b (e.g. a 'middle-class' character). We are interested in the way in which the speech samples affected listeners' responses: specifically, the way in which each sample affected the probability of the selection of one of the two images. More formally, we are interested in comparing the conditional probabilities of selections given vowel tokens:

\( P(\text{Social category} \mid \text{Vowel variant}) \)
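Before any modeling, this conditional probability can be estimated directly as a proportion: of the trials on which a given variant was heard, how often was a given character chosen? A minimal sketch with hypothetical trial records; the variant labels and responses are made up.

```python
from collections import Counter

def conditional_proportions(trials, category):
    """Estimate P(category chosen | variant) as the share of trials per variant."""
    heard = Counter(variant for variant, _ in trials)
    chosen = Counter(variant for variant, choice in trials if choice == category)
    return {v: chosen[v] / heard[v] for v in heard}

# Hypothetical (variant heard, character chosen) records from social class trials
trials = [
    ("back monophthong", "working-class"),
    ("back monophthong", "working-class"),
    ("back monophthong", "middle-class"),
    ("fronted diphthong", "middle-class"),
    ("fronted diphthong", "middle-class"),
    ("fronted diphthong", "working-class"),
    ("fronted diphthong", "middle-class"),
]
p_wc = conditional_proportions(trials, "working-class")
```

Raw proportions like these don't separate the effect of the vowel from listeners' general response biases.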

To get at these values, we also need to take into account the fact that people might be biased one way or the other. During sociolinguistic perception, listeners have a lot of contextual information which might influence their decision: previous experience of the talker's speech, other socially meaningful cues, etc. It might also be that some people are more generally biased toward selecting a 'working-class' or 'middle-class' image.
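One standard way to separate vowel effects from these biases is a logistic regression with intercept terms. Sketched under assumptions (an overall intercept \( \alpha \) for general response bias, a by-listener intercept \( a_j \) for individual bias, and one effect \( \beta_v \) per vowel variant; the post doesn't give the exact specification):

\[
\operatorname{logit} P(y_{ij} = 1 \mid v_i) = \alpha + a_j + \beta_{v_i}, \qquad a_j \sim \mathcal{N}(0, \sigma_a^2)
\]

\[
P(y_{ij} = 1 \mid v_i) = \frac{1}{1 + \exp\left(-(\alpha + a_j + \beta_{v_i})\right)}
\]

Here \( y_{ij} = 1 \) if listener \( j \) chose, say, the 'working-class' image on trial \( i \), and \( v_i \) is the vowel variant heard on that trial. Setting \( a_j = 0 \) gives the prediction for a typical listener, which is one way to obtain \( P(\text{Social category} \mid \text{Vowel variant}) \) with individual biases factored out.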

To understand how sociolinguistic inference works, I'm going to model listeners' responses using logistic regression. I've fit a model for each vowel and each social dimension (social class, age, and urban/rural). To do this, I used Markov chain Monte Carlo (MCMC) sampling in Stan: Priors for the models were set using the recommendations in this paper from Andrew Gelman: The plots below show the predictions from each model, expressing the conditional probability of a particular social selection ('Working Class', 'Rural', or 'Old') given a particular pronunciation of /o/ or /u/. The error bars represent the 95% credible interval for each estimate: given the model and the data, there is a 95% probability that the population-level value lies within that interval. If the interval for a given prediction includes .5, there is no reliable evidence that the vowel in question affected listeners' responses. If the intervals for two predictions overlap, I don't treat them as reliably different.
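Given posterior draws for a predicted probability (from Stan or any other sampler), an equal-tailed 95% credible interval is the 2.5th and 97.5th percentiles of those draws, and the .5 check is a comparison against that interval. A minimal sketch; the draws here are simulated Gaussian stand-ins, not output from the actual models.

```python
import random
import statistics

def credible_interval_95(draws):
    """Equal-tailed 95% interval: the 2.5th and 97.5th percentiles of the draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

def excludes_chance(interval, chance=0.5):
    """True if the whole interval lies on one side of chance (.5)."""
    low, high = interval
    return high < chance or low > chance

def intervals_overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

# Simulated stand-in draws for P('Working Class' | variant), clipped to [0, 1]
rng = random.Random(1)
back_mono = [min(max(rng.gauss(0.70, 0.04), 0.0), 1.0) for _ in range(4000)]
front_diph = [min(max(rng.gauss(0.35, 0.05), 0.0), 1.0) for _ in range(4000)]

ci_back = credible_interval_95(back_mono)
ci_front = credible_interval_95(front_diph)
```

Checking whether two intervals overlap is a conservative shortcut: two estimates can differ reliably even when their intervals overlap slightly, so summarizing the posterior draws of the difference directly is the stricter comparison.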

Results for /u/

Predicted probabilities of 'Working Class', 'Rural' and 'Old' selections for each /u/ variant, with 95% credible intervals.

Although I predicted that /u/ wouldn't be useful as a cue to social class, it looks like it is: back variants tend to be heard as 'working-class', and front variants as 'middle-class'. Diphthongization seems to strengthen this effect. The effects are much less convincing for 'Rural' and 'Older' selections, although the trend for age selections goes in the expected direction: back vowels tend to be mapped to older characters.

Results for /o/

Predicted probabilities of 'Working Class', 'Rural' and 'Old' selections for each /o/ variant, with 95% credible intervals.

The social class selections for /o/ go more or less as predicted: diphthongs cue the selection of middle-class characters, and monophthongs cue the selection of working-class characters. Interestingly, fronting generally seems to make /o/ sound less working-class: among diphthongs, the backmost one is heard as working-class, unlike the fronted ones, and fronting monophthongs makes them less likely to cue a working-class selection. Although the effects are smaller than for social class, /o/ is a reliable cue to the urban/rural dimension: monophthongs sound more 'rural' than diphthongs, and fronting makes monophthongs sound less rural (unlike in the social class case, this doesn't appear to apply to diphthongs). There's also evidence that /o/ can be used as a cue to talker age: the centralized monophthong cues the selection of younger characters, and diphthongs cue the selection of older characters, especially when they are fronted at the onset.


Let's see how this matches up with the predictions I made earlier:

This was partially supported by the data, but the effects are very weak. Although speaker year of birth explains more than half of the variation in /u/ productions, listeners are only weakly sensitive to this pattern in perception. /o/ can be used as a cue to age, but in a strange way – diphthongization is perceived as 'old', but is actually more likely to occur in the speech of younger speakers.

This prediction was supported by the data – listeners reliably map diphthongal /o/ variants onto 'middle-class' images, consistent with the distribution of diphthongization in production. This effect interacted with fronting – more back diphthongs were mapped to 'working-class' images, and fronting at the vowel onset was heard as particularly 'middle-class'.

This prediction was not supported by the data! Listeners appear to hear fronted /u/ as more 'middle-class' and back /u/ as more 'working-class', despite there being no such association in production.

In summary, this analysis has shown that York listeners can use variation in /u/ and /o/ as a social cue, but their intuitions very often differ from the actual distribution of that variation in production. For example, although a speaker's degree of /u/ fronting is potentially very informative of their age, listeners aren't good at using that signal in social perception. In some cases, forms which are actually more likely in the speech of younger speakers (such as fronted, diphthongal /o/) are heard as 'older'. In my dissertation, I argue that this is because people are quite bad at tracking the actual distribution of phonetic cues in social space; rather, they make use of more abstract social meanings such as 'broad' and 'posh' to structure their social perceptions of linguistic features.

We can see more evidence for this in the fact that some of the visual stimuli were more likely to be selected than others: reliable effects were found only for the five stimuli below: