7 Who’s talking how often?
So far, we have counted words for characters. Now we will turn to utterances, and their properties.
First, we will use the function utteranceStatistics()
to extract quantitative information about utterances:
utteranceStatistics(rksp.0)
## corpus drama character utteranceBegin utteranceLength
## 1 test rksp.0 der_prinz 426 4.888626e-03
## 2 test rksp.0 der_kammerdiener 1149 7.884881e-05
## 3 test rksp.0 der_prinz 1178 2.996255e-03
## 4 test rksp.0 der_kammerdiener 1526 6.307905e-04
## 5 test rksp.0 der_prinz 1655 2.759708e-04
## 6 test rksp.0 der_kammerdiener 1697 1.576976e-04
## 7 test rksp.0 der_prinz 1739 9.856101e-04
## 8 test rksp.0 der_kammerdiener 1857 3.153952e-04
## 9 test rksp.0 der_prinz 1919 3.272226e-03
## 10 test rksp.0 der_kammerdiener 2299 3.548196e-04
This creates a table that is very long, which is why we only show the first 10 rows here. The table contains one row for each utterance, and information about the speaker of the utterance, its length (measured in tokens) and its starting position (character position). We can now inspect the variance in utterance length:
<- utteranceStatistics(rksp.0) %>%
ustat characterNames(rksp.0) # use names instead of character ids (see above)
par(mar=c(9,4,2,2)) # increase margin
boxplot(utteranceLength ~ character, # what do we want to correlate
data=ustat, # the data set to use
main = dramaNames(rksp.0), # the main title of the plot
las = 2 # rotate axis labels
)
This uses the regular boxplot()
function, enter ?boxplot
for documentation. utteranceLength ~ character
is called a formula in R and (in this case) expresses that we want to look at the column utteranceLength
, grouped by the column character
. The boxplot is a useful way to grasp the dispersion of a set of values (in this case: the lengths of all utterances by a character).
7.1 When are characters talking?
While the above displays the length of utterances, we can also display the position of utterances (remember the column utteranceBegin
?). The following snippet visualizes when characters are talking, this time for Lessings Miss Sara Sampson:
par(mar=c(2,7,2,2))
utteranceStatistics(rjmw.0) %>%
characterNames(rjmw.0) %>% # character names instead of ids
plot(main=dramaNames(rjmw.0)) # calling plot.QDUtteranceStatistics()
Each dot in this plot represents one utterance, the x-axis is measured in character positions. This is not really intuitive, but the flow from left to right represents the flow of the text. More technically, we again apply the function characterNames()
to display character names instead of character ids. This is the same function as above, just applied to a different table. It can be applied to any table of the type QDHasCharacter
. Secondly, the call to the function plot()
gets rerouted to the function plot.QDUtteranceStatistics()
, because the object we supply as argument is of the type QDUtteranceStatistics
. Information about this function can be retrieved by entering ?plot.QDUtteranceStatistics
.
7.2 Adding act boundaries
Now it would be useful to include information on act/scene boundaries in this plot. This can be done by supplying the original drama object as a second argument to the plot()
function:
par(mar=c(2,8,2,2))
utteranceStatistics(rksp.0) %>%
characterNames(rksp.0) %>%
plot(rksp.0, # adding the `QDDrama` object here creates the act boundaries.
main=dramaNames(rksp.0))
Please note that the information contained in this plot is very similar to the information in configuration matrices.
7.3 Collection analysis
In Version 3.0, the function utteranceStatistics()
expects to be fed a QDDrama object that contains a single play (it will stop otherwise). It’s still possible to run it over several plays at once, if the plays are in different objects but in a single list.
We again load the Sturm und Drang plays as before, but we will use the function split()
to separate the plays. This results in a list of QDDrama
objects.
<- c("qd:11f81.0", "qd:11g1d.0", "qd:11g9w.0",
sturm_und_drang.ids "qd:11hdv.0", "qd:nds0.0", "qd:r12k.0",
"qd:r12v.0", "qd:r134.0", "qd:r13g.0",
"qd:rfxf.0", "qd:rhtz.0", "qd:rhzq.0",
"qd:rj22.0", "qd:tx4z.0", "qd:tz39.0",
"qd:tzgk.0", "qd:v0fv.0", "qd:wznj.0",
"qd:tx4z.0", "qd:rfxf.0")
<- loadDrama(sturm_und_drang.ids)
sturm_und_drang.plays
# separate the plays
<- split(sturm_und_drang.plays) sturm_und_drang.plays
This list can of course be plotted again, but it will result in a number of individual plots.
par(mfrow=c(length(sturm_und_drang.plays),1), mar=c(2,15,2,2))
lapply(sturm_und_drang.plays,
function(x) {
utteranceStatistics(x) %>%
characterNames(x) %>%
plot(x, main=dramaNames(x))
} )