QuaDramA Tutorial

Heidelberg University

Nils Reiter, nils.reiter@ims.uni-stuttgart.de

18.06.2018

Outline

  • Installation and Setup
  • R Basics and RStudio
  • The R package DramaAnalysis

Installation and Setup

Needed for this tutorial

  • R
  • RStudio
  • Once RStudio is running:
    type install.packages("devtools") in the Console
> install.packages("devtools")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/devtools_1.13.5.tgz'
Content type 'application/x-gzip' length 702722 bytes (686 KB)
==================================================
downloaded 686 KB


The downloaded binary packages are in
    /var/folders/mj/_qhr36wj3gdczq38qy96hxf00000gp/T//RtmpkTLQ16/downloaded_packages
> 

Installation of the DramaAnalysis package

Type devtools::install_github("quadrama/DramaAnalysis") in the console

> devtools::install_github("quadrama/DramaAnalysis")
Downloading GitHub repo quadrama/DramaAnalysis@master
from URL https://api.github.com/repos/quadrama/DramaAnalysis/zipball/master
Installing DramaAnalysis
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/rmarkdown_1.10.tgz'
Content type 'application/x-gzip' length 2804704 bytes (2.7 MB)
==================================================
downloaded 2.7 MB
... 

R Basics and RStudio

R Basics

  • R is a programming language
    • Mostly used for statistical data analysis (“data science”)
    • First version: 1993
    • Current stable release: 3.5
    • Website: https://www.r-project.org
  • Three important concepts we need to talk about
    • Objects and types
    • Variables
    • Functions

R Basics – Objects and Types

  • Objects live in the computer memory (or on disk)
  • Objects represent the things we want to analyse (e.g., dramatic texts, words, or numbers)
  • An object has one or more types
  • The type of an object determines what we can do with it
      • E.g., you can do different things with a knife than with a fork
  • Types: Numbers, strings, lists, tables, …
    • Numbers allow arithmetic operations
      • E.g., summation: sum(3,5) (evaluates to 8, equivalent to 3+5)
    • Strings allow character-based operations
      • E.g., conversion to lower case: tolower("ABC") (evaluates to "abc")
  • “evaluates to” -> we will come back to this
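The two examples from this slide can be tried directly in the console. The sketch below also uses class() (base R) to show the type of each result; the last line is our own addition and shows what happens when an operation does not fit the type of an object.

```r
# Numbers allow arithmetic: both expressions evaluate to 8
x <- sum(3, 5)
y <- 3 + 5

# Strings allow character-based operations
lower <- tolower("ABC")   # "abc"

# class() tells us the type of an object
class(x)        # "numeric"
class(lower)    # "character"

# An operation that does not fit the type fails with an error;
# tryCatch() catches it and returns the error message instead
msg <- tryCatch("ABC" + 1, error = function(e) conditionMessage(e))
msg
```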

R Basics – Objects and Types

Type         Example (in R syntax)    Description
Numeric      5                        A numeric value
Character    "Heidelberg"             A sequence of characters (note the double quotes!)
Logical      TRUE / FALSE             A truth value
Vector       c(5,4,1)                 A sequence of objects of the same type
List         list(5,"Hd",TRUE)        A sequence of objects (their types may differ)
Matrix                                A table of objects of the same type
Data frame                            A table of objects (columns may differ in type)
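A minimal sketch that creates one object of each type from the table. The matrix and data frame values, the column names `name` and `count`, and the variable names are our own choices. The `mixed` line shows why "of the same type" matters: c() converts mixed inputs to a common type.

```r
num <- 5                             # Numeric
chr <- "Heidelberg"                  # Character
lgl <- TRUE                          # Logical
vec <- c(5, 4, 1)                    # Vector: all elements share one type
lst <- list(5, "Hd", TRUE)           # List: elements may differ in type
mat <- matrix(1:6, nrow = 2)         # Matrix: 2 rows x 3 columns, one type
df  <- data.frame(name  = c("A", "B"),   # Data frame: each column has
                  count = c(10, 20),     # its own type
                  stringsAsFactors = FALSE)

# c() silently converts mixed inputs to a common type (here: character)
mixed <- c(5, "Hd", TRUE)
mixed          # "5" "Hd" "TRUE"

is.matrix(mat)       # TRUE
is.data.frame(df)    # TRUE
```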

R Basics – Objects and Types

In R, (almost) everything is a vector!

  • Entering 5 creates a numeric vector of length 1
  • Entering "Bla" creates a character vector of length 1

(In this respect, R differs from many other programming languages)

## [1] 5
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
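The following lines (our own illustration) show what this means in practice. The bracketed number at the start of each output line, such as [1] or [24], is the index of the first element printed on that line.

```r
five <- 5
length(five)       # 1: a single number is a vector of length 1
is.vector(five)    # TRUE

word <- "Bla"
length(word)       # 1, not 3: the whole string is one element
nchar(word)        # 3 characters

numbers <- 1:50    # the colon operator creates the sequence 1, 2, ..., 50
length(numbers)    # 50
doubled <- numbers * 2   # arithmetic is applied to every element
```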

R Basics – Variables

  • We usually do not interact with the objects directly
    • Because they are not known in advance (but loaded from files)
  • Variables
    • A way to name objects
    • Used as a placeholder for objects
    • The actual operation takes place on the objects (R takes care of this)
  • Creating a variable a: a <- 3 (think of this as an arrow)
> a <- 3
> b <- 5
> a + b
[1] 8
>
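One detail worth knowing (our own example, not from the slides): assigning one variable to another stores the value it has at that moment. Changing the first variable later does not change the second one.

```r
a <- 3
b <- a      # b now refers to the value 3
a <- 10     # re-assigning a ...
b           # ... leaves b unchanged: 3

a <- a + 1  # the right-hand side is evaluated first, then stored in a
a           # 11
```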

R Basics – Functions

  • “Mini programs”: A collection of instructions that you can use as a single instruction
  • Input: Functions take arguments as input
  • Output: Functions return an object (that stores the result of the instructions)
  • Functions have a name (typically lower case) and can be recognized by the round parentheses
    function(argument1, argument2, argument3, ...)
  • The return value of a function can be stored in a variable
    variable <- function(arg1, arg2, ...)
  • Some functions not only return a value, but also do something (e.g., display a plot)
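A short sketch of these points using base R functions. The function doubleIt() is a hypothetical example of our own, showing how a function bundles instructions under one name.

```r
# A function call: round() gets the number and, as a named argument,
# the number of decimal places
rounded <- round(3.14159, digits = 2)
rounded                        # 3.14

# The return value of sum() is stored in the variable total
total <- sum(1, 2, 3)
total                          # 6

# Calls can be nested: the result of paste() is the argument of toupper()
shout <- toupper(paste("quadrama", "tutorial"))
shout                          # "QUADRAMA TUTORIAL"

# A (hypothetical) function of our own: it bundles instructions under
# one name and returns the value of its last expression
doubleIt <- function(x) {
  x * 2
}
doubleIt(21)                   # 42
```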

R Basics – Functions

## [1] 6
## [1] 6

What’s the value of s now?

RStudio

  • An integrated development environment (IDE) for R
  • Capable workbench for data analysis

RStudio – 4 Panes

  • Console: Where you enter R code and get the result immediately
  • Environment: Shows the objects currently in memory
  • Plots: Shows plots
  • Editor/Code: Allows editing R code and inspecting tables

(some panes have multiple tabs)

The R Package DramaAnalysis

Background

  • Dramatic texts are initially stored as TEI/XML files
  • Language processing (e.g., detection of parts of speech) takes place in a UIMA pipeline
  • The output of the pipeline is a set of CSV files for each play
    • Meta data
    • Character data
    • Utterances
    • Segments (scenes and acts)
    • Stage directions (work in progress)
  • CSV files are then analyzed in R
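To illustrate that last step, this is how a CSV file can be read into a data frame with base R's read.csv(). The column names (`speaker`, `utterance`) and rows are made up for this sketch; the real files have a different layout, and DramaAnalysis reads them for you.

```r
# Hypothetical CSV content, passed as a string instead of a file name
csv <- "speaker,utterance
ANNA,Good morning
KARL,Morning
ANNA,How are you"

utterances <- read.csv(text = csv, stringsAsFactors = FALSE)
nrow(utterances)              # 3 rows, one per utterance
table(utterances$speaker)     # number of utterances per speaker
```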

Setup

Loading the package

## DramaAnalysis 2.1.0

Install preprocessed data

  1. Find out the directory in which data needs to be stored getOption("qd.datadir")
  2. Open this directory in your file browser (e.g., Explorer, Finder)
  3. Download this file: github.com/quadrama/data_gdc/archive/master.zip and unpack it into the previously opened folder
  4. Verify: There should be a directory gdc with a sub folder csv. The csv folder contains a lot of files.
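Step 4 can also be checked from R. The helper checkDataDir() below is hypothetical (it is not part of DramaAnalysis) and only uses base R functions; the commented usage assumes, as step 1 suggests, that the option qd.datadir is set once the package is loaded.

```r
# Hypothetical helper: TRUE if <datadir>/gdc/csv exists and is not empty
checkDataDir <- function(datadir) {
  csvdir <- file.path(datadir, "gdc", "csv")
  dir.exists(csvdir) && length(list.files(csvdir)) > 0
}

# Usage (assumption: qd.datadir is set after loading the package):
# library(DramaAnalysis)
# checkDataDir(getOption("qd.datadir"))
```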

Verification

##    [1] "gdc:11d11.0" "gdc:11d2m.0" "gdc:11f78.0" "gdc:11f81.0"
##    [5] "gdc:11f9k.0" "gdc:11fzp.0" "gdc:11g1d.0" "gdc:11g3h.0"
## [... output shortened ...]
##  [465] "gdc:xknr.0"  "test:rjmw.0" "test:rksp.0" "tg:11d11.0" 
## [... output shortened ...]
## [1153] "tg:x9rp.0"   "tg:xknr.0"   "turm:1"      "turm:2"     
## [1157] "turm:3"

If you see something similar to this, you’re good to go!

Step 1: Select a Play

Load meta data for all ids

Select one play of your choice, and create a new variable that contains the id, for example:

From here on, everyone gets different results, depending on the chosen play!
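Since ids are plain character vectors, selecting a play is ordinary vector indexing. The sketch uses a handful of ids in the format printed during verification; treating the part before the colon as the name of the collection is our assumption based on how the ids look.

```r
# A few ids in the format printed by the verification step
ids <- c("gdc:11d11.0", "tg:rksp.0", "tg:rjmw.0", "turm:1")

# Assumption: the prefix before ":" names the collection
collections <- sub(":.*$", "", ids)
table(collections)

# Keep only ids from one collection, then pick one play by position
tgIds <- ids[startsWith(ids, "tg:")]
id <- tgIds[1]
id    # "tg:rksp.0"
```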

Loading the text

The variable text now refers to a table with 18 columns and 45243 rows.

  • Columns have the following names
    • corpus, drama, begin.Act, end.Act, Number.Act, begin.Scene, end.Scene, Number.Scene, begin, end, Speaker.figure_surface, Speaker.figure_id, Token.surface, Token.pos, Token.lemma, length, Mentioned.figure_surface, Mentioned.figure_id
  • Rows don’t have names – they represent the spoken words in the play
    • Each word is on one row (i.e., the characters in the play utter 45243 words)
  • You don’t have to work with the table directly
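To get a feeling for this structure without loading a play, the sketch builds a hypothetical toy table with three of the 18 columns listed above (one row per spoken token) and inspects it with base R functions. The speakers and words are invented.

```r
# Hypothetical toy table: one row per spoken token
text <- data.frame(
  Speaker.figure_id = c("anna", "anna", "anna", "karl", "karl"),
  Token.surface     = c("Oh", ",", "father", "My", "daughter"),
  Token.lemma       = c("oh", ",", "father", "my", "daughter"),
  stringsAsFactors  = FALSE
)

dim(text)       # number of rows and columns: 5 3
names(text)     # the column names
head(text, 3)   # the first three rows, i.e. the first three tokens
```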

Statistics about characters

  • Function: figureStatistics(...) to get simple character statistics.
    • Entering ?figureStatistics shows documentation
  • Rows: The characters of the play
  • Columns: Extracted properties
    • corpus, drama: meta data to identify the play
    • length: The length of the play in tokens
    • figure: An identifier for the character
    • tokens: Number of tokens spoken by this character
    • types: Number of different tokens spoken by this character
    • utterances: Number of utterances by this character
    • utteranceLengthMean, utteranceLengthSd: Mean and standard deviation of the utterance length
    • firstBegin, lastEnd: The text positions (in characters) where this character's first utterance begins and last utterance ends
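To make the definitions of tokens, types and utterances concrete, the sketch below computes them per character from a toy token table using base R. It is our own illustration of the definitions, not the implementation of figureStatistics(); using `begin` to tell utterances apart is an assumption about what that column means.

```r
# Hypothetical token table: one row per token; `begin` marks the utterance
tokens <- data.frame(
  figure        = c("anna", "anna", "anna", "karl", "karl", "anna"),
  begin         = c(0, 0, 0, 20, 20, 40),
  Token.surface = c("yes", "yes", "good", "no", "never", "yes"),
  stringsAsFactors = FALSE
)

# Split the table by character and compute one row of statistics each
stats <- do.call(rbind, lapply(split(tokens, tokens$figure), function(d) {
  data.frame(
    figure     = d$figure[1],
    tokens     = nrow(d),                          # all tokens spoken
    types      = length(unique(d$Token.surface)),  # distinct tokens
    utterances = length(unique(d$begin)),          # distinct utterances
    stringsAsFactors = FALSE
  )
}))
stats
```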

Visualising character statistics

  • The $-operator selects one column
    • fstat$tokens gives you a vector of the token counts of the characters (in the order of the table rows)
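A minimal sketch of selecting a column with `$` and plotting it, using a hypothetical table with two of the column names described above (`figure`, `tokens`) and invented numbers. barplot() is base R; with the real data, fstat would come from figureStatistics().

```r
# Hypothetical statistics table
fstat <- data.frame(
  figure = c("anna", "karl", "marie"),
  tokens = c(1200, 800, 450),
  stringsAsFactors = FALSE
)

fstat$tokens                           # [1] 1200  800  450
share <- fstat$tokens / sum(fstat$tokens)   # share of all spoken tokens

# Bar chart of the token counts, labelled with the character identifiers
barplot(fstat$tokens, names.arg = fstat$figure)
```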

Vignettes

Advice

  • Don’t be afraid: It’s much harder to really break your computer than you think
  • ?functionName gives you the documentation for a function
  • If you are stuck: Please ask
  • Be aware that all this is research software
    • It contains bugs and implementation errors
    • It is not aimed at end users