These methods count the number of occurrences of the words in the dictionaries across different speakers and/or segments. The function dictionaryStatistics() calculates statistics for dictionaries with multiple entries, while dictionaryStatisticsSingle() handles only a single word list.
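The core idea, counting how many tokens of each speaker belong to a word list, can be sketched in base R. This is not the package's implementation: the toy token table, its column `speaker`, and the helper `countField()` are hypothetical; only the column name `Token.lemma` and the case-insensitive default (`ci = TRUE`) mirror the documentation.

```r
# Minimal sketch, base R only (not the DramaAnalysis implementation).
# Hypothetical token table standing in for the tokens of a QDDrama object.
tokens <- data.frame(
  speaker     = c("A", "A", "A", "B", "B"),
  Token.lemma = c("Liebe", "und", "Krieg", "liebe", "Herz"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionaries: a named list of character vectors
dicts <- list(Liebe = c("liebe", "herz"), Krieg = c("krieg", "waffe"))

# Count tokens matching a word list, per speaker
countField <- function(tokens, words, column = "Token.lemma", ci = TRUE) {
  values <- tokens[[column]]
  if (ci) {
    # ignore case by lower-casing both sides
    values <- tolower(values)
    words  <- tolower(words)
  }
  tapply(values %in% words, tokens$speaker, sum)
}

# One row per speaker, one column per dictionary
counts <- sapply(dicts, function(w) countField(tokens, w))
print(counts)
```

With `ci = TRUE`, "Liebe" and "liebe" are counted as the same word, so speaker B gets two hits for the `Liebe` list.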
The as.matrix() method extracts the numeric part of a QDDictionaryStatistics table as a matrix.
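As a rough illustration of "extracting the numeric part", the sketch below keeps only the numeric columns of a statistics table and labels rows by speaker. The table layout (columns `drama`, `character`, and one numeric column per dictionary) and the helper `numericPart()` are assumptions for illustration; the real QDDictionaryStatistics structure may differ.

```r
# Hypothetical statistics table: identifier columns, then numeric counts
stats <- data.frame(
  drama     = c("rksp.0", "rksp.0"),
  character = c("A", "B"),
  Krieg     = c(3, 0),
  Familie   = c(1, 5),
  stringsAsFactors = FALSE
)

# Keep only numeric columns and return them as a matrix
numericPart <- function(df) {
  isNum <- vapply(df, is.numeric, logical(1))
  m <- as.matrix(df[, isNum, drop = FALSE])
  rownames(m) <- df$character  # label rows by speaker
  m
}

mat <- numericPart(stats)
print(mat)
```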

    dictionaryStatistics(drama,
      fields = DramaAnalysis::base_dictionary[fieldnames],
      fieldnames = c("Liebe"),
      segment = c("Drama", "Act", "Scene"),
      normalizeByCharacter = FALSE,
      normalizeByField = FALSE,
      byCharacter = TRUE,
      column = "Token.lemma",
      ci = TRUE)

    dictionaryStatisticsSingle(drama,
      wordfield = c(),
      segment = c("Drama", "Act", "Scene"),
      normalizeByCharacter = FALSE,
      normalizeByField = FALSE,
      byCharacter = TRUE,
      fieldNormalizer = length(wordfield),
      column = "Token.lemma",
      ci = TRUE,
      colnames = NULL)

    # S3 method for QDDictionaryStatistics
    as.matrix(x, ...)

Argument | Description |
---|---|
drama | A QDDrama object. |
fields | A list of lists that contains the actual field names. By default, the dictionaries named in fieldnames are taken from DramaAnalysis::base_dictionary. |
fieldnames | A list of names for the dictionaries. |
segment | The segment level that should be used. By default, the entire play will be used. Possible values are "Drama" (default), "Act" or "Scene". |
normalizeByCharacter | Logical. Whether to normalize by character speech length. |
normalizeByField | Logical. Whether to normalize by dictionary size. You usually want this. |
byCharacter | Logical, defaults to TRUE. If FALSE, values are calculated for the entire segment (play, act, or scene) rather than for individual characters. |
column | The table column the dictionary is applied to. Should be either "Token.surface" or "Token.lemma"; the latter is the default. |
ci | Whether to ignore case. Defaults to TRUE, i.e., case is ignored. |
wordfield | A character vector containing the words or lemmas to be counted (only for dictionaryStatisticsSingle()). |
fieldNormalizer | Defaults to the length of the wordfield. If normalizeByField is TRUE, the absolute numbers are divided by this number. |
colnames | The column names to be used in the output table. |
x | An object of the type QDDictionaryStatistics. |
... | All other parameters are passed to |
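The interplay of normalizeByCharacter, normalizeByField, fieldNormalizer, and byCharacter can be sketched in base R. This is an illustration under assumptions, not the package code: the token table and the helper `fieldStats()` are hypothetical, and "character speech length" is assumed here to mean the number of tokens a speaker utters in the segment.

```r
# Hypothetical token table for one segment
tokens <- data.frame(
  speaker     = c("A", "A", "A", "A", "B", "B"),
  Token.lemma = c("Krieg", "Waffe", "und", "Krieg", "Herz", "ja"),
  stringsAsFactors = FALSE
)
wordfield <- c("krieg", "waffe", "schlacht", "soldat")

fieldStats <- function(tokens, wordfield,
                       normalizeByCharacter = FALSE,
                       normalizeByField = FALSE,
                       byCharacter = TRUE,
                       fieldNormalizer = length(wordfield)) {
  # case-insensitive match of each token against the word list
  hit <- tolower(tokens$Token.lemma) %in% tolower(wordfield)
  # group by speaker, or treat the whole segment as one group
  group <- if (byCharacter) tokens$speaker else rep("all", nrow(tokens))
  counts <- tapply(hit, group, sum)
  if (normalizeByCharacter) {
    # assumed: divide by the number of tokens spoken per group
    counts <- counts / tapply(hit, group, length)
  }
  if (normalizeByField) {
    # divide by the dictionary size (or a custom normalizer)
    counts <- counts / fieldNormalizer
  }
  counts
}

raw  <- fieldStats(tokens, wordfield)
both <- fieldStats(tokens, wordfield,
                   normalizeByCharacter = TRUE, normalizeByField = TRUE)
seg  <- fieldStats(tokens, wordfield, byCharacter = FALSE)
```

Speaker A has 3 matches out of 4 tokens, so normalizing by character gives 0.75, and dividing further by the four-word field gives 0.1875. Normalizing by field size keeps counts comparable between large and small dictionaries.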
A numeric matrix that contains the frequency with which a dictionary is present in a subset of tokens

    # Check multiple dictionary entries
    data(rksp.0)
    dstat <- dictionaryStatistics(rksp.0, fieldnames = c("Krieg", "Familie"))

    # Check a single word list
    fstat <- dictionaryStatisticsSingle(rksp.0, wordfield = c("der"))

    # Extract the numeric part as a matrix
    mat <- as.matrix(dictionaryStatistics(rksp.0, fieldnames = c("Krieg", "Familie")))