This post gives an overview of the technology we currently use and how we plan to proceed. All programs, scripts and components are available on our GitHub page.
The NLP components (e.g., PoS tagging and lemmatization, but also speaker identification) are grouped in the module DramaNLP.
To maintain a clear repository of the processed texts, we created a Tomcat web app that acts as a web service. The web service can be used to update certain kinds of metadata or, to a certain extent, to re-run NLP analyses. Most importantly, the web service offers query options to retrieve CSV tables that can be processed with R. In fact, R directly accesses the web service using
The R package (currently called DramaAnalysis, which is a bit unfortunate…) collects a number of frequently used functions. It will grow and be extended over time.
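To make the path from CSV table to R a bit more concrete, here is a minimal sketch in base R. It does not use the DramaAnalysis package or the real web service: the column names, the values and the URL in the comment are all invented for illustration, and the actual tables may look different.

```r
# Illustrative CSV text standing in for a web service response.
# Column names and values are made up for this sketch.
csv_text <- "drama,figure,tokens
d1,Alice,120
d1,Bob,80
d1,Alice,30
d2,Carol,200"

# read.csv() also accepts a URL as its first argument, so the same call
# would work against a running service, e.g. (hypothetical URL):
#   utterances <- read.csv("http://localhost:8080/example/utterances.csv")
utterances <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Sum the tokens per figure within each drama.
tokens_per_figure <- aggregate(tokens ~ drama + figure,
                               data = utterances, FUN = sum)
print(tokens_per_figure)
```

Because the result is an ordinary data frame, it can be passed on to any other R function for plotting or further statistics.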
This web page / web app