View source: R/parse_screenplay.R
screenplay_to_events | R Documentation |
This function reads in a well-formatted screenplay (via
pdftools::pdf_text()), parses it, and tags each line based on a series
of regular expressions. The function attempts to tag each line as either a
scene boundary, scene description, character name, dialogue, dialogue
description, stage direction, or page information. These tags are then used
to generate an event list of character interactions. In
particular, the scene boundary tags and character name tags are used to
identify which characters speak within scenes.
In order to work, the screenplay must use consistent patterns of indentation for different components and follow certain industry standard formatting conventions for screenplays. It is unlikely to work if the screenplay is watermarked. See Agarwal et al. (2014) for more discussion of the limitations of regex-based tagging of screenplays.
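To make the idea of indentation-based tagging concrete, here is a minimal sketch in base R. It is not the code used by screenplay_to_events(): the helper name tag_lines(), the indentation thresholds (10 and 20 spaces), and the INT./EXT. scene-heading pattern are all assumptions chosen for illustration, and it covers only four of the tag types listed above.

```r
# Hypothetical helper (not part of the package): tag screenplay lines using
# indentation and simple regular expressions. Thresholds and patterns are
# illustrative assumptions only.
tag_lines <- function(lines) {
  # Number of leading spaces on each line
  indent <- nchar(lines) - nchar(sub("^ +", "", lines))
  text <- trimws(lines)
  tags <- rep("other", length(lines))

  # Scene headings conventionally start with INT. or EXT.
  tags[grepl("^(INT\\.|EXT\\.)", text)] <- "scene_boundary"

  # Character names: all upper case and deeply indented
  is_upper <- text == toupper(text) & grepl("[A-Z]", text)
  tags[tags == "other" & is_upper & indent >= 20] <- "character"

  # Dialogue: moderately indented text
  tags[tags == "other" & indent >= 10 & indent < 20] <- "dialogue"

  # Scene description: non-empty text near the left margin
  tags[tags == "other" & nzchar(text) & indent < 10] <- "scene_description"
  tags
}

example_lines <- c(
  "INT. KITCHEN - DAY",
  "Anna stirs a pot.",
  paste0(strrep(" ", 20), "ANNA"),
  paste0(strrep(" ", 10), "Dinner is ready."),
  paste0(strrep(" ", 20), "BEN"),
  paste0(strrep(" ", 10), "Coming!")
)
data.frame(tag = tag_lines(example_lines), text = trimws(example_lines))
```

A sketch like this also shows why inconsistent indentation or watermark text breaks the approach: any line whose leading whitespace falls outside the expected ranges ends up with the wrong tag.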
screenplay_to_events(pdf_file, window = 5)
pdf_file | File path to the screenplay PDF file. |
window | A numeric value specifying the context window within which to look for 'recipients' of a line of dialogue. For a given line of dialogue |
A matrix containing a time-ordered multicast event list. The first
column contains an event index, the second contains a scene index, the
third contains the speaker ID, and the remaining columns contain dummy
variables for each character indicating whether they were the 'recipient'
of the line of dialogue (which is determined by whether they spoke within
the same scene and within n = window lines of dialogue of the current
line).
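The layout of the returned matrix can be illustrated with a small base R sketch. This is not the package's implementation: build_events() is a hypothetical helper, the input vectors are made up, and the sketch assumes that the window extends in both directions from the current line and that a speaker is never their own recipient. The documentation above does not settle either point.

```r
# Hypothetical helper (not part of the package): build a multicast event list
# from per-line scene indices and speaker names.
# Assumptions: the window looks both backward and forward, and speakers are
# excluded from their own recipient set.
build_events <- function(scene, speaker, window = 5) {
  chars <- sort(unique(speaker))
  n <- length(speaker)
  recip <- matrix(0L, nrow = n, ncol = length(chars),
                  dimnames = list(NULL, chars))
  for (i in seq_len(n)) {
    # Lines of dialogue within `window` positions of line i
    nearby <- seq(max(1, i - window), min(n, i + window))
    # Keep only those in the same scene
    nearby <- nearby[scene[nearby] == scene[i]]
    # Everyone who spoke nearby, except the current speaker
    others <- setdiff(speaker[nearby], speaker[i])
    recip[i, chars %in% others] <- 1L
  }
  cbind(event = seq_len(n), scene = scene,
        speaker = match(speaker, chars), recip)
}

scene   <- c(1, 1, 1, 2, 2)
speaker <- c("ANNA", "BEN", "ANNA", "CARL", "ANNA")
ev <- build_events(scene, speaker, window = 1)
ev
```

In this toy input, line 3 (ANNA, scene 1) is adjacent to line 4 (CARL, scene 2), but CARL is not marked as a recipient because the scene boundary cuts the window short.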
Agarwal, Apoorv, Sriramkumar Balasubramanian, Jiehan Zheng, and Sarthak Dash. "Parsing Screenplays for Extracting Social Networks from Movies." In Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLfL), 50-58. Gothenburg, Sweden: Association for Computational Linguistics, 2014.
## Not run:
my_pdf <- "path/to/pdf/of/screenplay.pdf"
my_event_list <- screenplay_to_events(my_pdf)
## End(Not run)