split_match_regex_to_transcript: Split Text by Regex Into a Transcript

Description

A wrapper for split_match_regex and textreadr's as_transcript that detects the person variable, splits the text into turns of talk, and converts the result to a data.frame with person and dialogue variables. The text cleaning it applies is closer to that of as_transcript than to that of split_transcript.
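
The idea of splitting on a person pattern can be sketched in base R. This is only an illustration of the technique and not the textshape implementation: the input lines are made up, the variable names are hypothetical, and stripping the colon after the speaker label is an assumption of this sketch.

```r
# Illustrative base-R sketch; NOT the textshape implementation.
# The lines below are invented sample data.
lines <- c(
  "ALICE: Where did you",
  "put the keys?",
  "BOB: On the table."
)

# Same pattern as the documented default for person.regex.
person_regex <- "^[A-Z]{3,}"

# A new turn of talk starts at every line that begins with a speaker label.
starts  <- grepl(person_regex, lines)
turn_id <- cumsum(starts)

# Collapse the lines that belong to each turn into a single string.
turns <- unname(vapply(split(lines, turn_id), paste, character(1), collapse = " "))

# Pull out the speaker label. This assumes every turn starts with a match,
# which holds for the sample data above.
person <- regmatches(turns, regexpr(person_regex, turns))

# Drop the label plus an optional colon and whitespace (an assumption here).
dialogue <- sub(paste0(person_regex, ":?[[:space:]]*"), "", turns)

transcript <- data.frame(
  Person = person,
  Dialogue = dialogue,
  stringsAsFactors = FALSE
)
print(transcript)
```

The column names mirror the documented default of col.names; the real function additionally applies the cleaning controlled by dash, ellipsis, quote2bracket, and rm.empty.rows.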

Usage

split_match_regex_to_transcript(
  x,
  person.regex = "^[A-Z]{3,}",
  col.names = c("Person", "Dialogue"),
  dash = "",
  ellipsis = "...",
  quote2bracket = FALSE,
  rm.empty.rows = TRUE,
  skip = 0,
  ...
)

Arguments

x

A vector with split points.

person.regex

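A regular expression that matches the person (speaker) label; the text is split into turns of talk at each match. The default, "^[A-Z]{3,}", matches three or more capital letters at the start of an element.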

col.names

A character vector specifying the column names of the transcript columns.

dash

A character string to replace en and em dash characters (the default, "", removes them).

ellipsis

A character string to replace the ellipsis special character.

quote2bracket

logical. If TRUE replaces curly quotes with curly braces (default is FALSE). If FALSE curly quotes are removed.

rm.empty.rows

logical. If TRUE, the function attempts to remove empty rows.

skip

Integer; the number of lines of the data file to skip before beginning to read data.

...

ignored.
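
The dash and ellipsis replacements described above can be pictured with a small base-R sketch. It is an assumption-laden illustration and not the package's code: the sample string is invented, and the exact set of characters that the real function treats as dashes or ellipses is not confirmed here.

```r
# Illustrative base-R sketch; NOT the textshape/textreadr implementation.
# Invented sample text containing an ellipsis character (U+2026)
# and an em dash (U+2014).
x <- "Well\u2026 I thought\u2014maybe not"

# Values matching the documented defaults of the two arguments.
dash     <- ""
ellipsis <- "..."

# Replace en (U+2013) and em (U+2014) dashes with `dash`.
cleaned <- gsub("[\u2013\u2014]", dash, x)

# Replace the single ellipsis character with `ellipsis`.
cleaned <- gsub("\u2026", ellipsis, cleaned, fixed = TRUE)

print(cleaned)
```

With the default dash = "" the words on either side of a dash are joined, so passing dash = " " may be preferable when dashes separate words without surrounding spaces.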

Value

Returns a data.frame of dialogue and people.

Examples

## Not run: 
system.file(
    "docs/Simpsons_Roasting_on_an_Open_Fire_Script.pdf", 
    package = "textshape"
) %>%
    textreadr::read_document() %>%
    split_match_regex_to_transcript("^[A-Z]{3,}", skip = 2)

## End(Not run)

textshape documentation built on May 29, 2021, 1:07 a.m.