A wrapper for split_match_regex and textreadr's as_transcript that detects the person variable, splits the text into turns of talk, and converts the result to a data.frame with person and dialogue variables. The cleaning it applies is closer to that of as_transcript than to that of split_transcript.
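The detect, split, and convert steps described above can be approximated in base R. This is only a sketch of the idea, not the package's implementation: sketch_split_turns is a hypothetical helper, and it assumes a line that matches person.regex opens a new turn and that later non-matching lines continue that turn.

```r
# Hypothetical helper: a base-R approximation, NOT textshape's actual code.
sketch_split_turns <- function(x, person.regex = "^[A-Z]{3,}",
                               col.names = c("Person", "Dialogue")) {
  x <- trimws(x)
  x <- x[nzchar(x)]                    # drop empty lines
  starts <- grepl(person.regex, x)     # lines that open a new turn
  turn_id <- cumsum(starts)            # running turn number for every line
  keep <- turn_id > 0                  # ignore text before the first speaker

  # Collapse each turn's lines into a single string
  turns <- unname(vapply(
    split(x[keep], turn_id[keep]), paste, character(1), collapse = " "
  ))

  # The speaker is the regex match; the dialogue is whatever follows it
  m <- regexpr(person.regex, turns)
  person <- regmatches(turns, m)
  rest <- substring(turns, attr(m, "match.length") + 1)
  dialogue <- sub("^\\s*:?\\s*", "", rest)  # assumed "NAME: text" layout

  out <- data.frame(person, dialogue, stringsAsFactors = FALSE)
  names(out) <- col.names
  out
}

lines <- c("A title", "", "HOMER: Hello there.", "I am home.", "MARGE: Fine.")
sketch_split_turns(lines)
```

The title line is dropped because it precedes the first match, and "I am home." is folded into the first speaker's turn.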
split_match_regex_to_transcript(
  x,
  person.regex = "^[A-Z]{3,}",
  col.names = c("Person", "Dialogue"),
  dash = "",
  ellipsis = "...",
  quote2bracket = FALSE,
  rm.empty.rows = TRUE,
  skip = 0,
  ...
)
x | A vector with split points.
person.regex | A vector of places (elements) to split on or a regular expression if
col.names | A character vector specifying the column names of the transcript columns.
dash | A character string to replace the en dash and em dash special characters (the default removes them).
ellipsis | A character string to replace the ellipsis special character.
quote2bracket | logical. If
rm.empty.rows | logical. If
skip | Integer; the number of lines of the data file to skip before beginning to read data.
... | ignored.
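As a rough illustration of what the dash and ellipsis arguments describe, the base-R sketch below swaps the Unicode en dash, em dash, and ellipsis characters for the supplied replacement strings. sketch_clean is a hypothetical helper written for this illustration; the package's own cleaning may differ in detail.

```r
# Hypothetical helper: illustrates the dash/ellipsis arguments only,
# NOT the package's actual cleaning code.
sketch_clean <- function(x, dash = "", ellipsis = "...") {
  x <- gsub("[\u2013\u2014]", dash, x)            # en dash and em dash
  x <- gsub("\u2026", ellipsis, x, fixed = TRUE)  # single-character ellipsis
  x
}

sketch_clean("Wait\u2014what\u2026")
sketch_clean("1990\u20131995", dash = "-")
```

With the defaults the dashes are removed outright; passing dash = "-" keeps a plain ASCII hyphen in their place.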
Returns a data.frame of dialogue and people.
## Not run:
library(textshape)
library(magrittr)  # provides the %>% pipe

system.file(
  "docs/Simpsons_Roasting_on_an_Open_Fire_Script.pdf",
  package = "textshape"
) %>%
  textreadr::read_document() %>%
  split_match_regex_to_transcript("^[A-Z]{3,}", skip = 2)
## End(Not run)