View source: R/read_dir_transcript.R
Description

Read in multiple transcript files from a directory and create a base::data.frame().
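To make the description concrete, here is a minimal base-R sketch of what reading a directory of transcripts into one data.frame involves. It is not textreadr's implementation. The helper sketch_read_dir(), the plain-text input files, the ":" speaker separator, the recycling of skip and the warning logic are all assumptions made for illustration. They loosely mirror the skip, sep, max.person.nchar and col.names arguments of read_dir_transcript().

```r
# Hypothetical helper, NOT part of textreadr: a base-R illustration of
# reading every transcript in a directory into one data.frame with
# Document / Person / Dialogue columns. Plain-text files and a ":"
# separator between speaker and dialogue are assumed.
sketch_read_dir <- function(path, skip = 0, sep = ":", max.person.nchar = 20,
                            col.names = c("Document", "Person", "Dialogue")) {
  files <- list.files(path, full.names = TRUE)
  # Recycle skip so each file gets its own value (assumed behaviour)
  skip <- rep_len(skip, length(files))
  pieces <- lapply(seq_along(files), function(i) {
    lines <- readLines(files[i], warn = FALSE)
    # Drop the first skip[i] lines of this file
    if (skip[i] > 0) lines <- lines[-seq_len(skip[i])]
    # Remove empty rows
    lines <- lines[nzchar(trimws(lines))]
    # Position of the first separator on each line (-1 if absent)
    pos <- regexpr(sep, lines, fixed = TRUE)
    # Warn when the separator sits further in than a name is expected to run
    if (any(pos > max.person.nchar)) {
      warning("separator beyond max.person.nchar in ", basename(files[i]))
    }
    setNames(
      data.frame(
        rep(basename(files[i]), length(lines)),
        trimws(substr(lines, 1, pos - 1)),
        trimws(substring(lines, pos + nchar(sep))),
        stringsAsFactors = FALSE
      ),
      col.names
    )
  })
  # Stack the per-file pieces into a single data.frame
  do.call(rbind, pieces)
}

# Usage with two throwaway files (made-up content)
dir <- tempfile("transcripts")
dir.create(dir)
writeLines(c("Interview 1", "Ann: Hello there.", "Bob: Hi, Ann."),
           file.path(dir, "a.txt"))
writeLines(c("Cy: Good morning.", "", "Di: Morning!"),
           file.path(dir, "b.txt"))
# Skip the title line of a.txt only
res <- sketch_read_dir(dir, skip = c(1, 0))
print(res)
```

The per-file skip vector mirrors the way the Examples pass one skip value for each transcript; the real function also handles .docx and other formats, which this sketch does not attempt.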
Usage

read_dir_transcript(
  path,
  col.names = c("Document", "Person", "Dialogue"),
  pattern = NULL,
  all.files = FALSE,
  recursive = FALSE,
  skip = 0,
  merge.broke.tot = TRUE,
  header = FALSE,
  dash = "",
  ellipsis = "...",
  quote2bracket = FALSE,
  rm.empty.rows = TRUE,
  na = "",
  sep = NULL,
  comment.char = "",
  max.person.nchar = 20,
  ignore.case = FALSE,
  verbose = FALSE,
  ...
)
Arguments

path: Path to the directory.
col.names: A character vector specifying the names of the transcript columns (document, person, dialogue).
pattern: An optional regular expression. Only file names which match the regular expression will be returned.
all.files: Logical. If FALSE, only the names of visible files are returned. If TRUE, all file names will be returned.
recursive: Logical. Should the listing recurse into directories?
skip: Integer; the number of lines of the data file to skip before beginning to read data. The Examples pass a vector, apparently one value per file in the directory.
merge.broke.tot: Logical. If TRUE, an attempt is made to merge a single turn of talk that is broken across lines back into one turn of talk.
header: Logical. If TRUE, the file contains the names of the variables as its first line.
dash: A character string to replace en and em dash characters (the default, "", removes them).
ellipsis: A character string to replace ellipsis characters.
quote2bracket: Logical. If TRUE, curly quotes are replaced with curly braces; if FALSE (the default), curly quotes are removed.
rm.empty.rows: Logical. If TRUE, empty rows are removed.
na: A character string to be interpreted as an NA value.
sep: The field separator character. Values on each line of the file are separated by this character. The default is NULL.
comment.char: A character vector of length one containing a single character or an empty string. Use "" to turn off the interpretation of comments altogether.
max.person.nchar: The maximum number of characters a person's name is expected to have. It is used to warn the user if a separator appears beyond this position in the text.
ignore.case: Logical; defaults to FALSE.
verbose: Logical. Should each iteration of the read-in be reported?
...: Ignored.
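The descriptions of pattern, all.files and recursive are worded like the corresponding arguments of base::list.files(). Assuming read_dir_transcript() uses them to choose which files in path to read (this page does not state that explicitly), the following base-R snippet shows what each setting selects. The directory and file names are made up for illustration.

```r
# Throwaway directory with made-up file names
dir <- tempfile("listing_demo")
dir.create(file.path(dir, "sub"), recursive = TRUE)
invisible(file.create(file.path(dir, c("a.docx", "b.txt", ".hidden.txt"))))
invisible(file.create(file.path(dir, "sub", "c.docx")))

# pattern: keep only names matching the regular expression
top_docx <- list.files(dir, pattern = "\\.docx$")
# recursive: also look inside sub-directories
all_docx <- list.files(dir, pattern = "\\.docx$", recursive = TRUE)
# all.files: dot-files are listed only when all.files = TRUE
txt_visible <- list.files(dir, pattern = "\\.txt$")
txt_all <- list.files(dir, pattern = "\\.txt$", all.files = TRUE)

print(list(top_docx, all_docx, txt_visible, txt_all))
```

Because pattern is a regular expression, an anchored pattern such as "\\.docx$" is safer than a bare "docx", which would also match a name like "docx_notes.txt".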
Value

Returns a data.frame of documents, people, and dialogue, with columns named by col.names.
See Also

read_transcript
Examples

## lines to skip at the top of each transcript file (one value per file)
skips <- c(0, 1, 1, 0, 0, 1)
path <- system.file("docs/transcripts", package = 'textreadr')
textreadr::peek(read_dir_transcript(path, skip = skips), Inf)

## Not run:
## with additional cleaning
## library() attaches one package per call
library(tidyverse)
library(textshape)
library(textclean)

path %>%
  read_dir_transcript(skip = skips) %>%
  ## drop rows whose Person value starts with "["
  textclean::filter_row("Person", "^\\[") %>%
  mutate(
    ## strip a leading "/" and a trailing ":" from speaker names
    Person = stringi::stri_replace_all_regex(Person, "(^/\\s*)|(:\\s*$)", "") %>%
      trimws(),
    ## strip a leading "/" from the dialogue
    Dialogue = stringi::stri_replace_all_regex(Dialogue, "(^/\\s*)", "")
  ) %>%
  peek(Inf)
## End(Not run)