View source: R/split_transcript.R
split_transcript | R Documentation |
Split a transcript style vector (e.g., c("greg: Who me", "sarah: yes you!"))
into a name and dialogue vector that is coerced to a data.table.
Leading/trailing white space in the columns is stripped out.
split_transcript(
  x,
  delim = ":",
  colnames = c("person", "dialogue"),
  max.delim = 15,
  ...
)
x | A transcript style vector (e.g., c("greg: Who me", "sarah: yes you!")). |
delim | The delimiter to split on. |
colnames | The column names to use for the data.table output. |
max.delim | An integer giving the maximum number of characters that may come before the delimiter. This is useful when a colon is the delimiter but time stamps also appear in the text. |
... | Ignored. |
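The role of max.delim can be illustrated with a small base R sketch. This is not the textshape implementation: split_first_delim is a hypothetical helper written only for this illustration, and returning NA for the speaker when no delimiter is found within the limit is an assumption of the sketch, not documented package behavior. It splits each element at the first delimiter only, and only when at most max.delim characters precede it, so a colon inside a time stamp later in the line is left alone.

```r
# Hypothetical helper, NOT part of textshape: a base R sketch of splitting
# on the first delimiter only when it appears early enough in the string.
split_first_delim <- function(x, delim = ":", max.delim = 15) {
  # Position of the first literal delimiter in each element (-1 if absent)
  pos <- as.integer(regexpr(delim, x, fixed = TRUE))
  # Accept the split only if at most `max.delim` characters precede it
  ok <- pos > 0 & (pos - 1) <= max.delim
  # Assumption of this sketch: no usable delimiter means an NA speaker
  person <- ifelse(ok, trimws(substr(x, 1, pos - 1)), NA_character_)
  dialogue <- ifelse(
    ok,
    trimws(substr(x, pos + nchar(delim), nchar(x))),
    trimws(x)
  )
  data.frame(person = person, dialogue = dialogue, stringsAsFactors = FALSE)
}

x <- c(
  "greg: Who me",
  "sarah: we met at 10:30",
  "No speaker here, but it ended at 9:15"
)
out <- split_first_delim(x)
print(out)
```

In the second element only the first colon is used, so the time stamp stays in the dialogue. In the third element the only colon comes well after the first 15 characters, so the whole line is kept as dialogue.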
Returns a 2 column data.table.
split_transcript(c("greg: Who me", "sarah: yes you!"))
## Not run:
## 2015 Vice-Presidential Debates Example
if (!require("pacman")) install.packages("pacman")
pacman::p_load(rvest, magrittr, xml2)

debates <- c(
    wisconsin = "110908",
    boulder = "110906",
    california = "110756",
    ohio = "110489"
)

lapply(debates, function(x){
    xml2::read_html(paste0("http://www.presidency.ucsb.edu/ws/index.php?pid=", x)) %>%
        rvest::html_nodes("p") %>%
        rvest::html_text() %>%
        textshape::split_index(grep("^[A-Z]+:", .)) %>%
        textshape::combine() %>%
        textshape::split_transcript() %>%
        textshape::split_sentence()
})
## End(Not run)