Read and write video subtitle formats.
devtools::install_github("benjcunningham/subtitler")
library(tidyverse)
library(tidytext)
library(subtitler)
read_file("Always_Sunny_S10E04.srt") %>% cat()
Consider the SubRip file printed above and suppose we have observed that every subtitle lags by half a second. Using the add_milliseconds() function, we can easily adjust the timestamp of every block accordingly. We can even write the result back to a file in the original format using write_srt().
f <- tempfile()

read_srt("Always_Sunny_S10E04.srt") %>%
  mutate(
    start = add_milliseconds(start, -500),
    end = add_milliseconds(end, -500)
  ) %>%
  write_srt(f)
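To make the arithmetic behind that half-second shift concrete, here is a minimal base R sketch that does not use subtitler at all. The helpers to_ms() and from_ms() are hypothetical names introduced only for illustration; they assume SubRip-style HH:MM:SS,mmm timestamps and are not a description of how the package is implemented.

```r
# Hypothetical helpers for illustration only; not part of subtitler.
# Assumes SubRip-style timestamps of the form "HH:MM:SS,mmm".
to_ms <- function(x) {
  parts <- strsplit(x, "[:,]")
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # hours, minutes, seconds, milliseconds -> total milliseconds
    p[1] * 3600000 + p[2] * 60000 + p[3] * 1000 + p[4]
  }, numeric(1))
}

from_ms <- function(ms) {
  h <- ms %/% 3600000
  m <- (ms %% 3600000) %/% 60000
  s <- (ms %% 60000) %/% 1000
  r <- ms %% 1000
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(h), as.integer(m), as.integer(s), as.integer(r))
}

# Shift two example timestamps half a second earlier.
shifted <- from_ms(to_ms(c("00:00:01,200", "00:01:00,000")) - 500)
print(shifted)
```

This sketch does not guard against shifts that would push a timestamp below zero, so a real workflow would need to decide how to handle blocks near the start of the file.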
The package may also be useful for getting subtitles into a tidytext workflow. For example, I previously reproduced part of an article by Oliver Roeder of FiveThirtyEight that catalogs every time someone swears in one of Quentin Tarantino's movies. The script below mirrors that analysis on subtitles from The Wolf of Wall Street.
df <- read_srt("The_Wolf_of_Wall_Street.srt")

df %>%
  unnest_tokens(word, text) %>%
  filter(str_detect(word, "[fs](uc|hi)[kt]")) %>%
  mutate(min = floor(as_milliseconds(start) / 6e4)) %>%
  ggplot(aes(min)) +
  geom_bar() +
  labs(x = "Minute", y = "# of Curses", title = "The Wolf of Wall Street (2013)") +
  scale_x_continuous(breaks = seq(0, 180, 60), limits = c(0, 180))
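Two steps in that pipeline are easy to check in isolation: the minute binning and the word filter. The base R sketch below uses made-up values rather than real subtitle data, and grepl() stands in for str_detect() on the assumption that both treat this simple pattern the same way.

```r
# Illustrative values only; these are not taken from any subtitle file.
start_ms <- c(0, 59999, 60000, 125000)

# Dividing milliseconds by 6e4 (60,000) gives minutes; floor() bins
# each timestamp into the minute in which it starts.
minute <- floor(start_ms / 6e4)
print(minute)

# The pattern matches any word containing f or s, then "uc" or "hi",
# then k or t, so it also catches milder words such as "sucker".
words <- c("shift", "sucker", "duck")
matches <- grepl("[fs](uc|hi)[kt]", words)
print(matches)
```

Because the pattern matches substrings, the resulting counts are an approximation of the number of curses rather than an exact tally.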
MIT © Ben Cunningham