subtitler

Read and write video subtitle formats.

Install

# Requires the devtools package: install.packages("devtools")
devtools::install_github("benjcunningham/subtitler")

Quick Demos

library(tidyverse)
library(tidytext)
library(subtitler)

Adjusting Timestamps

1
00:00:00,978 --> 00:00:02,539
Frank, pick up!

2
00:00:02,646 --> 00:00:04,414
Pick up, buddy, pick up, pick
up, pick up, pick up, pick up!

3
00:00:04,548 --> 00:00:06,499
Hello!
I got a Code Red, here, pal.

Consider the SubRip file above, and suppose every subtitle appears half a second late. With add_milliseconds(), we can shift the start and end timestamps of every block by -500 milliseconds, then write the result back to a file in the original format with write_srt().

# Temporary file to hold the corrected subtitles
f <- tempfile()

read_srt("Always_Sunny_S10E04.srt") %>%
  # Shift both ends of every block 500 ms earlier
  mutate(
    start = add_milliseconds(start, -500),
    end   = add_milliseconds(end,   -500)
  ) %>%
  write_srt(f)
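
To make the shift concrete, here is a minimal base R sketch of the arithmetic on a single SubRip timestamp. The helpers srt_to_ms() and ms_to_srt() are hypothetical names invented for this illustration. They are not part of subtitler and may not reflect how add_milliseconds() works internally.

```r
# Hypothetical helpers for illustration only; NOT subtitler's implementation.

# Convert one "HH:MM:SS,mmm" timestamp to milliseconds.
srt_to_ms <- function(x) {
  parts <- as.numeric(strsplit(x, "[:,]")[[1]])
  ((parts[1] * 60 + parts[2]) * 60 + parts[3]) * 1000 + parts[4]
}

# Convert milliseconds back to "HH:MM:SS,mmm".
ms_to_srt <- function(ms) {
  ms <- max(ms, 0)  # clamp so a negative shift cannot go before 00:00:00,000
  sprintf(
    "%02d:%02d:%02d,%03d",
    as.integer(ms %/% 3600000),
    as.integer((ms %/% 60000) %% 60),
    as.integer((ms %/% 1000) %% 60),
    as.integer(ms %% 1000)
  )
}

# Shift the first block of the sample file 500 ms earlier.
shifted_start <- ms_to_srt(srt_to_ms("00:00:00,978") - 500)
shifted_end   <- ms_to_srt(srt_to_ms("00:00:02,539") - 500)
```

Clamping at zero is a design choice of this sketch: a block that starts less than half a second into the video would otherwise end up with a negative timestamp.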

Text Mining

The package can also help bring subtitles into a tidytext workflow. For example, I previously reproduced part of an article by Oliver Roeder of FiveThirtyEight that catalogs every time someone swore in one of Quentin Tarantino's movies. The script below applies the same analysis to subtitles from The Wolf of Wall Street.

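df <- read_srt("The_Wolf_of_Wall_Street.srt")

df %>%
  # Split the subtitle text into one word per row
  unnest_tokens(word, text) %>%
  # Keep words matching a loose profanity pattern (it also matches e.g. "suck")
  filter(str_detect(word, "[fs](uc|hi)[kt]")) %>%
  # Bin each start time into a whole minute (6e4 ms = 1 minute)
  mutate(min = floor(as_milliseconds(start) / 6e4)) %>%
  ggplot(aes(min)) +
    geom_bar() +
    labs(x = "Minute", y = "# of Curses", title = "The Wolf of Wall Street (2013)") +
    scale_x_continuous(breaks = seq(0, 180, 60), limits = c(0, 180))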
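
The minute-binning step can be seen in isolation with a small base R sketch. The start times below are made-up values, and the sketch assumes that as_milliseconds() returns elapsed time in milliseconds, as the division by 6e4 in the script suggests.

```r
# Made-up start times in milliseconds (illustrative values, not real data)
start_ms <- c(978, 61500, 119999, 125000)

# 6e4 ms = 60,000 ms = 1 minute, so floor() gives the zero-based minute index
minute <- floor(start_ms / 6e4)

# Count occurrences per minute, which is what geom_bar() draws as bar heights
counts <- table(minute)
```

Here the four times fall into minutes 0, 1, 1 and 2, so the bar for minute 1 would have a height of 2.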

License

MIT © Ben Cunningham



benjcunningham/subtitler documentation built on May 12, 2019, 11:56 a.m.