knitr::opts_chunk$set(
  collapse = TRUE,
  warning = FALSE,
  message = FALSE,
  error = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.height = 4,
  fig.width = 9,
  out.width = "100%",
  dpi = 300
)
if (!interactive()) {
  options(width = 95)
}

srt

Lifecycle: experimental

The goal of srt is to read SubRip text files as tabular data for easy analysis and manipulation.

Installation

You can install the development version of srt from GitHub with:

# install.packages("remotes")
remotes::install_github("k5cents/srt")

Example

The .srt standard defines four components for each subtitle, and srt uses them to build the columns of a data frame:

  1. A numeric counter identifying each sequential subtitle
  2. The time that the subtitle should appear, followed by --> and the time it should disappear (each written as hours:minutes:seconds,milliseconds)
  3. The subtitle text itself, on one or more lines
  4. A blank line containing no text, indicating the end of this subtitle
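
To make the four components concrete, here is a minimal base-R sketch that splits raw SubRip lines into a data frame with one row per subtitle. It only illustrates the format and is not how read_srt() is implemented. The helper names srt_time_to_seconds() and parse_srt_lines() are hypothetical, the column names n, start, end and subtitle are assumptions for this example, and the sketch assumes well-formed input in which every subtitle ends with a single blank line.

```r
# Hypothetical helpers illustrating the .srt layout; not part of the srt package.

# Convert "HH:MM:SS,mmm" time stamps to numeric seconds
srt_time_to_seconds <- function(x) {
  parts <- strsplit(sub(",", ":", x, fixed = TRUE), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
  }, numeric(1))
}

# Split raw lines into subtitles, assuming a blank line ends each one
parse_srt_lines <- function(lines, collapse = " ") {
  # Every line that follows a blank line starts a new block
  block_id <- cumsum(c(TRUE, head(lines == "", -1)))
  keep <- lines != ""
  blocks <- unname(split(lines[keep], block_id[keep]))
  # Component 2: "start --> end" is the second line of each block
  times <- vapply(blocks, function(b) b[2], character(1))
  data.frame(
    n = as.integer(vapply(blocks, function(b) b[1], character(1))),
    start = srt_time_to_seconds(sub(" --> .*$", "", times)),
    end = srt_time_to_seconds(sub("^.* --> ", "", times)),
    # Component 3: join multi-line text with `collapse`
    subtitle = vapply(blocks, function(b) paste(b[-(1:2)], collapse = collapse),
                      character(1)),
    stringsAsFactors = FALSE
  )
}

example_lines <- c(
  "1",
  "00:00:01,000 --> 00:00:03,500",
  "Hello there.",
  "",
  "2",
  "00:00:04,000 --> 00:00:06,250",
  "Merry Christmas,",
  "Mr. Potter!",
  ""
)
parse_srt_lines(example_lines)
```

The collapse argument in this sketch mirrors the role of collapse in read_srt(): it decides how a subtitle spread over several lines is joined into a single string.
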

library(srt)
library(tidyverse)
library(tidytext)
srt <- srt_example()
cat(readLines(srt, n = 11), sep = "\n")

read_srt() parses these files into a data frame with one row per subtitle and a separate column for each component.

(wonderful_life <- read_srt(path = srt, collapse = " "))

With the subtitles in a data frame, text analysis is straightforward; for example, counting the most frequent words after removing stop words:

wonderful_life %>% 
  unnest_tokens(word, subtitle) %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words)
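
Because every subtitle also carries its timing, the text can be analysed together with its duration. One common subtitle-quality measure is reading speed in characters per second. The base-R sketch below computes it on a small hand-made table; the column names n, start, end and subtitle, times stored in seconds, and the 12 characters-per-second threshold are assumptions for illustration, not values taken from srt.

```r
# Hand-made subtitle table; column names and units (seconds) are assumptions
subs <- data.frame(
  n = 1:3,
  start = c(1.0, 4.0, 8.0),
  end = c(3.5, 6.0, 10.0),
  subtitle = c("Hello there.", "Merry Christmas, Mr. Potter!", "Zuzu's petals."),
  stringsAsFactors = FALSE
)

# Characters per second: text length divided by on-screen duration
subs$duration <- subs$end - subs$start
subs$cps <- nchar(subs$subtitle) / subs$duration

# Flag subtitles that may be too fast to read (threshold is illustrative)
too_fast <- subs[subs$cps > 12, c("n", "subtitle", "cps")]
too_fast
```
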

The numeric time stamps can also be manipulated uniformly; for example, srt_shift() shifts every time stamp by 9.99 seconds:

wonderful_life <- srt_shift(wonderful_life, seconds = 9.99)
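
Conceptually, a uniform shift adds the same offset to every start and end time. The sketch below shows that idea with a hypothetical base-R helper, shift_times(); it is not the srt package's implementation of srt_shift(), and the check that refuses to move a subtitle before time zero is this sketch's own choice rather than documented srt behaviour. It assumes start and end are stored in seconds.

```r
# Hypothetical helper; not the srt package's implementation of srt_shift().
# Assumes `start` and `end` hold times in seconds.
shift_times <- function(df, seconds) {
  new_start <- df$start + seconds
  # Guard added for this sketch: negative times cannot be written to .srt
  if (any(new_start < 0)) {
    stop("shift would move a subtitle before time zero")
  }
  df$start <- new_start
  df$end <- df$end + seconds
  df
}

subs <- data.frame(n = 1:2, start = c(1, 4), end = c(3.5, 6.25),
                   subtitle = c("Hello there.", "Goodbye."))
shifted <- shift_times(subs, seconds = 9.99)
shifted
```

A negative offset moves subtitles earlier, which is useful when they appear after the matching dialogue.
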

The subtitle data frames can then be written back out as valid SubRip files with write_srt():

tmp <- tempfile(fileext = ".srt")
write_srt(wonderful_life, tmp, wrap = FALSE)
cat(readLines(tmp, n = 11), sep = "\n")
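
Writing reverses the parsing step: numeric seconds are formatted back into hours:minutes:seconds,milliseconds, and each row becomes a counter line, a time line, the text, and a blank line. The base-R sketch below illustrates that layout with hypothetical helpers, seconds_to_srt_time() and format_srt_lines(); write_srt() may differ in details such as line wrapping, and the column names and units are assumptions.

```r
# Hypothetical helpers showing the SubRip output layout; write_srt() may differ.

# Convert numeric seconds to "HH:MM:SS,mmm"
seconds_to_srt_time <- function(x) {
  ms <- as.integer(round(x * 1000))
  sprintf("%02d:%02d:%02d,%03d",
          ms %/% 3600000L, (ms %/% 60000L) %% 60L,
          (ms %/% 1000L) %% 60L, ms %% 1000L)
}

# Emit the four components for each row, ending each subtitle with a blank line
format_srt_lines <- function(df) {
  unlist(lapply(seq_len(nrow(df)), function(i) {
    c(as.character(df$n[i]),
      paste(seconds_to_srt_time(df$start[i]), "-->",
            seconds_to_srt_time(df$end[i])),
      df$subtitle[i],
      "")
  }))
}

subs <- data.frame(n = 1:2, start = c(10.99, 13.99), end = c(13.49, 16.24),
                   subtitle = c("Hello there.", "Goodbye."))
out <- tempfile(fileext = ".srt")
writeLines(format_srt_lines(subs), out)
cat(readLines(out), sep = "\n")
```
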


kiernann/srt documentation built on March 15, 2024, 3:28 a.m.