R/read.R

Defines functions read_srt

Documented in read_srt

#' Read a subtitle file as a data frame
#'
#' Convert the SubRip file format to a tabular data frame of times and text.
#'
#' A SubRip file is plain, non-tabular text made up of subtitle groups
#' separated by a blank line. Each group starts with a numeric index, followed
#' by a timestamp line giving the start and end times of the subtitle, followed
#' by one or more lines of subtitle text. These components (index, time, text)
#' are parsed individually and combined into a data frame with one row per
#' subtitle group.
#'
#' @param path A path or connection to an `.srt` file.
#' @param collapse The string used to join the lines of a multi-line subtitle
#'   into a single value. Defaults to a newline.
#' @examples
#' # read linear text to tabular data
#' read_srt(srt_example(), collapse = " ")
#' @return A tibble with one row per subtitle and the columns `n` (index),
#'   `start` and `end` (times), and `subtitle` (text).
#' @export
read_srt <- function(path, collapse = "\n") {
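  # Read every line of the file and convert it to UTF-8.
  x <- enc2utf8(readLines(con = path))
  # newline() is an internal helper, assumed here to return the indices of
  # the blank separator lines.
  nl <- newline(x, rm.last = FALSE)
  # Drop the first of any two consecutive blank lines. which() gives positions
  # rather than a logical vector, which would be recycled against the longer `nl`.
  dup <- which(diff(nl) == 1)
  if (length(dup) > 0) {
    x <- x[-nl[dup]]
  }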
  t <- srt_seconds(x)
  y <- data.frame(
    stringsAsFactors = FALSE,
    n = srt_index(x),
    start = t$start,
    end = t$end,
    subtitle = srt_text(x, collapse = collapse)
  )
  as_tibble(y)
}
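
The helpers called above (newline, srt_index, srt_seconds, srt_text) are internal to the package and are not shown on this page. As a rough illustration of the index, time and text parsing described in the documentation, here is a minimal base R sketch. The names stamp_to_seconds and parse_srt_sketch are hypothetical and are not part of the srt package. The sketch assumes well-formed input: every group has an index line, a timestamp line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and at least one line of text.

```r
# Hypothetical helper (not from the srt package): "HH:MM:SS,mmm" to seconds.
stamp_to_seconds <- function(stamp) {
  # Swap the decimal comma for a dot, then split into hours, minutes, seconds.
  parts <- strsplit(sub(",", ".", stamp, fixed = TRUE), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical parser (not from the srt package) for a vector of SubRip lines.
parse_srt_sketch <- function(lines, collapse = "\n") {
  # A blank line ends a group: count the blank lines seen so far to label
  # each remaining line with its group, then split the lines by that label.
  blank <- !nzchar(trimws(lines))
  group <- cumsum(blank)[!blank]
  blocks <- unname(split(lines[!blank], group))
  # Line 2 of each group holds "start --> end".
  times <- strsplit(vapply(blocks, `[`, character(1), 2), " --> ", fixed = TRUE)
  data.frame(
    stringsAsFactors = FALSE,
    n = as.integer(vapply(blocks, `[`, character(1), 1)),
    start = stamp_to_seconds(vapply(times, `[`, character(1), 1)),
    end = stamp_to_seconds(vapply(times, `[`, character(1), 2)),
    # Everything after the first two lines is subtitle text.
    subtitle = vapply(
      blocks,
      function(b) paste(b[-(1:2)], collapse = collapse),
      character(1)
    )
  )
}

# Made-up example input, not taken from srt_example().
example_lines <- c(
  "1",
  "00:00:01,000 --> 00:00:03,500",
  "Hello there.",
  "",
  "2",
  "00:00:04,000 --> 00:00:06,250",
  "This subtitle spans",
  "two lines."
)
parsed <- parse_srt_sketch(example_lines, collapse = " ")
print(parsed)
```

With collapse = " ", the two text lines of the second group are joined into a single string, which mirrors the collapse argument of read_srt. The sketch does not handle byte order marks, missing timestamps or other malformed input.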


srt documentation built on May 29, 2024, 9:35 a.m.