R/extract_md_tables.R

#' @title Extract Markdown Tables from Markdown Files
#'
#' @details `extract_md_tables` captures all the markdown tables
#'   from `file` and returns a tibble or list of tibbles.
#'
#' @inheritParams read_md_table
#' @inheritDotParams readr::read_delim -trim_ws -delim
#'
#' @returns A tibble if `file` contains exactly one markdown table,
#'   otherwise a list of tibbles, one per table. A table that cannot
#'   be parsed yields `NULL`.
#'
#' @examples
#' md <-
#' "# Heading 1
#'
#' This example spreads rows of the `mtcars` dataset across four tables
#' that share the same header.
#'
#' ## Table 1
#' The first table contains the initial four rows of the `mtcars` dataset.
#'
#' |model              |mpg |cyl|disp |hp |drat|wt   |qsec |vs |am |gear|carb|
#' |-------------------|----|---|-----|---|----|-----|-----|---|---|----|----|
#' |Mazda RX4          |21  |6  |160  |110|3.9 |2.62 |16.46|0  |1  |4   |4   |
#' |Mazda RX4 Wag      |21  |6  |160  |110|3.9 |2.875|17.02|0  |1  |4   |4   |
#' |Datsun 710         |22.8|4  |108  |93 |3.85|2.32 |18.61|1  |1  |4   |1   |
#' |Hornet 4 Drive     |21.4|6  |258  |110|3.08|3.215|19.44|1  |0  |3   |1   |
#'
#' ## Table 2
#' The second table includes the next four rows of the dataset.
#'
#' |model              |mpg |cyl|disp |hp |drat|wt   |qsec |vs |am |gear|carb|
#' |-------------------|----|---|-----|---|----|-----|-----|---|---|----|----|
#' |Hornet Sportabout  |18.7|8  |360  |175|3.15|3.44 |17.02|0  |0  |3   |2   |
#' |Valiant            |18.1|6  |225  |105|2.76|3.46 |20.22|1  |0  |3   |1   |
#' |Duster 360         |14.3|8  |360  |245|3.21|3.57 |15.84|0  |0  |3   |4   |
#' |Merc 240D          |24.4|4  |146.7|62 |3.69|3.19 |20   |1  |0  |4   |2   |
#'
#' ## Tables 3 and 4
#' The last two tables contain four and six rows, respectively.
#'
#' |model              |mpg |cyl|disp |hp |drat|wt   |qsec |vs |am |gear|carb|
#' |-------------------|----|---|-----|---|----|-----|-----|---|---|----|----|
#' |Cadillac Fleetwood |10.4|8  |472  |205|2.93|5.25 |17.98|0  |0  |3   |4   |
#' |Lincoln Continental|10.4|8  |460  |215|3   |5.424|17.82|0  |0  |3   |4   |
#' |Chrysler Imperial  |14.7|8  |440  |230|3.23|5.345|17.42|0  |0  |3   |4   |
#' |Fiat 128           |32.4|4  |78.7 |66 |4.08|2.2  |19.47|1  |1  |4   |1   |
#'
#' |model              |mpg |cyl|disp |hp |drat|wt   |qsec |vs |am |gear|carb|
#' |-------------------|----|---|-----|---|----|-----|-----|---|---|----|----|
#' |Porsche 914-2      |26  |4  |120.3|91 |4.43|2.14 |16.7 |0  |1  |5   |2   |
#' |Lotus Europa       |30.4|4  |95.1 |113|3.77|1.513|16.9 |1  |1  |5   |2   |
#' |Ford Pantera L     |15.8|8  |351  |264|4.22|3.17 |14.5 |0  |1  |5   |4   |
#' |Ferrari Dino       |19.7|6  |145  |175|3.62|2.77 |15.5 |0  |1  |5   |6   |
#' |Maserati Bora      |15  |8  |301  |335|3.54|3.57 |14.6 |0  |1  |5   |8   |
#' |Volvo 142E         |21.4|4  |121  |109|4.11|2.78 |18.6 |1  |1  |4   |2   |
#'
#' # Conclusion
#' These four markdown tables contain 18 rows of the classic `mtcars` dataset."
#'
#' # Extract all tables from the markdown string
#' tables <- extract_md_tables(md, show_col_types = FALSE)
#'
#' # Display the 2nd table in the list
#' tables[[2]]
#' @export
extract_md_tables <- function(file, ...) {
  # Trim leading and trailing whitespace from every line before matching
  content <- source_file(file) |>
    (\(x) stringr::str_split(x, "\n")[[1]])() |>
    trimws() |>
    paste(collapse = "\n")

  tables <- match_md_tables(content)
  if (is.null(tables)) {
    cli::cli_abort(
      c(
        "x" = "Content in provided `file` does not match markdown table regex",
        "i" = paste("If the content is indeed a markdown table, or close",
                    "enough, try using `read_md_table`.")
      )
    )
  }

  # Wrap the parser so that a table that fails to parse yields NULL
  # instead of aborting the whole extraction
  safe_read_md_table_content <- purrr::safely(
    read_md_table_content,
    quiet = TRUE
  )

  table_tibbles <- purrr::map(tables, function(table) {
    safe_read_md_table_content(table, ...)$result
  })

  if (length(table_tibbles) == 1) {
    return(table_tibbles[[1]])
  }

  return(table_tibbles)
}


#' @rdname extract_md_tables
#' @export
extract_md_table <- extract_md_tables
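
The internal helpers `source_file`, `match_md_tables`, and `read_md_table_content` are not shown in this file, so the following is only a rough base-R sketch of the general idea: find runs of pipe-delimited lines, then parse each run into its own table. The names `parse_block` and `sketch_extract_tables` are hypothetical and not part of readMDTable. The sketch returns plain data frames rather than tibbles, and it assumes every row starts and ends with a pipe and that no trailing cell is completely empty.

```r
# Hypothetical helper (not part of readMDTable): parse one block of
# markdown table lines (header, separator, data rows) into a data frame.
parse_block <- function(block) {
  # Drop the outer pipes, then split each line on the inner pipes
  cells <- strsplit(gsub("^\\||\\|$", "", block), "|", fixed = TRUE)
  cells <- lapply(cells, trimws)
  header <- cells[[1]]
  body <- cells[-(1:2)] # cells[[2]] is the |---|---| separator row
  df <- as.data.frame(do.call(rbind, body), stringsAsFactors = FALSE)
  names(df) <- header
  # Convert columns that look numeric; leave the rest as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Hypothetical helper (not part of readMDTable): return one data frame
# per run of at least three consecutive pipe-delimited lines.
sketch_extract_tables <- function(md) {
  lines <- trimws(strsplit(md, "\n", fixed = TRUE)[[1]])
  is_row <- grepl("^\\|.*\\|$", lines)
  runs <- rle(is_row)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- runs$values & runs$lengths >= 3
  Map(function(s, e) parse_block(lines[s:e]), starts[keep], ends[keep])
}

md <- "# Demo

|model     |mpg |cyl|
|----------|----|---|
|Mazda RX4 |21  |6  |
|Datsun 710|22.8|4  |

Some text between the tables.

|model  |mpg |cyl|
|-------|----|---|
|Valiant|18.1|6  |
"

tables <- sketch_extract_tables(md)
tables[[2]]
```

Unlike `extract_md_tables`, this sketch always returns a list, even when only one table is found, and it stops with an error on a malformed table instead of returning `NULL` for it.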

readMDTable documentation built on June 8, 2025, 1:29 p.m.