html_table: Parse an html table into a data frame

View source: R/table.R

Description

The algorithm mimics what a browser does, but repeats the value of each merged cell in every cell that it covers.
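The following simplified base R sketch shows how merged-cell filling can work. It lays rows of cells out on a rectangular grid, copies the text of a merged cell into every slot it spans, and leaves slots that no cell covers as NA. It is not rvest's implementation. The helpers cell() and fill_table() are hypothetical, the input is a plain list rather than parsed HTML, and overlapping spans (an error in the HTML table model) are not handled.

```r
# Hypothetical cell constructor; colspan/rowspan default to 1 as in HTML.
cell <- function(text, colspan = 1L, rowspan = 1L) {
  list(text = text, colspan = colspan, rowspan = rowspan)
}

# Hypothetical layout function: place each cell on a character matrix,
# repeating merged-cell text in every slot it covers. Slots never covered
# stay NA. Overlapping spans are not handled in this sketch.
fill_table <- function(rows) {
  n_row <- length(rows)
  out <- matrix(NA_character_, nrow = n_row, ncol = 0)
  for (i in seq_len(n_row)) {
    j <- 1L
    for (cl in rows[[i]]) {
      # Skip slots already claimed by a rowspan from an earlier row.
      while (j <= ncol(out) && !is.na(out[i, j])) j <- j + 1L
      last_col <- j + cl$colspan - 1L
      last_row <- min(n_row, i + cl$rowspan - 1L)
      # Widen the grid when this row needs more columns than seen so far.
      if (last_col > ncol(out)) {
        out <- cbind(out, matrix(NA_character_, n_row, last_col - ncol(out)))
      }
      out[i:last_row, j:last_col] <- cl$text
      j <- last_col + 1L
    }
  }
  out
}
```

Fed the body rows of sample2 from the Examples section, this sketch yields a grid whose second row is "4", "4", "5". The rows of sample3 yield NA in the uncovered slots.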

Usage

html_table(
  x,
  header = NA,
  trim = TRUE,
  fill = deprecated(),
  dec = ".",
  na.strings = "NA",
  convert = TRUE
)

Arguments

x

A document (from read_html()), node set (from html_elements()), node (from html_element()), or session (from session()).

header

Use the first row as the header? If NA, the first row is used if it consists of <th> tags.

If TRUE, column names are left exactly as they are in the source document, which may require post-processing to generate a valid data frame.
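The header rules can be sketched in base R as below. use_first_row_as_header() is a hypothetical helper that assumes the rule for NA is "every cell in the first row is a <th>". clean_names() illustrates one possible post-processing step after header = TRUE, using base make.names() to turn blank or duplicated header text into unique, syntactic names. Neither helper is part of rvest.

```r
# Hypothetical helper: does the first row become the header?
# `first_row_tags` holds the tag names of the first row's cells.
use_first_row_as_header <- function(first_row_tags, header = NA) {
  if (is.na(header)) all(first_row_tags == "th") else isTRUE(header)
}

# One possible clean-up for raw header text kept by header = TRUE:
# make.names() replaces invalid characters, blanks become "X",
# and unique = TRUE de-duplicates repeated names.
clean_names <- function(raw) make.names(raw, unique = TRUE)

raw_header <- c("Col A", "Col A", "")
clean_names(raw_header)
```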

trim

Remove leading and trailing whitespace within each cell?
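The effect of trim = TRUE on cell text can be approximated with base R trimws(), which strips spaces, tabs, and newlines from both ends of each string. Whether rvest uses trimws() internally is an assumption; the sketch only shows the intended result.

```r
# Raw cell text as it might come out of the HTML source.
cells <- c("  1 ", "\tx\n", "y")
# Strip leading and trailing whitespace from each cell.
trimmed <- trimws(cells)
trimmed
```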

fill

Deprecated: missing cells in tables are now always automatically filled with NA.

dec

The character used as the decimal place marker.
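To see what dec changes, the sketch below calls utils::type.convert() directly on cell text that uses a decimal comma. It assumes html_table() forwards dec to type.convert() when convert = TRUE; the sample values are made up for illustration.

```r
# Cell text from a hypothetical table that uses a decimal comma.
raw <- c("1,5", "2,25", "10")
# dec = "," makes "1,5" parse as 1.5; as.is = TRUE avoids factor conversion.
vals <- type.convert(raw, dec = ",", as.is = TRUE)
vals
```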

na.strings

Character vector of values that will be converted to NA if convert is TRUE.

convert

If TRUE, will run type.convert() to interpret the cell text as integer, double, or NA.
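The interaction of convert and na.strings can be sketched with utils::type.convert() alone, assuming html_table() passes na.strings through to it. A column whose text is all whole numbers (plus NA markers) becomes integer. A column containing non-numeric text stays character, with only the NA markers replaced. The "-" placeholder is an assumed example of a site-specific missing-value marker.

```r
# "-" is treated as missing in addition to the default "NA".
num_col <- type.convert(c("1", "-", "3"), na.strings = c("NA", "-"), as.is = TRUE)

# Non-numeric text stays character; only "NA" is converted to NA.
text_col <- type.convert(c("x", "y", "NA"), as.is = TRUE)

num_col
text_col
```

With convert = FALSE, both columns would instead be left as the original character strings.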

Value

When applied to a single element, html_table() returns a single tibble. When applied to multiple elements or a document, html_table() returns a list of tibbles.
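The difference in return types can be checked on a small document with two tables built by minimal_html(). The table contents here are made up for illustration.

```r
library(rvest)

# A hypothetical document containing two one-column tables.
doc <- minimal_html("
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>b</th></tr><tr><td>2</td></tr></table>
")

all_tables <- html_table(doc)                         # list of tibbles
one_table  <- html_table(html_element(doc, "table"))  # single tibble (first table)
```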

Examples

library(rvest)

sample1 <- minimal_html("<table>
  <tr><th>Col A</th><th>Col B</th></tr>
  <tr><td>1</td><td>x</td></tr>
  <tr><td>4</td><td>y</td></tr>
  <tr><td>10</td><td>z</td></tr>
</table>")
sample1 %>%
  html_element("table") %>%
  html_table()

# Values in merged cells will be duplicated
sample2 <- minimal_html("<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
  <tr><td colspan='2'>4</td><td>5</td></tr>
  <tr><td>6</td><td colspan='2'>7</td></tr>
</table>")
sample2 %>%
  html_element("table") %>%
  html_table()

# If a row is missing cells, they'll be filled with NAs
sample3 <- minimal_html("<table>
  <tr><th>A</th><th>B</th><th>C</th></tr>
  <tr><td colspan='2'>1</td><td>2</td></tr>
  <tr><td colspan='2'>3</td></tr>
  <tr><td>4</td></tr>
</table>")
sample3 %>%
  html_element("table") %>%
  html_table()

rvest documentation built on June 22, 2024, 10:47 a.m.