html_getTables: Find all tables in an html page


View source: R/html_getTables.R

Description

Parses an HTML page, extracts every <table> element, and returns a list of dataframes, one per table. Each dataframe has the same rows and columns as the table it represents. To extract a single table as a dataframe, pass the index of the table along with the URL to html_getTable().

Usage

html_getTables(url = NULL, header = NA)

html_getTable(url = NULL, header = NA, index = 1)

Arguments

url

URL or file path of an html page.

header

Use the first row as the header? If NA, the first row is used as the header only if it consists of <th> tags.

index

Index identifying which table to return.
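
As a minimal sketch of how the three arguments fit together, the following writes a small HTML file to a temporary path and reads its second table. The page contents and the names `htmlLines`, `page`, and `second` are made up for illustration, and the sketch assumes MazamaCoreUtils and its dependencies are installed.

```r
library(MazamaCoreUtils)

# Hypothetical two-table page, written to a temporary file
htmlLines <- c(
  "<html><body>",
  "<table>",
  "<tr><th>zone</th><th>offset</th></tr>",
  "<tr><td>UTC</td><td>0</td></tr>",
  "</table>",
  "<table>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table>",
  "</body></html>"
)
page <- tempfile(fileext = ".html")
writeLines(htmlLines, page)

# 'url' is given a file path here rather than a web address.
# 'index = 2' selects the second table on the page.
# 'header = FALSE' asks for the first row to be kept as data.
second <- html_getTable(url = page, header = FALSE, index = 2)

nrow(second)
```

Using a local file keeps the example independent of network access. With a real page, only the value passed to `url` would change.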

Value

A list of dataframes, one for each table on an HTML page. html_getTable() returns a single dataframe for the table at the given index.
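
The shape of the returned list can be inspected with base R before picking a table. This is an illustrative sketch only: the temporary file, its contents, and the names `page` and `tables` are assumptions, not part of the package.

```r
library(MazamaCoreUtils)

# Hypothetical page with two small tables
page <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<table><tr><th>x</th></tr><tr><td>1</td></tr></table>",
  "<table><tr><th>y</th><th>z</th></tr><tr><td>2</td><td>3</td></tr></table>",
  "</body></html>"
), page)

tables <- html_getTables(page)

# One list element per <table> element found on the page
length(tables)

# Column count of each table, useful for choosing an index
vapply(tables, ncol, integer(1))
```

Checking `length(tables)` first avoids asking html_getTable() for an index beyond the number of tables on the page.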

Examples

library(MazamaCoreUtils)

# Fail gracefully if the resource is not available
try({

  # Wikipedia's list of timezones
  url <- "http://en.wikipedia.org/wiki/List_of_tz_database_time_zones"

  # Extract tables
  tables <- html_getTables(url)

  # Extract the first table
  # NOTE: Analogous to firstTable <- html_getTable(url, index = 1)
  firstTable <- tables[[1]]

  head(firstTable)
  nrow(firstTable)

}, silent = TRUE)

MazamaCoreUtils documentation built on Nov. 12, 2021, 1:07 a.m.