wiki_table: Scrape a table from a Wikipedia page


Source: R/parse_table.R

Description

A function to parse and extract an HTML table from a Wikipedia page.

Usage

wiki_table(page, n = 1, header_length = "auto", skip = "auto",
  col_names = NULL, rm_header_text = NULL, rm_brackets = TRUE,
  rm_parens = FALSE, delay = 1)

Arguments

page

Either the URL of a Wikipedia page, the title of a page (e.g. "List_of_metro_systems"), or an object that contains a Wikipedia page.

n

An integer specifying which table to extract. Defaults to 1, the first table element in the web page or HTML object passed as page.

header_length

The number of header rows. Set to an integer greater than 1 to handle multi-row headers. Defaults to "auto".

skip

The number of rows to skip before collecting data, which is useful for omitting full-width "title" cells. Takes an integer and defaults to "auto".

col_names

Optional argument that takes a character vector to name columns in the output table.

delay

Multiplier used to throttle calls to the server: the time between calls is this value multiplied by the server's response time. Defaults to 1; set to 0 to turn throttling off. No delay is applied if the function is passed an HTML object (e.g. from wiki_page).

rm_brackets

Whether to remove brackets and their contents from the output. Takes a boolean and defaults to TRUE.

rm_parens

Whether to remove parentheses and their contents from the output. Takes a boolean and defaults to FALSE.
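For header_length, a table whose header spans two rows (e.g. a "Length" cell above "km" and "mi" sub-columns) needs its header rows merged into one name per column. A minimal base R sketch of one way to do that follows; collapse_header is a hypothetical helper, not part of wikiScraper, and the package's own merging rule may differ.

```r
# Hypothetical helper (not part of wikiScraper): merge a multi-row header,
# given as a character matrix with one row per header level, into a single
# character vector of column names.
collapse_header <- function(header_rows, sep = " ") {
  apply(header_rows, 2, function(parts) {
    # Ignore empty or missing cells in this column
    parts <- parts[!is.na(parts) & nzchar(parts)]
    # Drop labels repeated by cells that span several header rows
    paste(unique(parts), collapse = sep)
  })
}

header <- rbind(
  c("City", "Length", "Length"),
  c("",     "km",     "mi")
)
collapse_header(header)
```

Here the second header row only refines the columns that have sub-labels, so "City" stays as-is while the two "Length" columns become "Length km" and "Length mi".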
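The delay argument describes the wait between calls as its value multiplied by the server's response time. A minimal sketch of that rule, assuming a generic fetch function; throttled_fetch is hypothetical and is not wikiScraper's implementation.

```r
# Hypothetical sketch (not wikiScraper's code): after each request, wait
# `delay` times as long as that request took before making the next one.
throttled_fetch <- function(fetch, urls, delay = 1) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    start <- Sys.time()
    results[[i]] <- fetch(urls[[i]])
    # Response time of this call, in seconds
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    # delay = 0 disables throttling; no wait after the last call
    if (delay > 0 && i < length(urls)) Sys.sleep(delay * elapsed)
  }
  results
}

# Stand-in fetcher that takes about 0.05 s per call
slow_upper <- function(u) { Sys.sleep(0.05); toupper(u) }
throttled_fetch(slow_upper, c("a", "b"), delay = 1)
```

Scaling the wait by the observed response time means a slow server automatically gets longer pauses, which is gentler than a fixed sleep.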
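The rm_brackets and rm_parens options remove bracketed text (such as footnote markers like "[1]") and parenthesised text from cells. A minimal base R sketch of that kind of cleanup, assuming simple regular expressions; strip_annotations is a hypothetical helper and may not match how wikiScraper does it.

```r
# Hypothetical helper (not wikiScraper's implementation): remove bracketed
# and/or parenthesised text from a character vector of cell values.
strip_annotations <- function(x, rm_brackets = TRUE, rm_parens = FALSE) {
  # "[...]" such as footnote markers "[1]" or "[a]"
  if (rm_brackets) x <- gsub("\\[[^]]*\\]", "", x)
  # "(...)" such as "(Line 1)"
  if (rm_parens) x <- gsub("\\([^)]*\\)", "", x)
  # Collapse leftover runs of whitespace and trim the ends
  trimws(gsub("[[:space:]]+", " ", x))
}

cells <- c("Shanghai[3]", "Beijing (Line 1)[a]")
strip_annotations(cells)
strip_annotations(cells, rm_parens = TRUE)
```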

Value

Returns a data frame (tibble) containing the data from the table selected by n.

Examples

wiki_table("https://wikipedia.org/wiki/List_of_metro_systems")
wiki_table("List_of_metro_systems")

niedermansam/wikiScraper documentation built on Nov. 4, 2019, 10:06 p.m.