parse_table: Flexibly parse HTML table contents into a data frame


View source: R/parse_table.R

Description

Extract data from HTML tables using custom functions. As with rvest::html_table, the user can extract cell text, but custom functions also allow more complex extraction, such as HTML element attributes (href, src, etc.), raw HTML, and more.
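
To make the difference between text extraction and attribute extraction concrete, here is a minimal sketch that uses only xml2 on a small inline table. The HTML and the variable names are illustrative assumptions, and the code does not call parse_table; it only shows the kind of per-cell values a custom function could produce.

```r
library(xml2)

# A tiny inline table used instead of a live page (illustrative data).
html <- '<table>
  <tr><th>Name</th><th>Logo</th></tr>
  <tr><td><a href="/wiki/Alpha">Alpha</a></td><td><img src="alpha.png"/></td></tr>
  <tr><td>Beta</td><td><img src="beta.png"/></td></tr>
</table>'
table_node <- xml_find_first(read_html(html), "//table")
cells <- xml_find_all(table_node, ".//td")

# Text extraction: what xml2::xml_text returns for each cell.
cell_text <- xml_text(cells)

# Attribute extraction: first link target and first image source per cell.
# Cells without a matching element yield NA.
cell_href <- xml_attr(xml_find_first(cells, ".//a"), "href")
cell_src  <- xml_attr(xml_find_first(cells, ".//img"), "src")
```

Text extraction alone would lose the link target and image sources, which is the gap a custom cell function is meant to fill.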

Usage

parse_table(table_node, cell_fn = xml2::xml_text, ...)

Arguments

table_node

an HTML table in an object of class "xml_node".

cell_fn

a function used to extract a value from table cells, e.g. a named function such as function_x or an anonymous function(x) …. Defaults to xml2::xml_text.

...

additional arguments to be passed to cell_fn.
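
The following sketch illustrates how arguments supplied through ... can reach cell_fn. The helper apply_to_cells is a hypothetical stand-in written for this illustration; it is not parse_table's implementation, and the inline HTML is made up. The trim argument is a real parameter of xml2::xml_text.

```r
library(xml2)

# Hypothetical stand-in for illustration only; NOT parse_table's code.
# It applies cell_fn to each cell and forwards extra arguments via `...`.
apply_to_cells <- function(cells, cell_fn = xml2::xml_text, ...) {
    lapply(cells, cell_fn, ...)
}

html <- "<table><tr><td>  padded  </td><td>plain</td></tr></table>"
cells <- xml_find_all(read_html(html), "//td")

raw_text     <- apply_to_cells(cells)               # default xml_text behaviour
trimmed_text <- apply_to_cells(cells, trim = TRUE)  # trim is forwarded to xml_text
```

Going by the argument description above, the analogous call would presumably be parse_table(table_node, cell_fn = xml2::xml_text, trim = TRUE).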

Value

A data frame of list columns.
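
Because list columns are less convenient than atomic vectors for many operations, it may help to flatten them after parsing. The sketch below uses base R only; the data frame parsed is an illustrative stand-in for a result and flatten_column is a hypothetical helper. It assumes each cell holds at most one value.

```r
# Illustrative stand-in for a parsed result: every column is a list.
parsed <- data.frame(
    name = I(list("Alpha", "Beta")),
    link = I(list("/wiki/Alpha", NA_character_))
)

# Collapse a list column to a character vector, keeping one value per cell.
flatten_column <- function(col) {
    vapply(col, function(cell) as.character(cell)[1], character(1))
}

# Apply the helper to every column and rebuild an ordinary data frame.
flat <- as.data.frame(lapply(parsed, flatten_column), stringsAsFactors = FALSE)
```

If a custom cell function can return several values per cell, the list columns should be kept as they are, since flattening this way would discard all but the first value.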

Examples

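library(xml2)
library(gttoolkit)  # provides parse_table; assumes the package is installed

# Download the page and select every table with the "wikitable" class.
url <- "https://en.wikipedia.org/wiki/Political_party_strength_in_Michigan"
html_doc <- read_html(url)
wikitable_elements <- xml_find_all(html_doc,
                                   "//table[contains(@class, 'wikitable')]")

# Parse each table, extracting the href of the first link found (NA if none).
wikitable_list <- lapply(
    wikitable_elements,
    parse_table,
    cell_fn = function(cell) {
        xml_attr(xml_find_first(cell, ".//a"), "href")
    }
)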
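
The href values on a page like this are typically relative (for example "/wiki/..."). As a possible follow-up step, the sketch below resolves such values against the page URL with xml2::url_absolute. The relative links shown are illustrative assumptions, not values taken from the parsed tables.

```r
library(xml2)

page_url <- "https://en.wikipedia.org/wiki/Political_party_strength_in_Michigan"

# Illustrative relative hrefs of the kind an href-extracting cell_fn may return.
relative_links <- c("/wiki/Michigan", "/wiki/Governor_of_Michigan")

# Resolve each relative link against the URL of the page it came from.
absolute_links <- url_absolute(relative_links, page_url)
```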

gershomtripp/gttoolkit documentation built on Dec. 20, 2021, 10:41 a.m.