
wikitablr


wikitablr is an R package with tools to scrape tables from Wikipedia and clean up common formatting issues. It is intended to help beginners explore data on practically any subject that interests them (as long as Wikipedia has a table on it), but anyone can use it.

wikitablr takes data that looks like this:

library(wikitablr)
head(read_wiki_raw("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))

and makes it look like this:

head(read_wiki_table("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("jkeast/wikitablr")

Example

wikitablr lets you either read in and clean a table all at once:

# read in and clean the first table on the page
head(read_wiki_table("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Massachusetts"))

Or use its cleaning functions separately:

# read in the first table from the URL without cleaning it
example <- read_wiki_raw("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Massachusetts")
head(example)
# clean the table's column names
example <- clean_wiki_names(example)

head(example)
# remove footnotes from the table
head(remove_footnotes(example))
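To make the two cleaning steps more concrete, here is a minimal base-R sketch of the kind of transformation they perform, run on a small hand-made data frame so it needs no network access. The helpers `tidy_names()` and `strip_footnotes()` are hypothetical and are not wikitablr's implementation; the sketch assumes footnotes appear as bracketed markers such as `[1]` or `[a]`, and that "clean names" means lower-case, underscore-separated column names.

```r
# Hypothetical helpers illustrating the idea behind clean_wiki_names()
# and remove_footnotes(); these are NOT wikitablr's actual code.

# Lower-case column names, replace runs of non-alphanumeric characters
# with a single underscore, and drop leading/trailing underscores
# (an assumed naming convention).
tidy_names <- function(df) {
  nms <- tolower(names(df))
  nms <- gsub("[^a-z0-9]+", "_", nms)
  nms <- gsub("^_|_$", "", nms)
  names(df) <- nms
  df
}

# Remove bracketed footnote markers such as "[1]" or "[a]" from every
# character column, then trim any leftover whitespace.
strip_footnotes <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) {
    trimws(gsub("\\[[^]]*\\]", "", x, perl = TRUE))
  })
  df
}

# Small hand-made table mimicking raw scraped output.
raw <- data.frame(
  "School Name" = c("Harvard University[1]", "Smith College[a]"),
  "Founded (year)" = c("1636[2]", "1871"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cleaned <- strip_footnotes(tidy_names(raw))
cleaned
```

Here `cleaned` ends up with the columns `school_name` and `founded_year`, and the footnote markers are gone from the values.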

Read all tables

wikitablr also allows you to read in and clean all tables on a page at once, putting them into a list.

This is the first table on the Wikipedia page "List of World Series champions":

example2 <- read_all_tables("https://en.wikipedia.org/wiki/List_of_World_Series_champions")

head(example2[[1]])

and the next three:

head(example2[[2]])

head(example2[[3]])

head(example2[[4]])
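Because `read_all_tables()` returns an ordinary R list of tables, the usual list tools work on its result. The sketch below uses a small stand-in list built by hand (`tables` is hypothetical and its contents are invented, not scraped from Wikipedia) so that it runs offline; with real data you would apply the same calls to `example2`.

```r
# Stand-in for the list returned by read_all_tables(); the contents are
# invented for illustration only.
tables <- list(
  data.frame(col1 = 1:2, col2 = c("a", "b")),
  data.frame(col1 = 1:3, col2 = c("x", "y", "z"), col3 = c(TRUE, FALSE, TRUE))
)

# How many tables were found, and how many rows does each one have?
n_tables <- length(tables)
sizes <- vapply(tables, nrow, integer(1))

# Print a one-line summary for every table in the list.
for (i in seq_along(tables)) {
  cat(sprintf("Table %d: %d rows x %d columns\n",
              i, nrow(tables[[i]]), ncol(tables[[i]])))
}
```

Summarizing the list this way is a quick check for which element holds the table you actually want before calling `head()` on it.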


jkeast/wikitablr documentation built on March 7, 2020, 7:48 a.m.