knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
wikitablr is an R package that provides tools to scrape tables from Wikipedia and clean up common formatting issues. The goal is to let beginners explore data on practically any subject that interests them (as long as there is a Wikipedia table on it), though anyone can use the package.
wikitablr takes data that looks like this:
library(wikitablr)
head(read_wiki_raw("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))
and makes it look like this:
head(read_wiki_table("https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles", 2))
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("jkeast/wikitablr")
wikitablr lets you either read in and clean your data all at once:
# read in first table on page
head(read_wiki_table("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Massachusetts"))
Or use its cleaning functions separately:
# read in first table from url without cleaning it
example <- read_wiki_raw("https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_Massachusetts")
head(example)
# clean names of table
example <- clean_wiki_names(example)
head(example)
# remove footnotes from table
head(remove_footnotes(example))
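To make the footnote step concrete, here is a minimal base R sketch of the kind of cleanup involved, run on a small hand-made data frame rather than a live page. It assumes footnote markers are bracketed tags such as [1], [a], or [note 1]. The helper strip_bracketed() and the toy values are hypothetical, and this is not necessarily how remove_footnotes() is implemented.

```r
# Toy data frame standing in for a scraped table (all values are made up).
toy <- data.frame(
  school = c("Alpha College[1]", "Beta University[a]", "Gamma Institute"),
  enrollment = c("1,200[2]", "3,450", "980[note 1]"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop bracketed markers such as [1], [a], or [note 1]
# from every character column, then trim any leftover whitespace.
strip_bracketed <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    trimws(gsub("\\[[^]]*\\]", "", col))
  })
  df
}

cleaned <- strip_bracketed(toy)
print(cleaned)
```

The sketch only touches character columns, so numeric columns pass through unchanged.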
wikitablr also allows you to read in and clean all tables on a page at once, putting them into a list.
This is the first table on the Wikipedia page "List of World Series champions":
example2 <- read_all_tables("https://en.wikipedia.org/wiki/List_of_World_Series_champions")
head(example2[[1]])
and the rest:
head(example2[[2]])
head(example2[[3]])
head(example2[[4]])
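Because the result is described as a list of tables, base R tools can summarize every table in one pass instead of indexing each element by hand. The following is a minimal sketch that assumes each element is a data frame. It uses a hand-made list named tables in place of a real read_all_tables() result, and the values are made up.

```r
# Hand-made stand-in for a list of scraped tables (values are made up).
tables <- list(
  data.frame(year = c(1903, 1905), winner = c("A", "B")),
  data.frame(team = c("A", "B", "C"), titles = c(2, 1, 0), losses = c(0, 1, 2))
)

# Collect the number of rows and columns of each table into one matrix,
# with one row per table.
dims <- t(vapply(tables, dim, integer(2)))
colnames(dims) <- c("rows", "cols")
print(dims)

# Preview the first row of every table without repeating head() by hand.
previews <- lapply(tables, head, n = 1)
print(previews)
```

Checking the dimensions first is a quick way to spot which tables on a page are the large data tables and which are small summary boxes.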