library(wikilake)

Generate a list of Michigan lakes

Get the Wikipedia URL of the category

res <- WikipediR::page_info("en", "wikipedia",
  page = "Category:Lakes of Michigan")
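page_info() returns a nested list; the canonical URL of the category page sits at res$query$pages[[1]]$canonicalurl and is what gets scraped next. Checking it directly (the printed URL is what I would expect, not captured output):

res$query$pages[[1]]$canonicalurl
# e.g. "https://en.wikipedia.org/wiki/Category:Lakes_of_Michigan"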

Scrape lake names

res <- xml2::read_html(res$query$pages[[1]]$canonicalurl)
# drill down to the category listing and pull the title of each lake link
res <- rvest::html_nodes(res, "#mw-pages .mw-category")
res <- rvest::html_nodes(res, "li")
res <- rvest::html_nodes(res, "a")
res <- rvest::html_attr(res, "title")
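res is now a character vector of page titles, one per entry in the category. A quick sanity check (the exact contents depend on the live category page):

head(res)
length(res)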

Remove junk names

# drop non-lake pages (lists, watersheds, "lakes" overview pages) and
# "Mud Lake" entries
res <- res[!grepl("List|Watershed|lakes|Mud Lake", res)]

Scrape tables

res <- lapply(res, lake_wiki)

# remove missing coordinates
res <- res[unlist(lapply(res, function(x) !is.na(x[, "Lat"])))]
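lake_wiki() does the heavy lifting here: it scrapes the infobox of a single lake page into a one-row data.frame, and lapply() simply maps it over every title. A single call looks like this (lake chosen for illustration; any element of res works):

lake_wiki("Higgins Lake")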

Collapse list to data.frame

res_df_names <- unique(unlist(lapply(res, names)))
res_df <- data.frame(matrix(NA, nrow = length(res),
  ncol = length(res_df_names)))
names(res_df) <- res_df_names
for (i in seq_along(res)) {
  # pad each one-row data.frame with NA columns so that every row carries
  # the full set of column names, then reorder to match res_df
  dt_pad <- data.frame(matrix(NA, nrow = 1,
    ncol = length(res_df_names) - ncol(res[[i]])))
  names(dt_pad) <- res_df_names[!(res_df_names %in% names(res[[i]]))]
  dt <- cbind(res[[i]], dt_pad)
  dt <- dt[, res_df_names]
  res_df[i, ] <- dt
}
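The padding loop keeps the dependencies to base R; if a dplyr dependency is acceptable, the same collapse is a one-liner, since bind_rows() fills columns missing from individual rows with NA (shown as an alternative, not what runs above):

# equivalent collapse using dplyr
res_df <- dplyr::bind_rows(res)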
# keep only columns that are populated for more than 20 lakes
good_cols <- names(res_df)[colSums(!is.na(res_df)) > 20]

res_df <- res_df[, good_cols]

# swap in the cached copy of these results shipped with the package so
# the rest of the document does not depend on live scraping
data(milakes)
res_df <- milakes
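A quick look at the packaged data (the column set is whatever survived the good_cols filter):

str(milakes)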

Map lakes

library(sp)
library(maps)  # provides map()

# promote res_df to a SpatialPointsDataFrame
coordinates(res_df) <- ~ Lon + Lat
map("state", region = "michigan", mar = c(0, 0, 0, 0))
points(res_df, col = "red", pch = 19)
hist(log(res_df$`Max. depth`), main = "", xlab = "Max depth (log(m))")
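The sp package is in maintenance mode; the same map can be drawn with sf instead (a sketch assuming milakes stores Lon/Lat as WGS84 decimal degrees):

library(sf)

res_sf <- st_as_sf(milakes, coords = c("Lon", "Lat"), crs = 4326)
map("state", region = "michigan", mar = c(0, 0, 0, 0))
plot(st_geometry(res_sf), col = "red", pch = 19, add = TRUE)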
