knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Example

Let's use data from the Smithsonian Conservation Biology Institute (SCBI) ForestGEO plot. They present their data at https://scbi-forestgeo.github.io/SCBI-ForestGEO-Data/:

This is the public data portal for the SCBI ForestGEO plot, which points to archive locations for our various data products (some in this repository, many elsewhere).

SCBI datasets are scattered across multiple organizations and repositories. I'll use the ghr package to find and access some of those datasets, and the purrr package mostly to apply functions over multiple elements of a vector.

library(ghr)
library(purrr)

(I'll also use fs to manipulate paths, and readr to read files. I'll refer to functions from these packages using the syntax package::function().)

Climate

Climate data from SCBI is stored in the GitHub organization "forestgeo", particularly in the repository "Climate".

ghr_ls() lists GitHub directories in a way similar to how fs::dir_ls() lists local directories.

ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI")

I can use a regular expression (regexp) to focus, for example, on .csv files.

# All files
ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI/Front Royal weather station")

# .csv files
ghr_ls(
  "forestgeo/Climate/Met_Station_Data/SCBI/ForestGEO_met_station-SCBI", 
  regexp = "[.]csv$"
)

Species list

In addition to climate data, SCBI has a number of species-list datasets. These datasets are in the GitHub organization "SCBI-ForestGEO", in the "SCBI-ForestGEO-Data" repository, and in the "species_lists" folder.

species_lists <- "SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists"
# Get the last part of the path
(subdirs <- fs::path_file(ghr_ls(species_lists)))

Let's explore the .csv files in each sub directory. To reduce duplication I first create the vector paths to store the path to all of the sub directories I want to explore. Then I apply ghr_ls() to each element in paths and use regexp to focus on .csv files only.

(paths <- fs::path(species_lists, subdirs))

paths %>% 
  map(~ ghr_ls(.x, regexp = "[.]csv$"))

Instead of the file names I can show the first few rows of each file. I use ghr_ls_download_url() to get download URLs of each .csv file in each sub directory. To iterate over all sub directories I use purrr::map().

download_urls <- paths %>% 
  map(~ ghr_ls_download_url(.x, regexp = "[.]csv$"))
download_urls

And finally I use readr::read_csv() to read each dataset into R. Again I use purrr::map() to iterate over each sub directory.

download_urls %>% 
  unlist() %>% 
  map(~ head(readr::read_csv(.x)))


maurolepore/ghr documentation built on May 18, 2019, 12:26 p.m.