Let's use data from the Smithsonian Conservation Biology Institute (SCBI) ForestGEO plot. They present their data at https://scbi-forestgeo.github.io/SCBI-ForestGEO-Data/:
> This is the public data portal for the SCBI ForestGEO plot, which points to archive locations for our various data products (some in this repository, many elsewhere).
SCBI datasets are scattered across multiple organizations and repositories. I'll use the ghr package to find and access some of those datasets, and the purrr package, mostly to apply functions over multiple elements of a vector.
```r
library(ghr)
library(purrr)
```
(I'll also use fs to manipulate paths and readr to read files. I'll refer to functions from these packages with the syntax package::function().)
Climate data from SCBI is stored in the GitHub organization "forestgeo", specifically in the "Climate" repository.
ghr_ls() lists GitHub directories in a way similar to how fs::dir_ls() lists local directories.
ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI")
I can use a regular expression (via the regexp argument) to focus, for example, on .csv files.
```r
# All files
ghr_ls("forestgeo/Climate/Met_Station_Data/SCBI/Front Royal weather station")

# .csv files
ghr_ls(
  "forestgeo/Climate/Met_Station_Data/SCBI/ForestGEO_met_station-SCBI",
  regexp = "[.]csv$"
)
```
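The pattern "[.]csv$" matches a literal dot followed by "csv" at the end of a name. A quick base-R check of the pattern (not part of the original workflow; the file names here are made up):

```r
# "[.]csv$": a literal dot, then "csv", anchored to the end of the string
grepl("[.]csv$", c("data.csv", "archive.csv.bak", "readme.md"))
#> [1]  TRUE FALSE FALSE
```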
In addition to climate data, SCBI has a number of species-list datasets. These datasets are in the GitHub organization "SCBI-ForestGEO", in the "SCBI-ForestGEO-Data" repository, and in the "species_lists" folder.
```r
species_lists <- "SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists"

# Get the last part of the path
(subdirs <- fs::path_file(ghr_ls(species_lists)))
```
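fs::path_file() returns just the final component of a path. For example (the subdirectory name here is hypothetical):

```r
fs::path_file("SCBI-ForestGEO/SCBI-ForestGEO-Data/species_lists/a_subdir")
#> a_subdir
```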
Let's explore the .csv files in each subdirectory. To reduce duplication, I first create the vector paths to store the paths to all of the subdirectories I want to explore. Then I apply ghr_ls() to each element of paths and use regexp to focus on .csv files only.
```r
(paths <- fs::path(species_lists, subdirs))

paths %>% map(~ ghr_ls(.x, regexp = "[.]csv$"))
```
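If you want the output labeled by subdirectory, one option (a sketch, not part of the original workflow) is to name the elements of paths with purrr::set_names() before mapping:

```r
# Name each path after its subdirectory so the resulting list is labeled
paths %>%
  set_names(subdirs) %>%
  map(~ ghr_ls(.x, regexp = "[.]csv$"))
```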
Instead of the file names, I can show the first few rows of each file. I use ghr_ls_download_url() to get the download URL of each .csv file in each subdirectory. To iterate over all subdirectories I use purrr::map().
```r
download_urls <- paths %>% map(~ ghr_ls_download_url(.x, regexp = "[.]csv$"))
download_urls
```
And finally, I use readr::read_csv() to read each dataset into R. I first unlist() the download URLs into a single vector, then use purrr::map() to iterate over each file.
```r
download_urls %>%
  unlist() %>%
  map(~ head(readr::read_csv(.x)))
```
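If you would rather have each element of the result named after its file, a small variation (a sketch, assuming the download URLs end with the file names) is:

```r
urls <- unlist(download_urls)

# Name each URL by its file name before reading
urls %>%
  set_names(fs::path_file(urls)) %>%
  map(~ head(readr::read_csv(.x)))
```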