library(sf)
library(dplyr)
library(ggplot2)
library(maps)
library(here)

knitr::opts_knit$set(root.dir = here())
source("R/util.R")
# Imports CNC Area.
cnc_area <- load_cnc_area()
# Loads the CEC ecoregions and finds those that overlap the CNC area.
ecoregions <- load_ecoregions()
shared <- st_intersection(ecoregions, cnc_area)
# Keeps every polygon in the same Level II or Level III ecoregions as the CNC area.
lvl2_ecoregions <- filter(ecoregions, LEVEL2 %in% shared$LEVEL2)
lvl3_ecoregions <- filter(ecoregions, LEVEL3 %in% shared$LEVEL3)
cnc_ecoregions <- lvl2_ecoregions
# Reads the saved observations and converts them to sf points.
lvl2_sf <- readRDS("data/lvl2.rds") %>%
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude"),
           crs = st_crs(ecoregions))

# Reads the state map from the maps package and converts it to an sf object.
states <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))

stopifnot(st_crs(states) == st_crs(cnc_ecoregions))
# Sparse list: element i holds the index of any CNC polygon containing observation i.
within_cnc <- st_within(lvl2_sf, cnc_area)
# Counts observations falling inside at least one Level II / Level III polygon.
lvl2_count <- sum(lengths(st_within(lvl2_sf, lvl2_ecoregions)) > 0)
lvl3_count <- sum(lengths(st_within(lvl2_sf, lvl3_ecoregions)) > 0)

Data Import

This vignette walks through how we define the geographic boundaries of our dataset. The objective of this first phase of the project is to create light-sensitivity distributions for all species observed on iNaturalist within the Boston City Nature Challenge (CNC) area (shown below). The geographic boundaries we select must satisfy two conditions.

states %>%
  filter(ID %in% c('massachusetts', 'connecticut', 'new hampshire',
                   'rhode island', 'new york', 'new jersey')) %>% 
  ggplot() +
  geom_sf() +
  geom_sf(data = cnc_area, fill = 'blue', alpha = 0.6) + 
  labs(title = "Boston City Nature Challenge Area") +
  theme_minimal()

First, to create robust distributions, we need large sample sizes for each species. Because we filter the data substantially (see the cleaning vignette), the total number of observations within the area must be far greater than the desired final number of observations.

Second, the boundaries should preferably be based not on geopolitical designations irrelevant to the ecology of the region (e.g., New England), but on boundaries defined by the life found within them. By using ecologically defined boundaries, we ensure that our expanded dataset (TODO: Why did we prefer ecological boundaries?).
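The first condition can be checked by counting observations per species after cleaning. The sketch below is a minimal illustration on made-up data. The `species` column (as found in GBIF occurrence downloads) and the threshold `min_n` are assumptions, not values taken from this project.

```r
library(dplyr)

# Hypothetical toy observations standing in for the cleaned iNaturalist data.
obs <- data.frame(
  species = c("Quercus alba", "Quercus alba", "Acer rubrum",
              "Acer rubrum", "Acer rubrum", "Trillium erectum")
)

# Hypothetical minimum number of observations required per species.
min_n <- 2

# Counts observations per species and flags those meeting the threshold.
species_counts <- obs %>%
  count(species, name = "n_obs") %>%
  mutate(sufficient = n_obs >= min_n)

species_counts
```

Running the same count before and after the cleaning steps would show how much headroom the raw data needs.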

The CNC area shown above contains (TODO: n observations) and is not based on ecological boundaries, so it fails both our first and second conditions.
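The observation count for the CNC area can be obtained with a point-in-polygon test. This minimal sketch uses a toy unit square in place of `cnc_area` and three hypothetical points carrying GBIF's `decimalLongitude`/`decimalLatitude` columns. It follows the `st_within()` call in the setup code but counts the points with `lengths()`. The geometries use planar coordinates with no CRS, purely for illustration.

```r
library(sf)

# Hypothetical stand-in for cnc_area: a unit square in planar coordinates.
area <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))

# Hypothetical observations using the GBIF coordinate column names.
obs <- data.frame(decimalLongitude = c(0.5, 0.2, 1.5),
                  decimalLatitude  = c(0.5, 0.8, 0.5))
obs_sf <- st_as_sf(obs, coords = c("decimalLongitude", "decimalLatitude"))

# st_within() returns a sparse list with one element per point;
# an empty element means the point lies outside the polygon.
inside <- lengths(st_within(obs_sf, area)) > 0
n_inside <- sum(inside)
```

Applied to the real objects, `sum(lengths(st_within(lvl2_sf, cnc_area)) > 0)` should give the number of downloaded observations inside the CNC area, assuming both layers share a CRS.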

To satisfy our second condition of ecological boundaries, we use the North American terrestrial ecoregions defined by the Commission for Environmental Cooperation (CEC). According to the CEC:

Ecoregions are areas of general similarity in ecosystems and in the type, quality, and quantity of environmental resources. The ecoregions in this data set are based on the premise that a hierarchy of ecological regions can be identified through the analysis of the patterns and the composition of both living and nonliving phenomena, such as geology, physiography, vegetation, climate, soils, land use, wildlife, and hydrology, that affect or reflect differences in ecosystem quality and integrity.

The three levels represent a hierarchical division of the continent's ecological geography.

Level I is the coarsest level, dividing North America into 15 broad ecological regions. These highlight major ecological areas and provide the broad backdrop to the ecological mosaic of the continent, putting it in context at global or intercontinental scales. The 50 level II North American ecological regions provide a more detailed description of the large ecological areas nested within the level I regions and are useful for national and sub-continental overviews of ecological patterns. The 182 level III ecological regions, smaller ecological areas nested within level II regions, enhance regional environmental monitoring, assessment and reporting, as well as decision-making.

See the CEC's ecoregion documentation for more information.

The Boston CNC area spans two Level III ecoregions, each part of a distinct Level II ecoregion; both Level II ecoregions belong to Level I ecoregion 8, "Eastern Temperate Forests":

8.1.7 Northeastern Coastal Zone
8.5.4 Atlantic Coastal Pine Barrens
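The CEC codes encode this nesting directly. Dropping the last dot-separated component of a Level III code gives its Level II parent, and doing so again gives the Level I region. The sketch below assumes the codes are stored as dotted strings such as "8.1.7". `parent_code()` is a hypothetical helper, not part of the project.

```r
# Hypothetical helper: removes the last ".n" component of a CEC code,
# e.g. "8.1.7" -> "8.1" -> "8".
parent_code <- function(code) sub("\\.[^.]*$", "", code)

lvl3_codes <- c("8.1.7", "8.5.4")      # Northeastern Coastal Zone, Atlantic Coastal Pine Barrens
lvl2_codes <- parent_code(lvl3_codes)  # Level II parents
lvl1_codes <- parent_code(lvl2_codes)  # Level I parents

# Two distinct Level II regions, but a single shared Level I region.
stopifnot(length(unique(lvl2_codes)) == 2,
          length(unique(lvl1_codes)) == 1)
```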

states %>%
  filter(ID %in% c('massachusetts', 'connecticut', 'new hampshire',
                   'rhode island', 'new york', 'new jersey')) %>% 
  ggplot() +
  geom_sf() + 
  geom_sf(data = lvl3_ecoregions,
          aes(color = factor(LEVEL3), 
              fill  = factor(LEVEL3)), alpha = 0.4) +
  geom_sf(data = cnc_area, fill = 'blue', alpha = 0.6) +
  labs(title = "Boston CNC Area within CEC defined Level III Ecoregions",
       caption = "Boston CNC area is in blue.",
       fill = "Level 3 Ecoregions") +
  scale_color_manual(values = c("red", "green"), guide = "none") + # Removes color legend.
  scale_fill_manual(values = c("red", "green")) +
  theme_minimal()

These two Level III ecoregions belong to two Level II ecoregions, 8.1 and 8.5, which contain TODO: n and n1 observations, respectively.
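The expansion from the CNC area to whole Level II ecoregions follows the pattern in the setup code. First find the ecoregions that touch the area, then keep every polygon that shares their `LEVEL2` code. This sketch reproduces that logic on toy squares; `square()`, `ecoregions_toy` and `area_toy` are hypothetical. It uses `st_intersects()` rather than `st_intersection()` because only the codes are needed, not the clipped geometries.

```r
library(sf)
library(dplyr)

# Hypothetical helper: an axis-aligned square with lower-left corner (x0, y0).
square <- function(x0, y0, size = 1) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
                        c(x0, y0 + size), c(x0, y0))))
}

# Toy ecoregions: the first two polygons share Level II code "8.1".
ecoregions_toy <- st_sf(
  LEVEL2   = c("8.1", "8.1", "8.5"),
  geometry = st_sfc(square(0, 0), square(5, 5), square(1, 0))
)

# Toy study area lying entirely inside the first polygon.
area_toy <- st_sfc(square(0.2, 0.2, size = 0.4))

# Level II codes of the polygons that touch the study area.
hits <- lengths(st_intersects(ecoregions_toy, area_toy)) > 0
shared_lvl2 <- unique(ecoregions_toy$LEVEL2[hits])

# Keeps every polygon in those Level II regions, even ones far from the study area.
lvl2_toy <- filter(ecoregions_toy, LEVEL2 %in% shared_lvl2)
```

The distant square at (5, 5) is retained because it shares the "8.1" code, which is how the Level II selection reaches well beyond the CNC area.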

# Plots the Level II ecoregions and the CNC area over a map of US states.
states %>%
  ggplot() +
  geom_sf() + 
  geom_sf(data = lvl2_ecoregions, aes(color = factor(LEVEL2), 
                                      fill  = factor(LEVEL2)), alpha = 0.4) +
  geom_sf(data = cnc_area, fill = 'blue', alpha = 0.6) +
  labs(title = "Boston CNC Area within CEC defined Level II Ecoregions",
       caption = "Boston CNC area is in blue.",
       fill = "Level 2 Ecoregions") +
  scale_color_manual(values = c("red", "green"), guide = "none") + # Removes color legend.
  scale_fill_manual(values = c("red", "green")) +
  theme_minimal()

To query GBIF by geographic limits, we provide it with a bounding box. The bounding box simplifies the complex and expansive geography of the Level II ecoregions, and it also benefits our research in two ways. First, it substantially buffers our ecoregions, capturing observations that do not fall neatly inside their boundaries (plants and animals do not confine themselves strictly to distinct ecoregions). Second, it gives us a larger sample size for our species light-sensitivity distributions. The bounding box below contains `r nrow(lvl2_sf)` research-grade iNaturalist observations, a much
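The buffering effect of a bounding box can be quantified by comparing its area with the area of the regions it encloses. The sketch below uses two toy unit squares as hypothetical stand-ins for `lvl2_ecoregions`. It builds the box with `st_bbox()` and `st_as_sfc()` and also converts it to WKT, the text format GBIF's occurrence search accepts for geographic filters. The coordinates are planar with no CRS, so areas are in arbitrary units.

```r
library(sf)

# Hypothetical helper: an axis-aligned unit square with lower-left corner (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1),
                        c(x0, y0 + 1), c(x0, y0))))
}

# Toy stand-in for two widely separated ecoregion polygons.
regions_toy <- st_sfc(square(0, 0), square(5, 5))

# Bounding box as a polygon geometry.
bbox_poly <- st_as_sfc(st_bbox(regions_toy))

# How many times larger the box is than the regions it encloses.
buffer_ratio <- as.numeric(st_area(bbox_poly)) / sum(as.numeric(st_area(regions_toy)))

# WKT text for the box.
bbox_wkt <- st_as_text(bbox_poly)
```

Here the box covers 18 times the area of the two squares, which illustrates why a bounding box around scattered ecoregions sweeps in many observations outside them.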

# Creates bounding box object.
bbox <- st_bbox(lvl2_ecoregions)

# Converts the bbox to a polygon geometry; it inherits the CRS of lvl2_ecoregions.
bbox_sf <- st_as_sfc(bbox)

# Plots the bounding box used to query GBIF for our Level II Ecoregions.
states %>%
  ggplot() +
  geom_sf() + 
  geom_sf(data = lvl2_ecoregions, aes(color = factor(LEVEL2), 
                                      fill  = factor(LEVEL2)), alpha = 0.4) +
  geom_sf(data = cnc_area, fill = 'blue', alpha = 0.6) +
  labs(title = "Boston CNC Area within CEC defined Level II Ecoregions with GBIF Bounding Box",
       caption = "Boston CNC area is in blue.",
       fill = "Level 2 Ecoregions") +
  geom_sf(data = bbox_sf, alpha = 0.2, color = "black") +
  scale_color_manual(values = c("red", "green"), guide = "none") + # Removes color legend.
  scale_fill_manual(values = c("red", "green")) +
  theme_minimal()


iozeroff/cncpointR documentation built on Feb. 4, 2020, 6:18 p.m.