In this article we'll walk through how to create various types of maps of the observations downloaded with naturecounts to get a sense of the spatial distribution.

The following examples use the "testuser" user which is not available to you. You can quickly sign up for a free account of your own to access and play around with these examples. Simply replace testuser with your own username.

Setup

To do so we're going to use the following packages:

library(naturecounts)
library(sf)
library(rnaturalearth)
library(ggplot2)
library(ggspatial)
library(dplyr)
library(tidyr)
library(mapview)

First we'll use download some data:

house_finches <- nc_data_dl(species = 20350, region = list(statprov = "AB"), 
                            username = "testuser", info = "nc_tutorial")
head(house_finches)

Simple Maps

Here we're going to take a quick look at the spatial distribution using ggplot2 and ggspatial to get some basemaps.

First let's get an idea of how many distinct points there are (often multiple observations are recorded for the same location).

nrow(house_finches)

select(house_finches, longitude, latitude) %>%
  distinct() %>%
  nrow()

So we have r nrow(unique(house_finches[, c("longitude", "latitude")])) sites for r nrow(house_finches) observations.

Next let's convert our data to spatial data so we can plot it spatially (i.e. make a map!). Note that we're using CRS EPSG code of 4326 because that's reflects unprojected, GPS data in lat/lon. First we omit NAs because sf data frames cannot have missing locations.

house_finches <- drop_na(house_finches, "longitude", "latitude")
house_finches_sf <- st_as_sf(house_finches, 
                             coords = c("longitude", "latitude"), crs = 4326)

Now we're ready to make a map of the distribution of observations.

We'll use a baselayer from OpenStreetMap and then add our observations.

ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = house_finches_sf) +
  labs(caption = "Copyright OpenStreetMap contributors")

Let's count our observations for each site.

cnt <- house_finches_sf %>%
  count(geometry)

ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = cnt, aes(size = n)) +
  labs(caption = "Copyright OpenStreetMap contributors")

Interactive Maps

If we want to get fancy we can also create interactive maps using the mapview packages (see also the leaflet for R package).

mapview(house_finches_sf, zcol = "survey_year", at = seq(1965, 2005, by = 10),
        map.types = "Esri.WorldImagery")

More Complex Maps

For more complex, or detailed maps, we can use a variety of spatial data files to layer our data over maps of the area.

For this we'll get some outlines of Canada and it's Provinces and Territories from rnaturalearth.

canada <- ne_states(country = "canada", returnclass = "sf") %>%
  st_transform(3347)

ggplot() +
  theme_bw() +
  geom_sf(data = canada)

Let's add our observations (note that the data are transformed to match the projection of the first layer, here the canada data).

ggplot() +
  theme_bw() +
  geom_sf(data = canada) +
  geom_sf(data = house_finches_sf, size = 0.5)

We can also focus on Alberta

ab <- filter(canada, name == "Alberta")

ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = house_finches_sf, size = 0.5)

Perhaps we should see how many of these observations were made in parks.

First we'll download and extract the Park shapefiles available from the Alberta Parks website.

dir.create("alberta_parks") # Create a folder for the data
download.file(
  url = "https://www.albertaparks.ca/media/2941843/parks_and_protected_areas_alberta.zip", 
  destfile = "alberta_parks/parks_protected_areas_alberta.zip")

unzip("alberta_parks/parks_protected_areas_alberta.zip", exdir = "alberta_parks")

parks <- st_read("alberta_parks/Parks_Protected_Areas_Alberta.shp")

Add this layer to our plot.

ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 0.5)

Well it's actually a bit difficult to tell, there are lots of small parks!

To solve this problem, we can merge our observations with the parks and plot those inside parks separately from those outside parks.

First we'll transform our observation data to match the CRS of parks, then we'll join the park information to our observations, based on whether the observations overlap a park polygon (by default this is a left join), and finally we'll create a new column outside_park that is a category for out or in the park, based on whether the observation was joined to a park name (OC_NAME).

house_finches_sf <- house_finches_sf %>%
  st_transform(st_crs(parks)) %>%
  st_join(parks) %>%
  mutate(outside_park = if_else(is.na(OC_NAME), "Outside Park", "Inside Park"))

And now we can see that there are quite a few, if not more, observations outside of parks than in.

ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 1) +
  facet_wrap(~outside_park)

We might also be interested in observations over time.

First we'll bin our yearly observations

house_finches_sf <- mutate(house_finches_sf, 
                           years = cut(survey_year, 
                                       breaks = seq(1960, 2010, 10), 
                                       labels = seq(1960, 2000, 10), right = FALSE))

We'll also want to see how many sample years there are per decade.

years <- house_finches_sf %>%
  group_by(years) %>%
  summarize(n = length(unique(survey_year)), .groups = "drop")

Now we can see how House Finch observations change over the years

ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 1.5) +
  geom_sf_text(data = years, x = 4427134, y = 2965275, hjust = 0, vjust = 1, 
               aes(label = paste0("n = ", n))) +
  facet_wrap(~years)

Presence/Absence

We can also use some of the naturecounts helper functions to create presence/absence maps.

Here we download data from the RCBIOTABASE collection, make sure to keep only observations where all species and the location were reported, create a new presence column which is either TRUE, FALSE, or NA for each sampling event. Finally we use the format_zero_fill() function to fill in sampling events where cardinals (species_id 19360) were not detected (presence would then be 0).

cardinals <- nc_data_dl(collection = "RCBIOTABASE", username = "testuser", 
                        info = "nc_tutorial")

cardinals_zf <- cardinals %>%
  filter(AllSpeciesReported == "Yes", !is.na(latitude), !is.na(longitude)) %>%
  group_by(species_id, AllSpeciesReported, SamplingEventIdentifier) %>%
  summarize(presence = sum(as.numeric(ObservationCount)) > 0, .groups = "drop") %>%
  format_zero_fill(species = 19360, by = "SamplingEventIdentifier",
                   fill = "presence")

A bit convoluted, but here we'll grab coordinates for each Sampling Event. This is only necessary if there are errors (worth reporting!) where there are more than one lat/lon combo for each sampling event.

coords <- cardinals %>%
  select("SamplingEventIdentifier", "latitude", "longitude") %>%
  group_by(SamplingEventIdentifier, latitude, longitude) %>%
  mutate(n = n()) %>%  # Count number of unique coordinates
  group_by(SamplingEventIdentifier) %>%
  slice_max(n, with_ties = FALSE) %>% # Grab the most common coordinates for each event
  select(-"n")

cardinals_zf <- left_join(cardinals_zf, coords, by = "SamplingEventIdentifier")

head(cardinals_zf)

Now that we have our presence/absence data for cardinals, we can create a map.

cnt <- st_as_sf(cardinals_zf, coords = c("longitude", "latitude"), crs = 4326) %>%
  group_by(presence) %>%
  count(geometry)

ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = cnt, aes(size = n, colour = factor(presence)), alpha = 0.5) +
  scale_colour_manual(name = "Presence/Absence", values = c("#31688E", "#440154"), 
                      labels = c("1" = "Present", "0" = "Absent")) +
  scale_size_continuous(name = "Number of Sampling Events", range = c(1, 20)) +
  labs(caption = "Copyright OpenStreetMap contributors",
       title = "Presence/Absence of Cardinals in the RCBIOTABASE collection")

Clean up the mapping files if you no longer need them

unlink("alberta_parks/", recursive = TRUE)

See Also



BirdStudiesCanada/rNatureCounts documentation built on July 3, 2023, 2:06 a.m.