In this article we'll walk through how to create various types of maps from observations downloaded with naturecounts, to get a sense of their spatial distribution.
The following examples use the "testuser" account, which is not available to you. You can quickly sign up for a free account of your own to access and play around with these examples; simply replace `testuser` with your own username.
To do so we're going to use the following packages:
```r
library(naturecounts)
library(sf)
library(rnaturalearth)
library(ggplot2)
library(ggspatial)
library(dplyr)
library(tidyr)
library(mapview)
```
First we'll download some data:
```r
house_finches <- nc_data_dl(species = 20350, region = list(statprov = "AB"),
                            username = "testuser", info = "nc_tutorial")
head(house_finches)
```
Here we're going to take a quick look at the spatial distribution using ggplot2, with ggspatial providing the basemaps.
First let's get an idea of how many distinct points there are (often multiple observations are recorded for the same location).
```r
nrow(house_finches)

select(house_finches, longitude, latitude) %>%
  distinct() %>%
  nrow()
```
So we have `r nrow(unique(house_finches[, c("longitude", "latitude")]))` sites for `r nrow(house_finches)` observations.
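The distinct-site count above can be reproduced with base R alone. Here's a minimal sketch on a toy data frame (coordinates invented for illustration):

```r
# Toy observation table: five records at three unique locations
obs <- data.frame(
  longitude = c(-113.5, -113.5, -114.1, -114.1, -110.7),
  latitude  = c( 53.5,   53.5,   51.0,   51.0,   50.0)
)

n_obs   <- nrow(obs)                                        # total observations
n_sites <- nrow(unique(obs[, c("longitude", "latitude")]))  # distinct coordinate pairs

n_obs    # 5
n_sites  # 3
```

`distinct()` from dplyr and `unique()` on the coordinate columns give the same answer; the dplyr form in the chunk above just fits the pipeline style.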
Next let's convert our data to spatial data so we can plot it spatially (i.e. make a map!).
Note that we're using the CRS EPSG code 4326 because it reflects unprojected GPS data in lat/lon.
First we omit `NA`s, because `sf` data frames cannot have missing locations.
```r
house_finches <- drop_na(house_finches, "longitude", "latitude")

house_finches_sf <- st_as_sf(house_finches,
                             coords = c("longitude", "latitude"), crs = 4326)
```
Now we're ready to make a map of the distribution of observations.
We'll use a baselayer from OpenStreetMap and then add our observations.
```r
ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = house_finches_sf) +
  labs(caption = "Copyright OpenStreetMap contributors")
```
Let's count our observations at each site.
```r
cnt <- house_finches_sf %>%
  count(geometry)

ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = cnt, aes(size = n)) +
  labs(caption = "Copyright OpenStreetMap contributors")
```
If we want to get fancy, we can also create interactive maps using the mapview package (see also the leaflet for R package).
```r
mapview(house_finches_sf, zcol = "survey_year",
        at = seq(1965, 2005, by = 10),
        map.types = "Esri.WorldImagery")
```
For more complex or detailed maps, we can use a variety of spatial data files to layer our data over maps of the area. For this we'll get outlines of Canada and its Provinces and Territories from rnaturalearth.
```r
canada <- ne_states(country = "canada", returnclass = "sf") %>%
  st_transform(3347)

ggplot() +
  theme_bw() +
  geom_sf(data = canada)
```
Let's add our observations (note that the data are transformed to match the projection of the first layer, here the `canada` data).
```r
ggplot() +
  theme_bw() +
  geom_sf(data = canada) +
  geom_sf(data = house_finches_sf, size = 0.5)
```
We can also focus on Alberta.
```r
ab <- filter(canada, name == "Alberta")

ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = house_finches_sf, size = 0.5)
```
Perhaps we should see how many of these observations were made in parks.
First we'll download and extract the Park shapefiles available from the Alberta Parks website.
```r
dir.create("alberta_parks")  # Create a folder for the data

download.file(
  url = "https://www.albertaparks.ca/media/2941843/parks_and_protected_areas_alberta.zip",
  destfile = "alberta_parks/parks_protected_areas_alberta.zip")

unzip("alberta_parks/parks_protected_areas_alberta.zip", exdir = "alberta_parks")

parks <- st_read("alberta_parks/Parks_Protected_Areas_Alberta.shp")
```
Add this layer to our plot.
```r
ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 0.5)
```
Well, it's actually a bit difficult to tell; there are lots of small parks!
To solve this problem, we can merge our observations with the parks and plot those inside parks separately from those outside parks.
First we'll transform our observation data to match the CRS of `parks`. Then we'll join the park information to our observations based on whether each observation overlaps a park polygon (by default this is a left join). Finally, we'll create a new column, `outside_park`, that categorizes each observation as inside or outside a park, depending on whether it was joined to a park name (`OC_NAME`).
```r
house_finches_sf <- house_finches_sf %>%
  st_transform(st_crs(parks)) %>%
  st_join(parks) %>%
  mutate(outside_park = if_else(is.na(OC_NAME), "Outside Park", "Inside Park"))
```
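The classification step boils down to testing whether the spatial join produced a park name: unmatched observations carry `NA` after a left join. The same logic can be checked on a toy vector (park names invented for illustration):

```r
# After a left spatial join, observations outside every park polygon
# have NA in the park-name column
oc_name <- c("Fish Creek", NA, "Cypress Hills", NA)

outside_park <- ifelse(is.na(oc_name), "Outside Park", "Inside Park")
outside_park
# "Inside Park" "Outside Park" "Inside Park" "Outside Park"
```

The chunk above uses dplyr's `if_else()` instead of base `ifelse()`; it behaves the same here but is stricter about the types of its two outcome values.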
And now we can see that there are at least as many observations outside of parks as inside.
```r
ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 1) +
  facet_wrap(~outside_park)
```
We might also be interested in observations over time.
First we'll bin our observations by decade.
```r
house_finches_sf <- mutate(
  house_finches_sf,
  years = cut(survey_year, breaks = seq(1960, 2010, 10),
              labels = seq(1960, 2000, 10), right = FALSE))
```
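A note on the binning: with `right = FALSE`, `cut()` makes intervals that are closed on the left, so a year like 1970 lands in the 1970s bin rather than the 1960s. A quick check on some invented years:

```r
yr <- c(1965, 1970, 1999, 2009)

decade <- cut(yr,
              breaks = seq(1960, 2010, 10),  # bin edges: 1960, 1970, ..., 2010
              labels = seq(1960, 2000, 10),  # one label per bin
              right = FALSE)                 # intervals are [1960, 1970), [1970, 1980), ...
as.character(decade)
# "1960" "1970" "1990" "2000"
```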
We'll also want to see how many sample years there are per decade.
```r
years <- house_finches_sf %>%
  group_by(years) %>%
  summarize(n = length(unique(survey_year)), .groups = "drop")
```
Now we can see how House Finch observations change over the years.
```r
ggplot() +
  theme_bw() +
  geom_sf(data = ab) +
  geom_sf(data = parks, colour = "darkgreen", fill = "forestgreen") +
  geom_sf(data = house_finches_sf, size = 1.5) +
  geom_sf_text(data = years, x = 4427134, y = 2965275, hjust = 0, vjust = 1,
               aes(label = paste0("n = ", n))) +
  facet_wrap(~years)
```
We can also use some of the naturecounts helper functions to create presence/absence maps.
Here we download data from the RCBIOTABASE collection, keep only observations where all species and the location were reported, and create a new `presence` column which is either TRUE, FALSE, or NA for each sampling event. Finally, we use the `format_zero_fill()` function to fill in sampling events where cardinals (`species_id` 19360) were not detected (presence is then 0).
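Conceptually, zero-filling means adding a row with `presence = 0` for every sampling event where the species was not recorded. A minimal base-R sketch of the idea, with invented event IDs (`format_zero_fill()` handles this, plus species and column handling, for real data):

```r
# Three sampling events; the species was only detected at two of them
detected   <- data.frame(SamplingEventIdentifier = c("E1", "E3"), presence = 1)
all_events <- data.frame(SamplingEventIdentifier = c("E1", "E2", "E3"))

# Left-join detections onto the full event list, then fill the gaps with 0
zf <- merge(all_events, detected, all.x = TRUE)
zf$presence[is.na(zf$presence)] <- 0
zf
#   SamplingEventIdentifier presence
# 1                      E1        1
# 2                      E2        0
# 3                      E3        1
```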
```r
cardinals <- nc_data_dl(collection = "RCBIOTABASE",
                        username = "testuser", info = "nc_tutorial")

cardinals_zf <- cardinals %>%
  filter(AllSpeciesReported == "Yes", !is.na(latitude), !is.na(longitude)) %>%
  group_by(species_id, AllSpeciesReported, SamplingEventIdentifier) %>%
  summarize(presence = sum(as.numeric(ObservationCount)) > 0, .groups = "drop") %>%
  format_zero_fill(species = 19360, by = "SamplingEventIdentifier",
                   fill = "presence")
```
It's a bit convoluted, but here we'll grab the coordinates for each sampling event. This is only necessary if there are errors (worth reporting!) where more than one lat/lon combination exists for a single sampling event.
```r
coords <- cardinals %>%
  select("SamplingEventIdentifier", "latitude", "longitude") %>%
  group_by(SamplingEventIdentifier, latitude, longitude) %>%
  mutate(n = n()) %>%                  # Count occurrences of each coordinate pair
  group_by(SamplingEventIdentifier) %>%
  slice_max(n, with_ties = FALSE) %>%  # Keep the most common coordinates per event
  select(-"n")

cardinals_zf <- left_join(cardinals_zf, coords, by = "SamplingEventIdentifier")
head(cardinals_zf)
```
Now that we have our presence/absence data for cardinals, we can create a map.
```r
cnt <- st_as_sf(cardinals_zf, coords = c("longitude", "latitude"), crs = 4326) %>%
  group_by(presence) %>%
  count(geometry)

ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = cnt, aes(size = n, colour = factor(presence)), alpha = 0.5) +
  scale_colour_manual(name = "Presence/Absence",
                      values = c("#31688E", "#440154"),
                      labels = c("1" = "Present", "0" = "Absent")) +
  scale_size_continuous(name = "Number of Sampling Events", range = c(1, 20)) +
  labs(caption = "Copyright OpenStreetMap contributors",
       title = "Presence/Absence of Cardinals in the RCBIOTABASE collection")
```
```r
unlink("alberta_parks/", recursive = TRUE)  # Clean up the downloaded park files
```