# Predicting Resource Suitability with rsMove

```r
# load packages
require(rsMove)
require(raster)
require(rgdal)
require(ggplot2)
require(knitr)
require(kableExtra)
require(caret)
require(lattice)
require(igraph)
require(randomForest)
require(e1071)
```


### A Remote Sensing Perspective on Resource Suitability Modelling

This tutorial is based on the paper that introduced a standardized, presence-absence, machine-learning approach to map animal resource suitability and to consistently validate its spatial predictions. This vignette shows how to replicate that approach using rsMove.


### Data

For this exercise we will use movement data for one White Stork tracked with high-resolution GPS. Additionally, we will use a time-series of Normalized Difference Vegetation Index (NDVI) images as environmental predictors.

``` {r message=FALSE}
data("shortMove") # movement data
ndvi <- stack(list.files(system.file('extdata', '', package="rsMove"), 'ndvi.tif', full.names=TRUE)) # environmental predictors
```
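Before moving on, it can help to take a quick look at what was just loaded (an optional check; nothing here is required by the workflow):

```r
shortMove            # SpatialPointsDataFrame with the GPS fixes
ndvi                 # RasterStack with the NDVI time series
head(shortMove@data) # first records, including the date and time fields used later
```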

</br>

### Objective
<p align="justify" style="line-height:200%;">
Since the animal has a fixed resting site, samples related to resting behavior are limited. As a consequence, we will focus on mapping resources related to feeding behavior.
</p>

</br>

### Movement Data Pre-Processing
<p align="justify" style="line-height:200%;">
As a first step, we need to remove redundant data points. As animals move through the landscape, they sometimes visit the same locations. Looking at our example data, this issue is frequent as a consequence of back-and-forth trips between the nest and the feeding sites.
</p>

```r
plot(calc(ndvi, max, na.rm=TRUE)) # maximum NDVI composite as background
points(shortMove) # overlay the GPS fixes
```

As a consequence, when we use movement data to sample from gridded environmental predictors, several samples will be replicated due to the lower temporal resolution of the remote sensing data. This pseudo-replication phenomenon has two main implications. First, it offers misleading information on the preferences of a species. Since we cannot monitor environmental conditions at the same temporal scale at which we track animals, we often miss environmental changes that influence animal decision making. Second, it becomes hard to consistently train and validate any predictive model, since samples acquired from the same pixel can end up in both the training and validation sets.
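To get a sense of how frequent this replication is in our example, we can count how many GPS fixes fall within the same NDVI pixel (a quick, optional check using standard raster functions):

```r
# map each GPS fix to the pixel it falls in and tally repeated visits
pixel.id <- cellFromXY(ndvi, shortMove) # cell number for each fix
fix.count <- table(pixel.id) # number of fixes per pixel
sum(fix.count > 1) # pixels containing more than one fix
```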

To address this issue, we will use `moveReduce()`. Given movement data, this function aggregates temporally consecutive points that fall within the same pixel and reports the time elapsed within it. In addition, the user can choose to derive a `RasterLayer` with the total amount of time spent in each pixel. We will take advantage of this functionality to identify usable samples.

```r
obs.time <- strptime(paste0(shortMove@data$date, ' ', shortMove@data$time), format="%Y/%m/%d %H:%M:%S") # format observation time
reduced.samples <- moveReduce(shortMove, ndvi, obs.time, derive.raster=TRUE) # remove redundant data points
plot(reduced.samples$total.time) # show total time image
```

Looking at the output image, we see one pixel standing out with about 400 minutes (i.e. roughly 7 hours). This is the nesting site. Since we are interested in feeding sites alone, we will filter out this pixel. Moreover, we will filter out all pixels with 0 minutes. These are pixels that did not record more than one consecutive GPS point, suggesting that the animal did not stop within them. To do this, we will use the raster package to create a mask, identify the usable pixels and use them to build a new `SpatialPoints` object that will contain our presence samples. Note that the sample selection performed with `moveReduce()` is most informative when using high-resolution movement data. Otherwise, the time spent within a pixel might not be meaningful. In these circumstances, we recommend building a mask from all visited pixels. One way to do so would be to use the `rasterize()` function to identify all pixels that overlap with the movement data, as sketched after the next code block.

```r
upper.limit <- quantile(reduced.samples$total.time, 0.95) # identify upper threshold (95th percentile)
move.mask <- reduced.samples$total.time > 0 & reduced.samples$total.time < upper.limit # build sample mask
usable.pixels <- which.max(move.mask) # identify relevant pixels (cells where the mask equals 1, i.e. the maximum)
presence.samples <- SpatialPoints(xyFromCell(move.mask, usable.pixels), proj4string=crs(shortMove)) # build point object from samples (presences)
```
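As mentioned above, when the movement data is coarse, one could instead derive the presence mask from every visited pixel. Below is a minimal sketch of that alternative using `rasterize()`; it is not part of the workflow followed in this tutorial, and the object names are only illustrative:

```r
# alternative for low-resolution tracking data: flag every pixel visited by the animal
visit.mask <- rasterize(shortMove, ndvi[[1]], field=1) # 1 where at least one fix falls, NA elsewhere
alt.pixels <- which.max(visit.mask) # cells flagged as visited (the maximum value is 1)
alt.samples <- SpatialPoints(xyFromCell(visit.mask, alt.pixels), proj4string=crs(shortMove))
```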


### Identify Absence Samples

In the previous steps, we translated the `shortMove` dataset into samples that likely relate to feeding sites. However, to distinguish them from the rest of the landscape, we need to collect background samples that describe "unattractive" environmental conditions. While movement data is a strong asset for understanding the environmental preferences of an animal, it tells us little about its dislikes. Even if a location was not visited by an animal, that does not mean it was unsuitable. Factors such as distance and accessibility can condition animal decision making, leading the animal to neglect potentially suitable resources. Thus, random background sampling - a commonly used technique - can be misleading, as it does not account for the abundance of suitable resources. To address this issue, we developed `backSample()`. This function introduces a sampling technique that uses the presence samples as informants. These are used to collect samples from the remote sensing data and to identify pixels where the environmental conditions are statistically different, while preserving fuzzy borders between presence and background samples.

This approach is also sensitive to differences in resource selection. For example, let's consider our target species. While the White Stork often searches for prey in recently managed crop fields, it can sometimes be found over grasslands and even wetlands. As a consequence, our presence samples can reflect a diverse set of environmental conditions. To account for this, and to better define the boundaries between suitable and unsuitable environmental conditions, `backSample()` requires an index for each sample that reflects its spatial aggregation. This index is returned by `labelSample()`, which aggregates the elements of a `SpatialPoints` object based on their spatial neighborhood within a `RasterLayer`. In this exercise, we will aggregate samples within 60 m of each other (i.e. 2 pixels).

```r
sample.id <- labelSample(presence.samples, ndvi, agg.radius=60) # aggregate samples in space
absence.samples <- backSample(presence.samples, ndvi, sample.id, sampling.method="pca") # identify absence samples
absence.samples # show samples
```
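Before modelling, it can also help to see where the two sample sets fall in the landscape (an optional visual check):

```r
plot(calc(ndvi, max, na.rm=TRUE)) # maximum NDVI as background
points(presence.samples, col="red", pch=20)  # presence samples
points(absence.samples, col="blue", pch=20)  # pca-based absence samples
```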


### Derive and Validate a Predictive Model

Similarly to the background sampling approach, the modelling of resource suitability relies on spatially independent samples for training and validation. The function iterates through each of the presence regions given by `sample.id` and uses the corresponding samples for validation, while the remaining ones are used for training. During this process, a random set of absence samples is selected. Ideally, the function will try to equalize the number of presence samples in the training and validation sets. The final validation is an F1-score estimated from the total number of true and false positives accumulated over all iterations. The function `predictResources()` performs both the training and validation steps.

```r
env.presences <- extract(ndvi, presence.samples) # extract environmental data for presences
env.absences <- extract(ndvi, absence.samples)  # extract environmental data for absences
resourceModel1 <- predictResources(env.presences, env.absences, sample.id, env.data=ndvi) # build model
```
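For reference, the reported F1-score combines precision and recall into a single value. A minimal illustration of how it can be computed from pooled counts (the numbers below are made up purely for illustration):

```r
# illustrative counts pooled over all validation iterations (made-up values)
tp <- 20 # true positives
fp <- 5  # false positives
fn <- 10 # false negatives
precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
2 * precision * recall / (precision + recall) # F1-score
```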

Now let's have a look at our results. First, let's make a mask from the output probability map with a threshold of 0.5. Then let's overlay the presence samples.

```r
plot(resourceModel1$probabilities >= 0.5) # probability map
points(presence.samples) # presences
```

As shown by the output, the model was able to identify one additional pixel not sampled by `presence.samples`. But how accurate is the output? To answer that, we can consult the F1-scores for presences and absences.

```r
kable_styling(kable(head(resourceModel1$f1, 1), format="html", align="c"), full_width=TRUE, bootstrap_options=c("striped", "responsive"))
```

The accuracies were comparatively higher for absences, suggesting an imbalance between both classes. This indicates that the chosen environmental predictors might not be suitable to distinguish the selected resources from their surroundings. But what if we had used random background sampling instead of the proposed approach?

```r
absence.samples <- backSample(presence.samples, ndvi, sampling.method="random") # identify absence samples (random)
env.absences <- extract(ndvi, absence.samples)  # extract environmental data for absences
resourceModel2 <- predictResources(env.presences, env.absences, sample.id, env.data=ndvi) # build model
plot(resourceModel2$probabilities >= 0.5) # probability map
points(presence.samples) # presences
kable_styling(kable(head(resourceModel2$f1, 1), format="html", align="c"), full_width=TRUE, bootstrap_options=c("striped", "responsive"))
```

As shown by the output, only the pixels covered by `presence.samples` received a probability higher than 0.5. Additionally, the F1-score was `NaN` for presences, suggesting that the independent presence regions failed to predict each other.


### Plausibility Test

While the modelling results might be satisfactory, it is good practice to verify whether they fit our expectations, be they data-driven or based on empirical observations of the target species. To assist with this, we developed `plausibilityTest()`. This function allows the user to compare presence-absence maps derived with different modelling approaches against existing categorical information, such as land cover maps. Given a stack of masks - where 1 is the usable value - the function iterates through each band and reports the number of pixels overlapping each class of the categorical layer. To test this tool, we can use the land cover data provided with rsMove.

``` {r message=FALSE}
landCover <- raster(system.file('extdata', 'landCover.tif', package="rsMove")) # land cover map
```

<p align="justify" style="line-height:200%;">
Now let's apply the function using the probability maps derived with *pca* and *random* sampling considering only probabilities higher than 0.5. We will also specify the class labels.
</p>

``` {r message=TRUE}
class.labels <- c("Arable land", "Land without use", "Open spaces", "Wetlands", "Permanent crops", "Extraction/Dump sites", "Industrial areas", "Green urban areas")
probMask <- stack(resourceModel1$probabilities > 0.5, resourceModel2$probabilities > 0.5) # stack of probability masks (pca and random)
ptest <- plausibilityTest(probMask, landCover, class.labels=class.labels)
ptest$relative.plot
kable_styling(kable(head(ptest$relative.count, 8), format="html", align="c"), full_width=TRUE, bootstrap_options=c("striped", "responsive"))
```

The output suggests very similar results between both sampling approaches. Most of the pixels with probabilities higher than 0.5 were related to arable land while non-vegetated land cover types as well as *Permanent Crops* were ignored. Considering that the White Stork is reportedly attracted by agriculture, the output of `plausibilityTest()` suggests we built a reasonable predictive model.
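For additional intuition, a similar per-class tally can be produced directly with `raster::crosstab()`, assuming the probability masks and the land cover map share the same grid (an informal cross-check, not part of the rsMove workflow):

```r
# cross-tabulate the pca-based presence mask against the land cover classes
crosstab(probMask[[1]], landCover)
```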




