This is a legacy version of Mapinguari - for the current version, go to gabrielhoc.github.io/Mapinguari

It is well documented that climate change has severe effects on biodiversity. Species distribution models (SDMs) are popular tools for predicting individual species distributions and projecting the effects of climate change on those distributions under different scenarios. Most SDMs are correlative and do not take into account the biological processes underlying species responses to environmental variables. We aim to provide a modeling tool that fills this gap by estimating geographical layers with biologically relevant information, which can be used in SDMs or other biogeographical analyses, as well as to stimulate good practices in dealing with spatial autocorrelation, predictor selection, model evaluation, consensus and comparison.

Mapinguari is an R package that provides tools to facilitate the incorporation of biological processes in biogeographical analyses. It offers conveniences for fitting, comparing and extrapolating models of biological processes such as physiology, phenology, adaptation, interspecific interactions, demography and dispersal. These spatial extrapolations can be informative by themselves, but they can also complement traditional correlative SDM methods by mixing environmental and process-based predictors.

In this manual I will provide some examples of the kind of information that can be generated with Mapinguari, using a terrestrial ectotherm and processes relevant to those animals, such as temperature-limited activity times and locomotor performance, as examples. The spatial information that can be generated with the package is not limited to the examples presented here, and the same principles apply to generating surfaces relevant to other taxa, such as metabolic rates, time inside the thermal neutral zone, photosynthesis, and seed and egg development rates.

Installation

The package is currently hosted on GitHub. This is how to install and load it:

# You need package devtools to install packages from GitHub
# install.packages("devtools")
library(devtools)

install_github("gabrielhoc/MapinguariLegacy")

library(MapinguariLegacy)

Next, let's walk through an example of how to use the package, with some simulated data:

The fate of Fulanus

Fulanus beltranus is a fictional species of lizard from the Democratic Republic of the Congo. We would like to find out how this species will fare under different climate change scenarios. First, let's look at the points where it is currently known to occur. To access the table with distribution data, FulanusDistribution, package Mapinguari has to be loaded. Here we use ggmap to plot the points over an image from Google Maps (copyright Google).

# First, let's take a look at FulanusDistribution

head(FulanusDistribution)
# Load ggmap and ggplot2 to make the map
library(ggmap)
library(ggplot2)

Fulanus_bbox <- make_bbox(lat = Lat, 
  lon = Lon, 
  data = FulanusDistribution,
  f = 0.2)

Fulanus_big <- get_map(location = Fulanus_bbox, 
  source = "google", 
  maptype = "terrain")

ggmap(Fulanus_big) +
  geom_point(data = FulanusDistribution, 
    mapping = aes(x = Lon, y = Lat), 
    size = 1, 
    colour = 'red')

Cleaning points

Some points might be too close to each other and therefore redundant, while others might be mistakenly placed in regions where the species is known not to occur. Fulanus occurs only on land and at altitudes below 1000 meters. Let's filter out points that are within 2 kilometers of each other, outside this altitude range or in the ocean. To get an altitude layer, we can use function get_rasters, which will be explained in further detail in the next section.

# First, we need an altitude raster.

alt_list <- get_rasters(
     raster_source = "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min",
     ext = FulanusDistribution,
     fixed_var = 'alt')

# Get the layer from the list
alt <- alt_list[[1]]

# Then, we clean the points

FulanusDistribution_clean <- 
 clean_points(coord = FulanusDistribution,
   merge_dist = 2,
   reference_layer = alt < 1000 & !is.na(alt), # Oceans are represented by NA on this altitude layer
   layer_filter = 0)

head(FulanusDistribution_clean)

Let's plot those points on the map.

library(ggmap)

ggmap(Fulanus_big) +
  geom_point(data = FulanusDistribution_clean, 
    mapping = aes(x = Lon, y = Lat), 
    size = 1, 
    colour = 'red')

Now, let's get some information on the climate of this region.

Obtaining rasters

We can use function get_rasters to extract WorldClim surfaces for that area. The argument ext specifies the area to crop your rasters to. It accepts either a numeric vector with the coordinates of the cropping limits (in order: westernmost longitude, easternmost longitude, southernmost latitude, northernmost latitude) or a table of longitudes (column named Lon) and latitudes (column named Lat), such as the FulanusDistribution_clean table we created previously. In the latter case, the function will take the coordinates of the most extreme points and use those as the cropping limits. The argument margin adds to these limits, and it is in the same unit as the coordinates you supply.
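
For instance, the numeric form could look like this (the coordinates below are hypothetical and only illustrate the expected order):

# Hypothetical cropping limits, in decimal degrees:
# westernmost lon, easternmost lon, southernmost lat, northernmost lat
my_extent <- c(12, 32, -12, 8)

# get_rasters(raster_source = ..., ext = my_extent, fixed_var = 'alt')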

library(raster)

FulanusEcoRasters_present <-
   get_rasters(
     raster_source = "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min",
     ext = FulanusDistribution_clean,
     margin = 5,
     non_fixed_var = c('prec', 'tmin', 'tmax'),
     fixed_var = 'alt',
     years = c("present"),
     reorder = TRUE)

plot(FulanusEcoRasters_present[[1]])

get_rasters is able to download rasters from WorldClim if you leave raster_source blank, but you probably don't want to download them every time you execute the function. That's why the argument raster_source will also take a list of rasters from your workspace or a path to a folder containing those rasters. These options also have the advantage of letting you input any raster you want, so you are not limited to the ones you can download from WorldClim.
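
If you already have the rasters loaded in your workspace, a minimal sketch of the list form could look like this (the object names here are hypothetical, and the list names must follow the convention explained below):

# Hypothetical rasters already loaded in the workspace
my_raster_list <- list(tmax_present = my_tmax_stack,
                       prec_present = my_prec_stack,
                       alt = my_alt_raster)

# get_rasters(raster_source = my_raster_list,
#             ext = FulanusDistribution_clean,
#             non_fixed_var = c('prec', 'tmax'),
#             fixed_var = 'alt')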

However, in order for Mapinguari to recognize which variables correspond to each year or scenario, you have to name your folders and list elements following the convention variable_year_scenario. The default character for separating those terms is "_", but you can change that, as long as you make the corresponding change to the argument separator in the function. Some of your rasters will not be subject to scenarios, since they are measures of current climate instead of projections. In that case you only have to name their folders variable_year, omitting the scenario term. The argument baseline identifies which years are not subject to scenarios. The default for this argument is present, but it can be changed at your convenience. Some other variables are constant across the time considered, such as altitude. In that case you only have to write the name of the variable. Here is an example of how such a folder could be structured:
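
An illustrative layout, assuming the variables, years and scenarios used in this manual:

global_rasters_10min/
    alt/                  # fixed variable: variable name only
    prec_present/         # baseline year: variable_year
    tmin_present/
    tmax_present/
    prec_2050_rcp45/      # projection: variable_year_scenario
    prec_2070_rcp85/
    tmax_2050_rcp45/
    tmax_2070_rcp85/
    (and so on for the remaining variable/year/scenario combinations)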

I'm going to assign the path to that directory to an object:

my_directory <- "/Volumes/Podocnemis/Gabriel/Dropbox/data/rasters/worldclim/global_rasters_10min"

Why did I place some variables in the non_fixed_var argument and others in the fixed_var argument? Variables in non_fixed_var are things such as climate, which you expect to change over time and across future scenarios, while variables in fixed_var are things such as geological features, which you expect to remain constant over the time projected.

Another important argument for non-fixed variables is reorder, which takes the last two characters of your RasterLayer names, replaces letters with zeros and orders the layers in ascending order of those numbers. This is useful because of the way some of the WorldClim layers are named when you download them: when stacked, they end up out of chronological order, and this argument fixes that. But be careful if you are using layers with names different from the WorldClim ones. If they are already in the correct order when you stack them, reorder could scramble them, so set it to FALSE.
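
To see why this is needed, here is a minimal sketch of the naming problem and of the reordering rule just described (the layer names below follow the WorldClim monthly pattern):

# Stacking in alphabetical order places month 10 right after month 1
layer_names <- c('tmax1', 'tmax10', 'tmax11', 'tmax12', 'tmax2', 'tmax3',
                 'tmax4', 'tmax5', 'tmax6', 'tmax7', 'tmax8', 'tmax9')

# Last two characters, with letters replaced by zeros, sorted numerically
last_two <- substr(layer_names, nchar(layer_names) - 1, nchar(layer_names))
month_number <- as.numeric(gsub('[A-Za-z]', '0', last_two))
layer_names[order(month_number)]
# 'tmax1' 'tmax2' 'tmax3' ... 'tmax12': back in chronological order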

Let's try getting some projections for the future years of 2050 and 2070:

FulanusEcoRasters_future <-
   get_rasters(
     raster_source = my_directory,
     ext = FulanusDistribution_clean,
     margin = 5,
     non_fixed_var = c('prec', 'tmin', 'tmax'),
     years = c('2050', '2070'),
     scenarios = c('rcp45', 'rcp85'),
     reorder = TRUE)

plot(FulanusEcoRasters_future$`2050_rcp45`)

As you can see, we have multiple rasters for each variable, one for each month. When doing biogeographical analyses, such as species distribution models, it is common to need a summary raster for each variable, such as the annual average or total. Function transform_rasters can help us get summaries for the whole year or for parts of the year.

Summarizing time-varying rasters

Function transform_rasters allows us to apply any vectorized function to any subset of rasters for each variable, so we can get measures like averages, standard deviations, totals, variances or any other summary function for the month layers. Summaries have the advantage of reducing computation time, as they allow us to work with fewer layers. The first argument is raster_stack, which is where we supply the RasterStack with the layers for the calculations. The second argument, FUN_qlist, is where we supply the functions to be applied, the variables to apply the functions to, and the subset of the layers for that variable that will be included in the calculation. These expressions have to be inside a qlist, which is a list that returns its arguments unevaluated. In order for the function to find the layers, the layer names must begin with the same variable name used in the FUN_qlist expressions, before the separator character.

Fulanus_present_year_summaries <-
   transform_rasters(FulanusEcoRasters_present$present,
    FUN_qlist = qlist(tmax_average_year = mean(tmax),
                      tmin_average_year = mean(tmin),
                      prec_total_year = sum(prec)))

# The function operates on RasterStacks; if you want to apply it to a list, you can use `lapply`:

Fulanus_future_year_summaries <-   
lapply(FulanusEcoRasters_future, transform_rasters,
    FUN_qlist = qlist(tmax_average_year = mean(tmax),
                      tmin_average_year = mean(tmin)))

# You can also specify different subsets of the year to get, for example, summaries for seasons of the year, by using double square brackets.

Fulanus_present_two_seasons_summaries <-
   transform_rasters(FulanusEcoRasters_present$present,
    FUN_qlist = qlist(tmax_average_breeding = mean(tmax[[3:8]]),
                      tmax_average_growing = mean(tmax[[c(7:12, 1:3)]])))

plot(Fulanus_present_year_summaries$tmax_average_year)
plot(Fulanus_present_two_seasons_summaries$tmax_average_breeding)
plot(Fulanus_present_two_seasons_summaries$tmax_average_growing)

Summary functions are a useful example, but we can also apply more complex functions, such as models based on the spatial variables we have, in order to create new variables. Before we create those layers, however, we must fit the models in question. Function fit_curves allows us to fit models, compare them and get a vectorized function that returns the predictions of each model given the necessary variables.

Fitting physiological models

Function fit_curves can be used to compare models with different parameterizations or specifications for your biological processes. The user supplies a list of models in argument models, and the function outputs a table with their AIC, BIC, log-likelihood, delta AIC, delta BIC and ranks of the AIC and BIC values. The main goal of the function, however, is to create vectorized predict functions for each model, which can then be used in transform_rasters to spatially extrapolate the model. In order to make the spatial extrapolation possible, you must have spatial information for at least one of the variables on the right-hand side of your model formula. The vectorized function generated can also be used to get predictions of your model in contexts other than spatial ones, such as for time series or specific values.

Back to Fulanus: we measured the maximum running speed of several individuals at different temperatures. We also recorded their body sizes, since that variable is likely to influence their performance. Here is what the data look like:

head(FulanusPhysiology)

Note that we also assigned a number to each lizard to keep track of which lizard did which trial. That is important because, since the same lizard ran several trials, the data points are not independent, and we have to account for that when building our model.

Argument models can be a single model or a list of models. In this case we are fitting GAMMs, which can account for the autocorrelation arising from running several trials with each individual. You can name the models by naming the elements of the list. Most modeling algorithms that have a method for function predict should work.

library(mgcv)

# Here is an example without naming the models. They will be assigned a generic name.
perf_functions_no_name <-
  fit_curves(gamm(performance ~ s(temp, bs = 'cs') + size, 
                  random = list(id = ~ 1), 
                  data = FulanusPhysiology))

# It is easier to keep track if you name them something meaningful (TPC means thermal performance curve). Also, you can fit multiple models at the same time.
perf_functions <-
  fit_curves(list(tpc_gamm_size = gamm(performance ~ s(temp, bs = 'cs') + size, 
                                       random = list(id = ~ 1), 
                                       data = FulanusPhysiology),
                  tpc_gamm_no_size = gamm(performance ~ s(temp, bs = 'cs'), 
                                          random = list(id = ~ 1), 
                                          data = FulanusPhysiology)))

The function prints a table containing statistics on each model, for comparison, and outputs a list containing that table, as well as a sublist for each model containing the predictor function, the raw model output and the input arguments.

perf_functions$stats
perf_functions$tpc_gamm_size$predict
perf_functions$tpc_gamm_size$model
perf_functions$tpc_gamm_size$output

You can assign the predictor function to a name and then apply it to any value.

my_tpc <- perf_functions$tpc_gamm_size$predict
my_tpc(temp = 20, size = 3)
# You can apply it to a vector
my_tpc(temp = 20:40, size = 3)
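
For example, a quick way to visualize the fitted thermal performance curve (a sketch; the temperature range and the size value are arbitrary):

temps <- seq(10, 45, by = 0.5)
plot(temps, my_tpc(temp = temps, size = 3),
  type = 'l', xlab = 'Temperature (C)', ylab = 'Predicted performance')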

The predictor function can be applied to rasters of the variables on the right-hand side of the model formula to spatialize the variable on the left-hand side. In the next section we give an example.

Spatializing a physiological model

transform_rasters provides the link between your physiological model and your spatial environmental data. The predictor functions obtained from fit_curves can be used in the same way we used the summary functions in the previous examples. This interface also allows us to set values for terms in the model, for example estimating performance for animals of specific sizes with the thermal performance model we fitted before.

Note, however, that the variables in the rasters need to be on the same scale as the data you used to fit the models. In this case, the raster temperatures are multiplied by 10, so we have to correct for that first.

# You can calculate the performance for minimum and maximum temperatures by changing the term in the expression inside FUN_qlist

# We are dividing the temperatures by 10, so they are on the same scale as the data used to fit the models

Perf_rasters_tmax <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_tmax = my_tpc(tmax/10, size = mean(FulanusPhysiology$size))))

Perf_rasters_tmin <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_tmin = my_tpc(tmin/10, size = mean(FulanusPhysiology$size))))

# We can get the performance for individuals of different ages by changing the `size` argument of the my_tpc function (lizards' ages are highly correlated with size)

Perf_rasters_young <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_young = my_tpc(tmax/10, size = min(FulanusPhysiology$size))))

Perf_rasters_old <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
                    FUN_qlist = qlist(perf_old = my_tpc(tmax/10, size = max(FulanusPhysiology$size))))

plot(Perf_rasters_tmax$perf_tmax)
plot(Perf_rasters_tmin$perf_tmin)
plot(Perf_rasters_young$perf_young)
plot(Perf_rasters_old$perf_old)

The function operates on RasterStacks; if you want to apply it to a list, you can use lapply:

Perf_rasters_list <- lapply(Fulanus_future_year_summaries, transform_rasters,
  FUN_qlist = qlist(perf_tmin = my_tpc(tmin/10, size = mean(FulanusPhysiology$size))))

Calculating hours of activity

To make the performance raster, we extrapolated a generalized additive mixed model (GAMM). We can use the same functions fit_curves and transform_rasters to create rasters from other kinds of modeling algorithms, such as nonlinear least squares (NLS). The amount of time lizards are able to stay active seems to be a good predictor of their ability to persist in a locality (Sinervo et al. 2010). Here we are going to demonstrate how to estimate average daily hours of activity limited by temperature. The same framework could be applied to estimate the amount of time inside temperature ranges important for other organisms, such as the thermal neutral zone of mammals or temperatures relevant to seed and egg development.

In order to know how much time Fulanus lizards are able to perform their activities, first we need to know the temperature range in which they are active. Measuring the temperatures of active lizards in the field, we found that they were active at body temperatures between 16 and 28 degrees Celsius. Next, we have to gather data on how temperature varies in the microhabitats those lizards use. We deployed physical models, with thermal properties similar to those of the lizards, in the microhabitats where the lizards were observed performing their activities. The models were connected to data loggers that registered their temperature every four minutes. We also recorded the air temperature at a nearby weather station for every day the models were out. The table FulanusMicroclimate contains these (simulated) data.

head(FulanusMicroclimate)

Next, we are going to create a new column in the table with a binary variable indicating whether, at each moment, the temperature registered by the models was inside the activity range of Fulanus.


library(dplyr)

# Fulanus is only active in microhabitats between 16 and 28 degrees Celsius

FulanusMicroclimate$HA <- ifelse(FulanusMicroclimate$temp > 16 & FulanusMicroclimate$temp < 28, 1, 0)

If we sum HA for each day and convert from the number of four-minute records to hours, we get the amount of time the models were inside that temperature range on each day.

# First, let's summarise the data by day

resolution_minutes <- 4 # data is logged every 4 minutes

microclimate_by_day <- 
group_by(FulanusMicroclimate, day, month, year) %>% 
summarise(HA = sum(HA) * resolution_minutes / 60, # each record represents 4 minutes; convert the count to hours
          temp_micro_avg = mean(temp),
          temp_air_avg = mean(t_air),
          Lon = unique(Lon), 
          Lat = unique(Lat))

head(microclimate_by_day)

Then we can relate HA to the temperature measured at the weather station, which is equivalent to the temperature in the WorldClim rasters we already have. This assumes microhabitats are similar across the species range. Hours of activity should be bound between two values, zero and the length of the day, so logistic models are appropriate for fitting this. We are going to use package FlexParamCurve, which allows us to fit many variations of the logistic model, such as the Gompertz, von Bertalanffy and Richards curves.

library(FlexParamCurve)

# Set options for Gompertz curve

modpar(x = microclimate_by_day$temp_air_avg, 
       y = microclimate_by_day$HA, 
       pn.options = "G_options")

change.pnparameters(M = 0.1, pn.options = "G_options")

# Set options for Von Bertalanffy curve

modpar(x = microclimate_by_day$temp_air_avg, 
       y = microclimate_by_day$HA, 
       pn.options = "VB_options")

change.pnparameters(M = -0.3, pn.options = "VB_options")

# Set options for Richards curve

modpar(x = microclimate_by_day$temp_air_avg, 
       y = microclimate_by_day$HA, 
       pn.options = "R_options")

HAlogistic <- nls(HA ~ SSlogis(temp_air_avg, Asym, xmid, scal), data = microclimate_by_day) # A simple logistic

HAGompertz <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, modno = 32, pn.options = "G_options"), data = microclimate_by_day)

HAVonBertalannfy <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, modno = 32, pn.options = "VB_options"), data = microclimate_by_day)

HARichards <- nls(HA ~ SSposnegRichards(temp_air_avg, Asym, K, Infl, M, modno = 12, pn.options = "R_options"), data = microclimate_by_day)

HA_curve <- 
   fit_curves(list(logistic = HAlogistic, 
                   Gompertz = HAGompertz,
                   Von_Bertalannfy = HAVonBertalannfy,
                   Richards = HARichards), 
              predict_formals = "temp_air_avg",
              separator = "_")

It seems the von Bertalanffy curve was the best one.

HA_VB <- HA_curve$Von_Bertalannfy$predict

HA_raster_present <-
  transform_rasters(raster_stack = Fulanus_present_year_summaries,
    FUN_qlist = qlist(HA = HA_VB(tmax/10)))

plot(HA_raster_present)

Other environmental rasters

Mapinguari can generate other potentially useful rasters that are not species-specific, such as hydrologic variables like potential evapotranspiration (PET), actual evapotranspiration (AET) and climatic water deficit (CWD).

In order to calculate AET, we need the PET layers at monthly resolution, so we use the monthly layers instead of the yearly averages.

prec_stack <- FulanusEcoRasters_present$present[[1:12]]
tmin_stack <- FulanusEcoRasters_present$present[[13:24]]
tmax_stack <- FulanusEcoRasters_present$present[[25:36]]
alt_stack <- FulanusEcoRasters_present$present$alt

# Potential EvapoTranspiration
PET_stack <-
PETFUN(tmax = tmax_stack/10, 
       tmin = tmin_stack/10, 
       alt = alt_stack)

plot(mean(PET_stack))
# Actual EvapoTranspiration
AET_stack <-
AETFUN(PET = PET_stack, prec = prec_stack)

plot(mean(AET_stack))
# Climatic Water Deficit

CWD_stack <- PET_stack - AET_stack
names(CWD_stack) <- paste("CWD", 1:12, sep = "_")

plot(mean(CWD_stack))

You can also get day length values for the area by inputting any raster covering the desired area.

daylength_stack <- daylengthFUN(CWD_stack)

plot(mean(daylength_stack))

Another useful variable is the amount of solar radiation, in kilojoules per square meter per day.

srad_stack <- sradFUN(alt_stack, tmax_stack/10, tmin_stack/10)

plot(mean(srad_stack))

Including microclimate variation in our physiological variables

Previously, we estimated hours of activity assuming homogeneous microhabitats. However, microhabitats, and thus microclimates, are likely to change according to the structure of the vegetation. Many vegetation indices, such as EVI and NDVI, are based on satellite images and are therefore hard to extrapolate into the future. Luckily for us, AET is a good approximation of vegetation distribution. We can incorporate this information into our hours-of-activity models.

# let's include a new column in the table with AET
microclimate_by_day$AET <- extract(mean(AET_stack), data.frame(Lon = microclimate_by_day$Lon, Lat = microclimate_by_day$Lat))

# Let's try fitting a curve with AET

HAlogisticAET <- try(nls(HA ~ SSlogis(temp_air_avg + AET, Asym, xmid, scal), data = microclimate_by_day))

HAGompertzAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, modno = 32, pn.options = "G_options"), data = microclimate_by_day))

HAVonBertalannfyAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, modno = 32, pn.options = "VB_options"), data = microclimate_by_day))

HARichardsAET <- try(nls(HA ~ SSposnegRichards(temp_air_avg + AET, Asym, K, Infl, M, modno = 12, pn.options = "R_options"), data = microclimate_by_day))

It seems that only the simple logistic and the Gompertz were able to converge, so let's use those.

HA_AET_curve <- 
fit_curves(list(HAlogisticAET,
                HAGompertzAET),
  predict_formals = c("temp_air_avg", "AET"))

HAlogis <- HA_AET_curve$model_1$predict

# Let's join AET to our climatic stack

FulanusEcoRasters_present_AET <- FulanusEcoRasters_present
FulanusEcoRasters_present_AET$present <- stack(FulanusEcoRasters_present$present, AET_stack)

HA_raster_present <-
  transform_rasters(raster_stack = FulanusEcoRasters_present_AET$present,
    FUN_qlist = qlist(HA = HAlogis(tmax/10, AET))) # dividing raster temperatures by 10 again, to match the scale of the fitting data

plot(mean(HA_raster_present))

Creating a function for the Sinervo method for hours of activity

Sometimes, you might want to create a function to use in transform_rasters that is not a prediction of a model you can fit with fit_curves. transform_rasters will take custom functions you make, as long as they are vectorized. There are different ways to vectorize a function, such as function Vectorize. We are going to write an example function implementing the Sinervo method for calculating hours of activity (Sinervo et al. 2010), and use it to create a raster of hours of activity. This method creates a sine wave between the minimum and maximum temperatures for a location, simulating daily temperature variation, then counts how much time the temperatures are between the activity thresholds for that species. The method was initially developed for hours of activity in lizards, but it can be applied to any relevant temperature thresholds, like the limits of the thermal neutral zone in mammals or temperature ranges important for seed or egg development.

# Let's create a function for the Sinervo method
# tmax is the maximum temperature at a location
# tmin is the minimum temperature at a location
# tlwr is the lower temperature threshold for activity
# tupr is the upper temperature threshold for activity
# res is the time resolution, i.e., the number of parts each hour is divided into. If res = 3, hours of activity are counted in 20-minute increments (this greatly affects the speed of the calculation).

SinervoHA <- function(tmax, tmin, tlwr, tupr, res) {

  s0 <- 1:res
  h0 <- 1:24

  grid <- expand.grid(s0, h0)
  s <- grid[[1]]
  h <- grid[[2]]

  day_temps <-
  ((tmax - tmin)/2 * sin((pi/12) * (h + (s/res)) - 3 * (pi/4))) + (tmax + tmin)/2

  sum(ifelse(day_temps > tlwr & day_temps < tupr, 1/res, 0))
}

# We can vectorize the function using function Vectorize

SinervoHA_vectorized <- Vectorize(SinervoHA)

# Called directly with vectors, the non-vectorized function recycles the
# inputs through a single calculation and returns one summed value,
# which is not what we want:
SinervoHA(tmax = c(30, 40, 35), tmin = c(10, 20, 15), tlwr = 20, tupr = 30, res = 3)
# The vectorized version returns one value per location:
SinervoHA_vectorized(tmax = c(30, 40, 35), tmin = c(10, 20, 15), tlwr = 20, tupr = 30, res = 3)

ha_sinervo_raster <- 
transform_rasters(raster_stack = Fulanus_present_year_summaries, 
                  FUN_qlist = qlist(ha_sinervo = SinervoHA_vectorized(tmax = tmax[[1]]/10, tmin = tmin[[1]]/10, tlwr = 16, tupr = 28, res = 3))) # 16-28 C is the activity range of Fulanus

# Let's cap the hours of activity by daylength

average_daylength <- mean(daylength_stack)

ha_capped <-
overlay(ha_sinervo_raster, average_daylength, fun = function(x, y) ifelse(x < y, x, y))

# hours of restriction to activity
h_restriction <- average_daylength - ha_capped

plot(ha_sinervo_raster)
plot(ha_capped)
plot(h_restriction)

Getting performance considering microhabitat temperatures

We used WorldClim temperatures to estimate performance across the distribution of our species. Those temperatures are more akin to the ones measured at weather stations. A more precise approach might be to use estimates of microhabitat temperature. We have microhabitat temperatures in our FulanusMicroclimate dataset, which we can use to create a model relating microhabitat temperature to air temperature and AET, and then extrapolate that model.

microtemp_model <-
fit_curves(glm(temp_micro_avg ~ temp_air_avg + AET, data = microclimate_by_day))
predict_microtemp <- microtemp_model$model_1$predict

microtemp_raster_present <-
  transform_rasters(raster_stack = FulanusEcoRasters_present_AET$present,
    FUN_qlist = qlist(microtemp = predict_microtemp(temp_air_avg = tmax/10, AET = AET))) 

Perf_rasters_microtemp <-
  transform_rasters(raster_stack = microtemp_raster_present,
                    FUN_qlist = qlist(perf_microtemp = my_tpc(microtemp, 
                                                              size = mean(FulanusPhysiology$size))))

plot(mean(Perf_rasters_microtemp))

Summarizing across spatially varying periods (geographical variation in phenology)

Now, let's say the most crucial period for Fulanus population dynamics is their breeding period, and that they will breed only under certain climatic conditions. This might cause their breeding period to vary spatially and temporally. In order to account for that, we can fit a logistic model of breeding status versus climate, using data from the table FulanusBreeding, which contains data on breeding status and climate at different localities. Then we can use this model to create binary rasters with the locations where Fulanus is breeding in each month, using function transform_rasters. After that, we can summarize the climatic rasters only during the breeding season at different places, by using the binary rasters as weights to average other variables. You can use this method to merge any rasters according to a spatially varying criterion. This can be used, for example, to merge rasters of locally adapted physiology according to the distribution of populations.

Let's take a look at the table.

head(FulanusBreeding)

Now we fit the model of breeding season.

Phen_model <- fit_curves(models = glm(breeding ~ prec, family = binomial(link = 'logit'), data = FulanusBreeding))

PhenFUN1 <- Phen_model$model_1$predict

And then we extrapolate it.

season_rain_rasters <-
  transform_rasters(raster_stack = FulanusEcoRasters_present$present,
                    FUN_qlist = qlist(season_rain = PhenFUN1(prec)))

It makes sense for us to convert this to binary now, but leaving it as probabilities would also work.

# Let's convert the raster to binary

season_rain_rasters_binary <-
  calc(season_rain_rasters, function(x) ifelse(x < 0.5, 0, 1))

FulanusEcoRasters_season_rain <-
    transform_rasters(FulanusEcoRasters_present$present,
      FUN_qlist = qlist(weighted.mean(tmax, w = season_rain_rasters_binary)))

# Length of breeding season
plot(sum(season_rain_rasters))

# Average maximum temperature during the breeding season
plot(FulanusEcoRasters_season_rain)

The function can also be a condition. Let's say we are only interested in the periods when precipitation is greater than 150 mm.

PhenFUN2 <- function(x) ifelse(x > 150, 1, 0)

season_prec_rasters <-
  transform_rasters(raster_stack = FulanusEcoRasters_present$present,
    FUN_qlist = qlist(season_rain = PhenFUN2(prec)))

FulanusEcoRasters_season_prec <-
    transform_rasters(FulanusEcoRasters_present$present,
      FUN_qlist = qlist(weighted.mean(tmax, w = season_prec_rasters)))

# Length of breeding season
plot(sum(season_prec_rasters))

# Average maximum temperature during the breeding season
plot(FulanusEcoRasters_season_prec)

SDM (under development)

Collinearity test

# group desired variables in a stack

total_stack <- stack(Fulanus_present_year_summaries,
                             alt_stack,
                             sum(PET_stack),
                             sum(AET_stack),
                             sum(CWD_stack),
                             ha_capped,
                             h_restriction, 
                             mean(Perf_rasters_microtemp),
                             sum(season_rain_rasters))

names(total_stack) <- c("tmax", "tmin", "prec", "alt", "PET", "AET", "CWD", "HA", "HR", "perf", "breed") # names must match the layer order of the stack above

library(dismo)

# Select 1000 random points from mask
set.seed(22111962)
mask <- total_stack[[1]]
rnd.points <- randomPoints(mask, 1000)

# Principal Components Analysis (PCA) of environmental variables
env.data <- extract(total_stack, rnd.points)
pca.env.data <- princomp(env.data, cor = T)
plot(pca.env.data)
biplot(pca.env.data, pc.biplot = T)
summary(pca.env.data)
pca.env.data$loadings[, 1:3]
# Variance Inflation Factors (VIF) of environmental variables
library(usdm)
v1 <- vifcor(total_stack, th = 0.9) #correlation
v1
v2 <- vifstep(total_stack, th = 10) #VIF
v2

# Subset environmental stack
env.selected <- exclude(total_stack, v2) #exclude collinear variables identified with vifstep
env.selected

# Pairs plot of selected environmental predictors
env.data.std <- data.frame(scale(env.data)) # Scale variables
library(corrplot)
M <- cor(env.data.std)
corrplot.mixed(M, upper = "ellipse", lower = "number")

Generate presence/absence table

pres_coords <- FulanusDistribution_clean[1:2]

pseudo_abs <- pseudoabsences(pres_coords, 150, 2000)

values_pres <- data.frame(extract(env.selected, pres_coords))
values_abs <- data.frame(extract(env.selected, pseudo_abs))

values_pres$presence <- 1
values_abs$presence <- 0

pres_abs_table <- rbind(values_pres, values_abs)
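
As with the other tables in this manual, we can take a quick look at the result:

head(pres_abs_table)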

Make SDM

library(biomod2)

# Prepare data
FulanusBiomodData <- BIOMOD_FormatingData(
    resp.var = pres_abs_table$presence,
    expl.var = pres_abs_table[1:6],
    resp.xy = rbind(pres_coords, pseudo_abs),
    resp.name = "Fulanus")
FulanusBiomodData

# Path to MAXENT
myBiomodOption <- BIOMOD_ModelingOptions(MAXENT.Phillips = list(path_to_maxent.jar = "/Library/Frameworks/R.framework/Versions/3.3/Resources/library/dismo/java", memory_allocated = 1024))

# Parallel processing
library(doParallel)
cl <- makeCluster(2) # adjust to the number of cores available on your machine
registerDoParallel(cl)

FulanusModelOut <- BIOMOD_Modeling(FulanusBiomodData,
    models = c("GBM","CTA","RF", "GLM","GAM","ANN","SRE","FDA","MARS","MAXENT.Phillips","MAXENT.Tsuruoka"),
    models.options = myBiomodOption,
    NbRunEval = 10,
    DataSplit = 75,
    Prevalence = NULL,
    VarImport = 0,
    models.eval.meth = c("KAPPA","TSS","ROC","ACCURACY","BIAS"),
    SaveObj = TRUE,
    rescal.all.models = FALSE,
    do.full.models = FALSE,
    modeling.id = "Fulanus")

Model evaluation

# Get evaluations
FulanusModelEval <- get_evaluations(FulanusModelOut)

# Print ROC scores of all selected models
FulanusModelEval["ROC","Testing.data",,,]

# Get summaries (mean) of model evaluation: one model (MAXENT.Tsuruoka), one method (TSS)
dimnames(FulanusModelEval)
MAXENT.Tsuruoka_Eval <- FulanusModelEval["TSS","Testing.data","MAXENT.Tsuruoka",,]
mean(MAXENT.Tsuruoka_Eval)

# Get summaries (mean) of model evaluation: one model (GLM), all methods
mean(FulanusModelEval["KAPPA","Testing.data","GLM",,])
mean(FulanusModelEval["TSS","Testing.data","GLM",,])
mean(FulanusModelEval["ROC","Testing.data","GLM",,])
mean(FulanusModelEval["ACCURACY","Testing.data","GLM",,])
mean(FulanusModelEval["BIAS","Testing.data","GLM",,])

# Get summaries (mean) of model evaluation: machine-learning models, all methods
sdm.models <- c("GBM","CTA","RF") #3 models
eval.methods <- c("KAPPA","TSS","ROC","ACCURACY","BIAS") #5 evaluation methods

means.i <- numeric(0)
means.j <- numeric(length(eval.methods))
for (i in seq_along(sdm.models)) {
    for (j in seq_along(eval.methods)) {
    means.j[j] <- mean(FulanusModelEval[eval.methods[j], "Testing.data", sdm.models[i],,])
    }
    means.i <- c(means.i, means.j)
}

summary.eval.ml <- data.frame(rep(sdm.models, each = 5), rep(eval.methods, 3), means.i)
names(summary.eval.ml) <- c("Model", "Method", "Mean")
summary.eval.ml
xtabs(summary.eval.ml$Mean ~ summary.eval.ml$Model + summary.eval.ml$Method) # GBM with best performance

# Get summaries (mean) of model evaluation: regression models, all methods
sdm.models <- c("GLM","GAM","ANN","SRE","FDA","MARS","MAXENT.Phillips","MAXENT.Tsuruoka") #8 models
eval.methods <- c("KAPPA","TSS","ROC","ACCURACY","BIAS") #5 evaluation methods

means.i <- numeric(0)
means.j <- numeric(length(eval.methods))
for (i in seq_along(sdm.models)) {
    for (j in seq_along(eval.methods)) {
    means.j[j] <- mean(FulanusModelEval[eval.methods[j], "Testing.data", sdm.models[i],,], na.rm = TRUE)
    }
    means.i <- c(means.i, means.j)
}

summary.eval.reg <- data.frame(rep(sdm.models, each = 5), rep(eval.methods, 8), means.i)
names(summary.eval.reg) <- c("Model", "Method", "Mean")
summary.eval.reg
xtabs(summary.eval.reg$Mean ~ summary.eval.reg$Model + summary.eval.reg$Method) # MARS with best performance
