In statnmap/SDMSelect: Cross-validation model selection and species distribution mapping

knitr::opts_chunk$set(
  echo = TRUE, 
  message = FALSE,
  warning = FALSE,
  fig.show = 'asis',
  out.width = '75%',
  fig.align = 'center',
  fig.width = 7,
  fig.height = 7
)

Outlines

This library has been created for model selection to predict species distribution (Biomass, density or presence/absence). Its final aim is to produce maps of predicted distributions. However, the core is a k-fold cross-validation model selection procedure that can be applied to any kind of model, provided that parameters of the selection are well defined. We use the classical cars dataset to show the different possibilities of the library. The variable of interest is mpg. This is a continuous numerical variable, only positive. This is in accordance with our data modelling case datatype = "ContPosNull" (continuous positive or null) and the model types associated (modeltypes). For other types of data, refer to the documentation (in particular, options in modelselect_opt) to correctly set your modelling parameters.
Note that most figures of the vignette are saved in "inst" so that model selection is not run during vignette building. However, code in the vignettes can be run on your own computer and should return the same outputs. To do so, open and run this Rmd file: r system.file("Covar_Selection", "Covar_Selection.Rmd", package = "SDMSelect").

Load libraries and other setup

library(SDMSelect)
set.seed(50)

# Temp directory for saving all outputs
tmpdir <- paste0(tempdir(), "/out_CovarSelection")
dir.create(tmpdir)

Dataset

data <- dplyr::mutate_at(mtcars, 8:11, as.character)

knitr::kable(head(data))

Modify the dataset to work with the library

In this case, this only consists in changing observation column name to dataY and append factor column names with factor_ for compatibility with following functions.

data.new <- Prepare_dataset(
  x = data, var = 1, cov = 2:ncol(data),
  datatype = "Cont", na.rm = TRUE
)

knitr::kable(head(data.new))

Covariates correlation

Due to identifiability issues, highly correlated environmental covariates should not be included in the same models. Correlations between all possible pairs of covariates are tested using the Spearman's rho coefficient. Here, a rho value exceeding 0.7 considers covariates too correlated to be included in the same model. However, the following cross-validation procedure also guarantees too correlated covariates not to be included in the same models as gain in prediction may be low. Thus, the present covariates correlation selection step mainly allows the number of tested models to be reduced.

corSpearman <- Param_corr(
  x = data.new, rm = 1, thd = 0.7, visual = FALSE,
  plot = TRUE, saveWD = tmpdir, img.size = 5)

knitr::include_graphics(file.path(tmpdir, "Covariate_correlation_crop.png"))

Find the best combination of covariates

Set options for presence-absence models. Have a look at options documentation ?modelselect_opt.

modelselect_opt(RESET = TRUE)
modelselect_opt$Max_nb_Var <- 5
modelselect_opt$datatype <- "ContPosNull"
modelselect_opt$modeltypes <- modelselect_opt$modeltypes[c(1, 5, 11)]

Procedure selects combination of covariates in each iteration from one covariate to the maximum defined (modelselect_opt("Max_nb_Var") = r modelselect_opt("Max_nb_Var")). This reproduces the procedure separately for all model types defined (modelselect_opt("modeltypes") = r paste(modelselect_opt("modeltypes"), collapse = ", ")). Here, for the example, I only run the model for three "modeltypes" (r paste(modelselect_opt$modeltypes[c(1, 5, 11)], collapse = ", ")).
The output of findBestModel function is the link to a zipped file of all outputs saved in the saveWD directory.

res.file <- findBestModel(
  x = data.new, datatype = "ContPosNull", 
  corSpearman = corSpearman, saveWD = tmpdir, 
  verbose = 1)

Order models according to quality of prediction

Model selection was realised separately for each distribution tested. The exact same k-fold cross-validation datasets have been used to keep the best model at each step of the iteration procedure. All indices of goodness of fit can thus be compared among the distributions tested with paired statistical tests. This allows to order all models tested.

BestModels <- ModelOrder(saveWD = res.file, plot = TRUE)
# To add in /inst for vignettes
readr::write_rds(BestModels, "BestModels.rds")

Two tables are available:

"VeryBestModels_crossV" retains the best model and the ones not significantly worse than the first one.

knitr::kable(BestModels$VeryBestModels_crossV, row.names = FALSE)

"BestForModeltypes" retains the first two best models of each "modeltype".

knitr::kable(BestModels$BestForModeltypes, row.names = FALSE)

Predictions of the best model

We can choose one model, the best one here, and create a set of figure outputs to explore what it is predicting. Figures are saved in saveWD and its compressed version (its path is the output of the function, here stored in res.file)

Num.Best <- BestModels$VeryBestModels_crossV$Num[1]
res.file <- ModelResults(saveWD = tmpdir, plot = TRUE, 
                 Num = Num.Best, Marginals = TRUE)

In the case of a model with positive values as here, the outputs are the following. If the best model is with a log-transformation, figures are mainly created in the log-scale

Analysis of variance. Deviance explained by each covariate when added sequentially in the order specified by the cross-validation procedure. "%Exp.Dev" stands for percentage of explained deviance. "RMSE" is the root mean squared error on the validation datasets as issued from the cross-validation procedure, ">Diff" being the difference of RMSE with the previous step.

knitr::include_graphics(paste0(tmpdir, "/", basename(tmpdir), "-Param_Exp_Deviance_", Num.Best, ".png"))

# Save in inst
file.copy(paste0(tmpdir, "/", basename(tmpdir), "-Param_Exp_Deviance_", Num.Best, ".png"), "Param_Exp_Deviance.png", overwrite = TRUE)

Residuals analysis

knitr::include_graphics(paste0(tmpdir, "/", basename(tmpdir), "-Residual-Analysis_", Num.Best, ".png"))

# Save in inst
file.copy(paste0(tmpdir, "/", basename(tmpdir), "-Residual-Analysis_", Num.Best, ".png"), "Residual-Analysis.png", overwrite = TRUE)

Marginal predictions by covariates. These are simplified marginal effects of each covariate in the range of observations. All covariates values except one have been fixed so that their combination give a prediction close to the mean of observations. These simplified marginals figures are to be read for their relative effect and not for their absolute prediction.

knitr::include_graphics(paste0(tmpdir, "/", basename(tmpdir), "-CovForMeanPred_Marginals_", Num.Best, ".png"))

# Save in inst
file.copy(paste0(tmpdir, "/", basename(tmpdir), "-CovForMeanPred_Marginals_", Num.Best, ".png"),
          "CovForMeanPred_Marginals.png", overwrite = TRUE)

Comparison of predictions against observations

knitr::include_graphics(paste0(tmpdir, "/", basename(tmpdir), "-Obs-Pred_", Num.Best, ".png"))

# Save in inst
file.copy(paste0(tmpdir, "/", basename(tmpdir), "-Obs-Pred_", Num.Best, ".png"),
          "Obs-Pred.png", overwrite = TRUE)

statnmap/SDMSelect documentation built on April 1, 2021, 2:01 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

statnmap/SDMSelect
Cross-validation model selection and species distribution mapping

In statnmap/SDMSelect: Cross-validation model selection and species distribution mapping

Outlines

Load libraries and other setup

Dataset

Modify the dataset to work with the library

Covariates correlation

Find the best combination of covariates

Order models according to quality of prediction

Predictions of the best model

R Package Documentation

Browse R Packages

We want your feedback!

statnmap/SDMSelect Cross-validation model selection and species distribution mapping

In statnmap/SDMSelect: Cross-validation model selection and species distribution mapping

Outlines

Load libraries and other setup

Dataset

Modify the dataset to work with the library

Covariates correlation

Find the best combination of covariates

Order models according to quality of prediction

Predictions of the best model

R Package Documentation

Browse R Packages

We want your feedback!

statnmap/SDMSelect
Cross-validation model selection and species distribution mapping