2. Species Distribution Modeling : ENphylo_modeling and ENphylo_prediction"

if (!requireNamespace("rmarkdown", quietly = TRUE) ||
     !rmarkdown::pandoc_available()) {
   warning(call. = FALSE, "Pandoc not found, the vignettes is not built")
   knitr::knit_exit()
}

misspacks<-sapply(c("rnaturalearth","ggplot2","viridis","ggtext"),requireNamespace,quietly=TRUE)
if(any(!misspacks)){
  warning(call. = FALSE,paste(names(misspacks)[which(!misspacks)],collapse=", "), "not found, the vignettes is not built")
   knitr::knit_exit()
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align='center',
  fig.width=6,
  fig.height=4,
  warning=FALSE,
  message=FALSE,
  out.width='95%',
  dpi=300
)

require(RRgeo)
require(rnaturalearth)
require(ggplot2)
require(ggtext)
require(terra)
require(sf)
require(ape)
require(viridis)

options(rmarkdown.html_vignette.check_title = FALSE)
load("enphylo_vignette.Rda")
# read.tree("Eucopdata_tree.txt")->tree
read.tree(file.path(dirname(getwd()),"inst","exdata","Eucopdata_tree.txt"))->tree
rast(file.path(dirname(getwd()),"inst","exdata","X35kya.tif"))->map35
rast(file.path(dirname(getwd()),"inst","exdata","Suit_Vulpes.tif"))->map1
rast(file.path(dirname(getwd()),"inst","exdata","Suit_Ursus.tif"))->map2
as.data.frame(map1,xy=T)->map1
as.data.frame(map2,xy=T)->map2

Index

  1. Introduction
  2. Formatting the data
  3. ENphylo_modeling
  4. ENphylo_prediction

1. Introduction {#intro}

The ENphylo_modeling function is meant to couple Environmental Niche Factor Analysis (ENFA) and phylogenetic imputation (ENphylo) to model the spatial distribution of rare (Mondanaro et al. 2023) and extremely rare (i.e. 1-5 occurrences) species (Mondanaro et al. 2024). By tuning the arguments in ENphylo_modeling, the user can refine the model validation process and evaluate the performance of both ENFA and ENphylo models. In the following lines, we provide a detailed example demonstrating how to format the data for running ENphylo_modeling, along with a description of its key arguments.

2. Formatting the data {#format}

The main input object of ENphylo_modeling, the argument input_data, is a list of sf::data.frame objects containing species occurrence data in binary format (ones for presence, zero for background points) along with the explanatory continuous variables.

Important note: before launching ENphylo_modeling, please ensure that all non-explanatory variables are removed from input_data and make sure that each element of the list is named using the names of the target species.

To demonstrate how to prepare input_data, we load a list of 15 geopackage files, generated by means of [eucop_data_preparation], as a case study. We chose four random bioclimatic variables as relevant for the target species ("bio1", "bio4", "bio11", and "bio19"), thus removed all others from the data, and assigned species names to the input_data list. If the user generated the geopackages by means of eucop_data_preparation as shown in the first step of this tutorial, they should run the unevaluated lines in the following chunk.

library(RRgeo)
library(terra)
library(sf)
library(ape)

# datG<-lapply(grep(".gpkg",list.files(),value=TRUE),st_read)
# names(datG)<-sapply(strsplit(grep(".gpkg",list.files(),value=TRUE),"_"),"[[",1)
# dat<-lapply(datG,function(x) x[,c("OBS","age","bio1", "bio4", "bio11", "bio19")])
yourdir<-"YOUR_DIRECTORY"
setwd(yourdir)

latesturl<-RRgeo:::get_latest_version("12734585")
load(url(paste0(latesturl,"/files/dat.Rda?download=1")))
read.tree(system.file("exdata/Eucopdata_tree.txt", package="RRgeo"))->tree
tree$tip.label<-gsub("_"," ",tree$tip.label)
head(dat[1])

As shown in the example, the standard format for input data should include the relevant explanatory variables, the geometry, and columns indicating species occurrence data in binary format. Additionally, if available, columns representing the time intervals associated with species presence and background points may be added.

3. Running ENphylo_modeling {#modeling}

Now, we can proceed with setting all ENphylo_modeling parameters. Other than the abovementioned input_data argument, the function requires a phylogenetic tree and an input_mask as mandatory arguments. The latter is the geographical mask defining the spatial domain encompassing the background area enclosing all the target species. Other possible arguments include:

Note: ENphylo_modeling is highly suitable for a multi-species modeling approach. However, it is strongly recommended not to exceed an abundant/rare species ratio of 0.7. This limit ensures accurate and reliable estimation of marginality and specialization values for rare species via imputation. If the 0.7 ratio is exceeded, ENphylo_modeling internally splits the original phylogeny (tree) into multiple phylogenies to ensure that the threshold is respected. In each subtree, the function automatically selects species that are phylogenetically close to the species to impute and preferentially splits the latter into different trees.

latesturl<-RRgeo:::get_latest_version("12734585")
curl::curl_download(paste0(latesturl,"/files/X35kya.tif?download=1"),
                    destfile = "X35kya.tif", quiet = FALSE)
rast("X35kya.tif")->map35
project(map35,st_crs(dat[[1]])$proj4string,res = 50000)->map

ENphylo_modeling(input_data=dat,
                 tree=tree,
                 input_mask=map[[1]],
                 obs_col="OBS",
                 time_col="age", 
                 min_occ_enfa=15, 
                 boot_test_perc=20,
                 boot_reps=10,
                 swap.args=list(nsim=10,si=0.2,si2=0.2),
                 eval.args=list(eval_metric_for_imputation="AUC",
                                eval_threshold=0.7,
                                output_options="best"),
                 clust=0.5, 
                 output.dir=yourdir)

ENphylo_modeling outputs {#output}

ENphylo_modeling creates two output folders within the output.dir path:

The content of model outputs object depends on the algorithm used for modeling individual species and on the combination of ENphylo_modeling arguments chosen by users. In any case, model outputs include the CO matrices (i.e., the correlation matrix of the environmental predictors) and the model performances.

Note: ENphylo_modeling returns all the evaluation metrics available for both ENFA and ENphylo. However, it internally relies on the selected eval_metric_for_imputation and output_options arguments to retrieve the best-fit "imputed" model. After testing nsim alternative phylogenies and calculating the performances of models generated from these phylogenies, ENphylo_modeling selects the model (or multiple models depending on the output_options argument) with the highest eval_metric_for_imputation value.

4. Running ENphylo_prediction {#prediction}

After running ENphylo_modeling, it is recommended to use the function ENphylo_prediction to make spatially explicit predictions into a new geographical area. Irrespective of whether ENFA or phylogenetic imputation was performed for modelling each species, ENphylo_prediction calculates new marginality and specialization estimates based on the specified geographical area. Additionally, the user can convert these predictions into habitat suitability maps.

To simplify the retrieval of ENFA/ENphylo models, we developed a new function, getENphylo_results, which automatically imports ENFA, ENphylo, or both models from the specified folder path. In addition, user can focus on single or multiple species by setting the species_name argument.

As a case study, we project two models, one ENFA model for Ursus maritimus and one imputed model for Vulpes velox, by using the map including bioclimatic variables at 35 kya. First, the map is set to retain the focal variables only (the ones used in ENphylo_modeling) and reprojected by using the Mollweide World projection (that is the same as input_data). Then, getENphylo_results retrieves the models for Ursus maritimus and Vulpes velox, and finally the user may set ENphylo_prediction to generate their habitat suitability maps in North America at 35 kya.

Note: ENphylo_prediction stores all the outputs in the output.dir folder by creating a new ENphylo_prediction folder. Inside the folder, a number of nested sub-folders, one per species, is created according to the proj_name argument of the function to store all the spatial predictions.

library(rnaturalearth)
ne_countries(returnclass = "sf")->globalmap
subset(globalmap,continent=="North America")->ame_map

map35[[c("bio1","bio4","bio11","bio19")]]->newmap
crop(newmap,ext(ame_map))->newmap
project(newmap,st_crs(dat[[1]])$proj4string,res = 50000)->newmap


getENphylo_results(input.dir =yourdir,
                   mods="all",
                   species_name=c("Vulpes velox","Ursus maritimus"))->mod

ENphylo_prediction(object = mod, 
                   newdata = newmap,
                   convert.to.suitability = TRUE,
                   output.dir=yourdir,
                   proj_name="proj_example")

In the following chunk we show how to plot habitat suitibility maps of the selected species by means of ggplot2. Such maps are provided as supporting material within the package. Yet, if the users wish to plot their own maps they can use the unevaluated lines to retrieve them and modify the fill argument in ggplot according to their output.

library(ggplot2)
library(ggtext)
library(viridis)

rast(system.file("exdata/Suit_Vulpes.tif", package="RRgeo"))->map1
rast(system.file("exdata/Suit_Ursus.tif", package="RRgeo"))->map2
as.data.frame(map1,xy=T)->map1
as.data.frame(map2,xy=T)->map2

# rast("./ENphylo_prediction/Vulpes velox/proj_example/Suitability.tif")->map1
# rast("./ENphylo_prediction/Ursus maritimus/proj_example/Suitability.tif")->map2
p1<-ggplot(map1,aes(x=x,y=y,fill=Suitability_swap_9))+
  geom_tile()+
  scale_fill_viridis(name = "Suitability")+
  labs(title="*Vulpes velox* at 35 kya")+
  theme(panel.background = element_rect(fill="aliceblue",colour = "black"),
        panel.grid = element_blank(),
        axis.text= element_text(size=10),
        axis.title = element_blank(),
        plot.title = element_markdown(size=12,hjust=0.5),
        plot.margin = unit(c(0.1,0.1,0.1,0.1),"cm"))

p2<-ggplot(map2,aes(x=x,y=y,fill=Suitability))+
  geom_tile()+
  scale_fill_viridis()+
  labs(title="*Ursus maritimus* at 35 kya")+
  theme(panel.background = element_rect(fill="aliceblue",colour = "black"),
        panel.grid = element_blank(),
        axis.text=element_text(size=10),
        axis.title = element_blank(),
        plot.title = element_markdown(size=12,hjust=0.5),
        plot.margin = unit(c(0.1,0.1,0.1,0.1),"cm"))

plot(p1)
plot(p2)

References



Try the RRgeo package in your browser

Any scripts or data that you put into this service are public.

RRgeo documentation built on July 2, 2025, 5:09 p.m.