ENphylo_modeling | R Documentation |
The function computes vectors of marginality and specialization
according to Rinnan & Lawler (2019) via Environmental Niche Factor
Analysis (ENFA) and phylogenetic imputation (Garland & Ives, 2000).
It takes a list of Simple Features (sf) objects and a
phylogenetic tree to train ENFA and/or ENphylo models. Both modeling techniques
are calibrated and evaluated while accounting for phylogenetic uncertainty.
Calibrations are made on a random subset of the data under a bootstrap
cross-validation scheme. The predictive power of the different models is
estimated using five different evaluation metrics.
ENphylo_modeling(input_data, tree, input_mask, obs_col, time_col = NULL,
  min_occ_enfa = 30, boot_test_perc = 20, boot_reps = 10,
  swap.args = list(nsim = 10, si = 0.2, si2 = 0.2),
  eval.args = list(eval_metric_for_imputation = "AUC", eval_threshold = 0.7,
    output_options = "best"),
  clust = 0.5, output.dir)
input_data |
a list of |
tree |
an object of class |
input_mask |
a |
obs_col |
character. Name of the |
time_col |
character. Name of the |
min_occ_enfa |
numeric. The minimum number of occurrence data required for a species to be modeled with ENFA. |
boot_test_perc |
numeric. Percentage of data (ranging between 0 and 100)
used to calibrate ENFA and/or ENphylo models within a bootstrap
cross-validation scheme. The remaining percentage
( |
boot_reps |
numeric. Number of evaluation runs performed within the
bootstrap cross-validation scheme to evaluate ENFA and/or ENphylo models. If
set to 0, model evaluation is skipped and the internal evaluation element
returns |
swap.args |
list of ENphylo parameters. It includes:
|
eval.args |
list of evaluation model parameters. It includes:
|
clust |
numeric. The proportion of cores used to train ENFA and ENphylo
models. If |
output.dir |
the file path wherein |
ENphylo_modeling automatically arranges input_data in a suitable format to
run ENFA or ENphylo. The internal call of the function is "calibrated_enfa"
for ENFA and "calibrated_imputed" for ENphylo, respectively.
Phylogenetic uncertainty
The function does not work with nsim < 1, since one of the strongest points
of ENphylo_modeling is testing alternative phylogenies to provide the most
accurate reconstruction of species environmental preferences. Similarly,
setting nsim = 1 limits the power of the function, as it will use the
original tree without generating alternative phylogenies.
Phylogenetic Imputation
ENphylo_modeling automatically switches from the ENFA to the ENphylo
algorithm for any species having fewer than min_occ_enfa occurrences or an
ENFA model accuracy below eval_threshold. In the latter case, the function
fits both models and retains the one performing best according to
eval_metric_for_imputation. Phylogenetic imputation is allowed for up to 30%
of the species on the tree. If the number of species to impute exceeds 30%,
ENphylo_modeling automatically splits the original tree into smaller
subtrees, so that the maximum percentage of imputation is respected. Each
subtree is designed to impute phylogenetically distant species and to retain
species phylogenetically close to the taxa to be imputed (so that imputation
is robust). In this case, the function prints the number of phylogenies used.
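The switching rule described above can be sketched in base R. This is only an illustration: choose_algorithm, the species names, the occurrence counts, the scores and the tree size are all hypothetical and are not part of RRgeo; the package applies its own internal logic.

```r
# Hypothetical helper (not RRgeo code) mirroring the rule described above.
choose_algorithm <- function(n_occ, enfa_score = NA,
                             min_occ_enfa = 30, eval_threshold = 0.7) {
  # Too few occurrences: ENFA is skipped and the species is imputed
  if (n_occ < min_occ_enfa) return("ENphylo")
  # ENFA accuracy below threshold: both models are fitted, the better one kept
  if (!is.na(enfa_score) && enfa_score < eval_threshold) {
    return("ENFA and ENphylo (keep the better one)")
  }
  "ENFA"
}

# Made-up occurrence counts and ENFA scores for three species
n_occ    <- c(sp_a = 120, sp_b = 12, sp_c = 45)
enfa_auc <- c(sp_a = 0.85, sp_b = NA, sp_c = 0.62)
decision <- mapply(choose_algorithm, n_occ, enfa_auc)
print(decision)

# Share of tips to impute on a made-up 8-tip tree; per the text above,
# a value over 0.30 would trigger the split into subtrees
n_tips <- 8
prop_imputed <- sum(n_occ < 30) / n_tips
print(prop_imputed)
```

Here only sp_b falls below the occurrence threshold, so the proportion of species to impute stays well under 30%.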
Outputs
If ENphylo_modeling runs the ENphylo algorithm, the outputs depend on the
strategy adopted by the user through the output_options argument. If
output_options="full", all CO matrices and evaluation metrics for all the
swapped trees tested are returned. Under output_options="weighted.mean", the
output consists of the subset of CO matrices and evaluation metrics for those
tree swapping iterations achieving a predictive accuracy, in terms of
eval_metric_for_imputation, above eval_threshold. Finally, if
output_options="best", a single CO matrix and evaluation scores list
corresponding to the most accurate swapped tree is returned. If the tree
swapping iterations under either "best" or "weighted.mean" result in accuracy
below the threshold, the function automatically switches to the "full"
strategy.
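A minimal sketch of how the three strategies filter the swapped trees is given below. The helper select_swaps and the scores are hypothetical and are not taken from RRgeo. The fallback to "full" is assumed here to apply when no iteration exceeds the threshold, which is one reading of the rule above.

```r
# Hypothetical helper (not RRgeo code): returns the indices of the tree
# swapping iterations that each output strategy would keep.
select_swaps <- function(scores, output_options = "best", eval_threshold = 0.7) {
  passing <- which(scores > eval_threshold)
  # Assumed fallback: nothing exceeds the threshold, so all iterations are kept
  if (output_options == "full" || length(passing) == 0) {
    return(list(strategy = "full", kept = seq_along(scores)))
  }
  if (output_options == "weighted.mean") {
    return(list(strategy = "weighted.mean", kept = passing))
  }
  # "best": only the single most accurate iteration
  list(strategy = "best", kept = which.max(scores))
}

# Made-up AUC values for five swapped trees
scores <- c(0.65, 0.82, 0.74, 0.58, 0.79)
select_swaps(scores, "best")$kept            # iteration 2
select_swaps(scores, "weighted.mean")$kept   # iterations 2, 3 and 5
select_swaps(c(0.5, 0.6), "best")$strategy   # falls back to "full"
```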
Finally, the function creates two new folders, "ENphylo_enfa_models" and
"ENphylo_imputed_models", in output.dir. In each of these folders, one named
subfolder is created for each modeled species. Therein, model outputs and
background area are saved as model_outputs.RData and study_area.tif,
respectively.
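For a quick manual inspection, a saved model_outputs.RData file can be loaded into its own environment, as sketched below. The helper load_model_outputs is hypothetical, and the sketch assumes that each species subfolder is named exactly after the species. The names of the objects stored inside the file are not assumed; the helper simply returns whatever the file contains. The package's own getENphylo_results remains the intended way to collect results.

```r
# Hypothetical helper (not RRgeo code). Assumes the layout described above:
# <output.dir>/ENphylo_<enfa|imputed>_models/<species>/model_outputs.RData
load_model_outputs <- function(output.dir, species,
                               algorithm = c("enfa", "imputed")) {
  algorithm <- match.arg(algorithm)
  f <- file.path(output.dir, paste0("ENphylo_", algorithm, "_models"),
                 species, "model_outputs.RData")
  if (!file.exists(f)) stop("No model outputs found at: ", f)
  # Load into a fresh environment so nothing in .GlobalEnv is overwritten
  env <- new.env()
  load(f, envir = env)
  as.list(env)
}

# Usage (placeholder directory and species name):
# res <- load_model_outputs("YOUR_DIRECTORY", "Species name", "imputed")
# names(res)
```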
model_outputs.RData includes a list of three elements, regardless of
whether ENFA or ENphylo is used:
$call a character specifying the algorithm used to model the species (i.e. ENFA or ENphylo).
$formatted_data a list of input data formatted to run either ENFA or ENphylo
algorithms. Specifically, the list reports: the presence data points
($input_ones), the background points ($input_back), the names of the columns
associated with the arguments obs_col and time_col (if specified), the name of
the column containing the cell numbers (geoID_col), and the coordinates of
presence data only ($one_coords).
$calibrated_model a list. The output objects are different depending on whether ENFA or ENphylo is used to model the species:
ENFA
$call: a character specifying the algorithm used.
$full_model: a list containing marginality and specialization factors, the 'co' matrix, the number of significant axes, and all the other objects generated by applying ENFA on the entire occurrence dataset (see Rinnan & Lawler 2019 for additional details).
$evaluation: a matrix containing the evaluation scores of the ENFA model assessed by all possible evaluation metrics (i.e. Area Under the Curve (AUC), True Skill Statistic (TSS), Boyce Index (CBI), Sorensen Index, and Omission Rate (OMR)) for each model evaluation run.
ENphylo
$call: a character specifying the algorithm used.
$co: a list of the 'co' matrices of length equal to the number of
alternative phylogenies tested (i.e. nsim argument). The number of 'co'
matrices also reflects the selected output_options strategy.
$evaluation: a data.frame containing the evaluation scores of the ENphylo
model assessed by all possible evaluation metrics for each alternative
phylogeny. The output of this object depends on the strategy adopted by the
user through the output_options argument. Specifically, the function
internally selects the model (or models) with the highest evaluation score
according to the specified evaluation metric.
$output_options: a character vector including the arguments output_options
and eval_metric_for_imputation used to run the ENphylo model.
The function does not return the output into .GlobalEnv. Use the function
getENphylo_results to collect results from local folders.
Alessandro Mondanaro, Mirko Di Febbraro, Silvia Castiglione, Carmela Serio, Marina Melchionna, Pasquale Raia
Rinnan, D. S., & Lawler, J. (2019). Climate-niche factor analysis: a spatial approach to quantifying species vulnerability to climate change. Ecography, 42(9), 1494–1503. doi:10.1111/ecog.03937
Garland, T., & Ives, A. R. (2000). Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist, 155(3), 346–364. doi:10.1086/303327
Mondanaro, A., Di Febbraro, M., Castiglione, S., Melchionna, M., Serio, C., Girardi, G., Belfiore, A.M., & Raia, P. (2023). ENphylo: A new method to model the distribution of extremely rare species. Methods in Ecology and Evolution, 14, 911–922. doi:10.1111/2041-210X.14066
getENphylo_results; ENphylo vignette
library(ape)
library(terra)
library(sf)
library(RRgeo)

# Working directory for downloads and model outputs
newwd <- tempdir()
# newwd <- "YOUR_DIRECTORY"

# Download and load the example occurrence data (object dat)
latesturl <- RRgeo:::get_latest_version("12734585")
curl::curl_download(url = paste0(latesturl, "/files/dat.Rda?download=1"),
                    destfile = file.path(newwd, "dat.Rda"), quiet = FALSE)
load(file.path(newwd, "dat.Rda"))

# Read the tree shipped with the package and replace underscores in tip labels
tree <- read.tree(system.file("exdata/Eucopdata_tree.txt", package = "RRgeo"))
tree$tip.label <- gsub("_", " ", tree$tip.label)

# Download the raster and project it to the CRS of the occurrence data
curl::curl_download(paste0(latesturl, "/files/X35kya.tif?download=1"),
                    destfile = file.path(newwd, "X35kya.tif"), quiet = FALSE)
map35 <- rast(file.path(newwd, "X35kya.tif"))
map <- project(map35, st_crs(dat[[1]])$proj4string, res = 50000)

# Model two elements of dat; results are written to newwd
ENphylo_modeling(input_data = dat[c(1, 11)],
                 tree = tree,
                 input_mask = map[[1]],
                 obs_col = "OBS",
                 time_col = "age",
                 min_occ_enfa = 15,
                 boot_test_perc = 20,
                 boot_reps = 10,
                 swap.args = list(nsim = 5, si = 0.2, si2 = 0.2),
                 eval.args = list(eval_metric_for_imputation = "AUC",
                                  eval_threshold = 0.7,
                                  output_options = "best"),
                 clust = NULL,
                 output.dir = newwd)