Many approaches and algorithms exist for building models of species niches/distributions, as well as for evaluating them (Guisan and Thuiller 2005; Elith et al. 2006; Franklin 2010 chaps. 6-8). Component: Build and Evaluate Niche Model uses the output from earlier components to build models using either presence-only or presence-background data (and associated environmental information). Wallace currently allows users to build models using either: 1) the presence-only approach BIOCLIM (Module: BIOCLIM), or 2) the presence-background algorithm Maxent (Module: Maxent).
As noted, various evaluation metrics exist for assessing niche/distributional model performance, and many are based on the ability to predict localities that are withheld for evaluation (Peterson et al. 2011). As in Zurell et al. 2020 and Kass et al. 2021, Wallace now follows the terminology of Hastie et al. (2009), which distinguishes among training data (used to fit models), validation data (used to evaluate models and tune their settings), and testing data (fully withheld from model building and used only for final evaluation).
Wallace currently implements extensive evaluations based on splits into training and validation data.
In addition to a single split, some more-complicated data-partitioning schemes involve iterative splits where a different subset of localities is withheld in turn (e.g., k-fold cross-validation). As part of Component: Build and Evaluate Niche Model, Wallace provides a table of a few commonly used evaluation metrics, including Area Under the Curve (AUC), omission rate, Continuous Boyce Index (CBI), and Akaike Information Criterion (AIC; which does not use withheld localities at all). There is no single “best” way to evaluate niche/distributional models (especially without absence data), so Wallace seeks to provide a number of them and let users decide which ones fit their research purposes.
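For users who want to reproduce these evaluations outside the graphical interface, the sketch below shows the kind of ENMeval 2.0 (Kass et al. 2021) workflow that Wallace wraps. The object names (occs, envs, bg) and the tuning settings are illustrative assumptions, not Wallace's exact internal calls:

```r
library(ENMeval)

# occs: data frame of occurrence coordinates (longitude, latitude)
# envs: RasterStack of environmental predictor grids
# bg:   data frame of background coordinates
e <- ENMevaluate(occs = occs, envs = envs, bg = bg,
                 algorithm = "maxnet",
                 partitions = "randomkfold",
                 partition.settings = list(kfolds = 4),
                 tune.args = list(fc = c("L", "LQ"), rm = 1:2))

# The results table holds the evaluation metrics discussed below
# (auc.train, auc.val.avg, or.10p.avg, cbi.val.avg, AICc, etc.)
eval.results(e)
```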
AUC stands for the Area Under the Curve of a Receiver Operating Characteristic (ROC) plot. AUC is a non-parametric measure of the ability of a classifier (here, the model) to rank positive records higher than negative ones across the full range of suitability values of the model; thus, it judges the model's discriminative ability. AUC ranges from 0 to 1, but major complications for interpreting it arise when true negative data (e.g., absences) do not exist (Lobo et al. 2008; Peterson et al. 2008; Peterson et al. 2011 chap. 9). Because the niche/distributional models offered in Wallace are presence-only or presence-background, AUC values should be considered only as relative indicators of performance (e.g., between different settings of the same algorithm, for the same dataset of a given species). Wallace uses ENMeval 2.0 (Kass et al. 2021) to: 1) calculate AUC using the model made with all occurrence localities and “evaluate” it with those same records, i.e., the training localities (training AUC); 2) calculate AUC for each iteration of k-fold cross-validation separately, using each model built with the training localities and evaluating it with the validation localities for that iteration (validation AUC, averaged across the k iterations); and 3) take the difference between the training and validation AUC for each iteration of k-fold cross-validation (AUC diff, again averaged over the k iterations). Validation AUC is the way AUC is usually applied in the field and is important to consider before transferring models to other areas or times (Roberts et al. 2017). A higher AUC difference indicates greater model overfitting, as overfit models perform better on training data than on withheld data (Warren and Seifert 2011), and such models should generally be avoided. If a set of occurrence records is completely withheld (never used for training), then a true testing AUC can be calculated (not currently implemented in Wallace).
Five fields in the “Results” table pertain to AUC:
* auc.train: AUC calculated using all occurrence localities
* auc.val.avg and auc.val.sd: mean and standard deviation of the k validation AUCs (one for each partition)
* auc.diff.avg and auc.diff.sd: mean and standard deviation of all differences between the k training and validation AUCs
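To make the ranking interpretation of AUC concrete, the following minimal sketch (not Wallace's internal code) computes a rank-based AUC from hypothetical suitability scores at presence and background points:

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# presence receives a higher suitability score than a randomly chosen
# background point. All scores below are hypothetical.
auc_rank <- function(pres_scores, bg_scores) {
  r <- rank(c(pres_scores, bg_scores))  # ranks across all scores; ties get average ranks
  n_pres <- length(pres_scores)
  n_bg <- length(bg_scores)
  # Mann-Whitney U statistic for the presences, scaled to [0, 1]
  (sum(r[seq_len(n_pres)]) - n_pres * (n_pres + 1) / 2) / (n_pres * n_bg)
}

auc_rank(c(0.9, 0.8, 0.7, 0.4), c(0.5, 0.3, 0.2, 0.1))  # ~0.94: presences rank mostly above background
```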
The omission rate (OR) evaluates the ability of a binary classifier (here, the model) to predict localities (usually withheld ones), typically after applying a threshold to a continuous or ordinal model prediction (see Module: Map Prediction). Applying the threshold makes the prediction binary (e.g., 0s and 1s), and the omission rate then equals the proportion of withheld localities that fall in grid cells with a 0 (i.e., below the threshold; Peterson et al. 2011). An OR of 0 indicates that no localities fall outside the prediction, whereas an OR of 1 indicates that all of them do. There are many possible thresholding rules, but Wallace offers two commonly used ones for evaluation: minimum training presence (MTP) and 10 percentile training presence (10pct). MTP is the lowest suitability score for any occurrence locality used to train the model. 10pct is the lowest suitability score for such localities after excluding the lowest 10% of them; thus, 10pct is stricter than MTP. As with AUC, Wallace calculates OR for each partition by applying a threshold to the continuous prediction and finding the proportion of validation localities that fall outside the resulting binary prediction. Four fields in the “Results” table pertain to omission rate:
* or.mtp.avg and or.mtp.sd: mean and standard deviation of all validation MTP omission rates
* or.10p.avg and or.10p.sd: mean and standard deviation of all validation 10pct omission rates
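The following minimal sketch illustrates how the two thresholds and the resulting omission rates are obtained. The suitability values are made up, and R's default quantile interpolation stands in for however a given implementation computes the 10th percentile:

```r
# Hypothetical predicted suitability at training and validation localities
train_suit <- c(0.92, 0.81, 0.77, 0.63, 0.55, 0.41, 0.38, 0.22, 0.15, 0.09)
val_suit   <- c(0.85, 0.47, 0.12, 0.05)

mtp <- min(train_suit)                             # minimum training presence threshold
p10 <- quantile(train_suit, 0.10, names = FALSE)   # 10 percentile training presence threshold

or_mtp <- mean(val_suit < mtp)   # proportion of validation localities below the MTP threshold
or_10p <- mean(val_suit < p10)   # proportion below the stricter 10pct threshold

c(or.mtp = or_mtp, or.10p = or_10p)
```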
The Continuous Boyce Index (CBI) evaluates models by dividing the suitability range into ‘b’ classes, as opposed to only two. For each class, it calculates the ratio of the frequency of evaluation localities predicted by the model to their expected frequency (the latter based on the proportion of the study region corresponding to that class). An ideal model will have an increasing curve: low suitability classes should have fewer evaluation presences than expected under a random model, and the predicted/expected ratio should increase with higher suitability (Boyce et al. 2002; Hirzel et al. 2006). CBI values range from -1 to 1, with positive values denoting a model whose predictions are consistent with the evaluation data and negative values indicating a poor model (Hirzel et al. 2006). Two fields in the “Results” table pertain to CBI:
* cbi.val.avg and cbi.val.sd: mean and standard deviation of the k validation CBIs
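The sketch below is a deliberately simplified illustration of the predicted/expected ratio underlying the Boyce index, using fixed, non-overlapping suitability classes; full implementations (e.g., ecospat::ecospat.boyce) instead use a moving window of many overlapping classes:

```r
# obs_vals:  suitability predicted at evaluation (validation) localities
# pred_vals: suitability across the whole study region (e.g., all raster cells)
boyce_simple <- function(obs_vals, pred_vals, n_class = 10) {
  breaks <- seq(0, 1, length.out = n_class + 1)
  obs_freq <- table(cut(obs_vals,  breaks, include.lowest = TRUE)) / length(obs_vals)
  exp_freq <- table(cut(pred_vals, breaks, include.lowest = TRUE)) / length(pred_vals)
  pe <- as.numeric(obs_freq) / as.numeric(exp_freq)   # predicted/expected ratio per class
  keep <- is.finite(pe)                               # drop classes with no study-area cells
  # CBI: Spearman rank correlation between class order and the P/E ratio
  cor(seq_len(n_class)[keep], pe[keep], method = "spearman")
}
```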
AIC, or the Akaike Information Criterion, is a model evaluation metric often used with regression-based techniques. Models with the lowest AIC are identified as optimal among the candidate models. It is calculated as AIC = 2k - 2ln(L) (Burnham and Anderson 2002), where k is the number of model parameters and ln(L) is the log-likelihood of the model (note that here k has a different meaning than it does elsewhere in Wallace for k-fold partitioning). Models with more parameters have a larger positive first term, and those with higher likelihoods have a larger negative second term. Therefore, simpler models with fewer parameters and models with high likelihoods both receive lower AIC scores; for example, if two models have approximately the same likelihood, the one with fewer parameters will have the lower AIC. In this way, AIC generally penalizes complex models but still rewards complex ones that achieve high likelihoods, with the intent of finding the best balance between complexity and fit (Burnham and Anderson 2002). For Maxent, Wallace calculates AICc (AIC corrected for finite sample sizes) using the methodology outlined in Warren and Seifert (2011); it does not calculate AIC for BIOCLIM. AIC is calculated on the full model only and thus does not consider partitions. Four fields in the “Results” table pertain to AIC:
* ncoef: the number of parameters (coefficients) in the model (for Maxent, this includes features built from variables; e.g., four hinge features of bio1 count as 4 parameters, a quadratic term for bio11 adds 2 parameters, etc.)
* AICc: the AICc value for the model
* delta.AICc: the difference between each model's AICc and the lowest AICc among the candidate models (delta.AICc = 0 identifies the model with the lowest AICc)
* w.AIC: the AIC weight, calculated as each model's relative likelihood (exp(-0.5 * delta.AICc)) divided by the sum of the relative likelihoods across all candidate models; these weights can be used in model averaging (Burnham and Anderson 2002)
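As a worked illustration (with made-up log-likelihoods and parameter counts, not output from any particular model), the following shows how AICc, delta.AICc, and the AIC weights relate:

```r
# Hypothetical candidate models
ll <- c(-250.2, -248.9, -247.5)   # log-likelihoods
k  <- c(6, 9, 14)                 # numbers of parameters (coefficients)
n  <- 80                          # number of occurrence localities

aic   <- 2 * k - 2 * ll                              # AIC = 2k - 2ln(L)
aicc  <- aic + (2 * k * (k + 1)) / (n - k - 1)       # small-sample correction
delta <- aicc - min(aicc)                            # delta.AICc
w     <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))  # Akaike weights (w.AIC)

data.frame(k, AICc = round(aicc, 2), delta.AICc = round(delta, 2), w.AIC = round(w, 3))
```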
Evaluation statistics are shown in the ‘Results’ tab for the full model, the average across partitions, and each individual partition. If a new model is run, its results overwrite those of the previous one. We envision that this component will grow substantially in future releases, with the addition of new modules implementing other modeling techniques.
REFERENCES
Boyce, M.S., Vernier, P.R., Nielsen, S.E., & Schmiegelow, F.K.A. (2002). Evaluating resource selection functions. Ecological Modelling, 157(2-3), 281–300. DOI: 10.1016/S0304-3800(02)00200-4
Burnham, K.P., & Anderson, D.R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, New York. DOI: 10.1007/b97636
Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K.S., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S., Wisz, M.S., & Zimmermann, N.E. (2006). Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29(2), 129-151. DOI: 10.1111/j.2006.0906-7590.04596.x
Franklin, J. (2010). Mapping Species Distributions: Spatial Inference and Prediction (chaps. 6-8: Statistical models - modern regression; Machine learning methods; Classification, similarity and other methods for presence-only data). Cambridge: Cambridge University Press. DOI: 10.1017/CBO9780511810602
Guisan, A., & Thuiller, W. (2005). Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8, 993-1009. DOI: 10.1111/j.1461-0248.2005.00792.x
Hastie, T., Tibshirani, R., & Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Hirzel, A.H., Le Lay, G., Helfer, V., Randin, C., & Guisan, A. (2006). Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142–152. DOI: 10.1016/j.ecolmodel.2006.05.017
Kass, J.M., Muscarella, R., Galante, P.J., Bohl, C.L., Pinilla-Buitrago, G.E., Boria, R.A., Soley-Guardia, M., & Anderson, R.P. (2021). ENMeval 2.0: Redesigned for customizable and reproducible modeling of species’ niches and distributions. Methods in Ecology and Evolution, 12(9), 1602-1608. DOI: 10.1111/2041-210X.13628
Lobo, J.M., Jiménez-Valverde, A., & Real, R. (2008). AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17(2), 145-151. DOI: 10.1111/j.1466-8238.2007.00358.x
Peterson, A.T., Papeş, M., & Soberón, J. (2008). Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling, 213(1), 63-72. DOI: 10.1016/j.ecolmodel.2007.11.008
Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P., Martinez-Meyer, E., Nakamura, M., & Araújo, M.B. (2011). Evaluating Model Performance and Significance. In: Ecological Niches and Geographic Distributions. Princeton, New Jersey: Monographs in Population Biology, 49. Princeton University Press. DOI: 10.23943/princeton/9780691136868.003.0005
Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F., & Dormann, C.F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913-929. DOI: 10.1111/ecog.02881
Warren, D.L., & Seifert, S.N. (2011). Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21(2), 335-342. DOI: 10.1890/10-1171.1