ORIENTATION
Evaluating the accuracy of a predictive model usually requires testing it on independent data; however, as truly independent data are difficult to obtain for most biodiversity datasets, researchers often partition a single dataset into subsets. Occurrence data are partitioned into two categories: 1) records used to build the model (i.e., "training" or "calibration" data), and 2) records withheld to evaluate it (i.e., "testing" or "evaluation" data) (Guisan and Zimmermann 2000; Peterson et al. 2011). One widely used technique, k-fold cross-validation, partitions the full dataset into k groups ("folds"), then iteratively builds a model with all groups but one and tests that model on the left-out group. This iterative process results in k models. Evaluation statistics averaged over these k models can guide model selection, after which a final model can be built with all the data if desired.
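To make the procedure concrete, the sketch below assigns occurrence records to random folds and loops over them. It is a minimal base-R illustration only: the data frame occs, its column names, and the placeholder model-fitting step are hypothetical and not taken from any particular module.

    # Minimal sketch of random k-fold partitioning (base R only).
    # 'occs' is a hypothetical occurrence data frame, not module output.
    set.seed(48)                          # make the random assignment reproducible
    occs <- data.frame(longitude = runif(100, -80, -60),
                       latitude  = runif(100, -5, 15))

    k <- 4                                # number of folds
    # assign each occurrence record to one of k groups at random
    folds <- sample(rep(1:k, length.out = nrow(occs)))

    # iteratively withhold one fold for evaluation and calibrate on the rest
    for (i in 1:k) {
      train <- occs[folds != i, ]         # calibration ("training") records
      test  <- occs[folds == i, ]         # evaluation ("testing") records
      # ... fit a model on 'train', evaluate it on 'test', store the statistic ...
    }

Because each record belongs to exactly one fold, every record is used for evaluation exactly once across the k iterations, and the k evaluation statistics can then be averaged.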
There are many particular ways to partition data for niche/distributional models, but they fall into two main categories, which are offered in 1) Module: Non-spatial Partition and 2) Module: Spatial Partition.
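For contrast with the random (non-spatial) folds above, the sketch below splits records into four geographic groups at the median longitude and latitude. This "block"-style scheme is one common spatial partitioning approach in the niche/distribution modeling literature, but the exact methods offered by the modules may differ; occs is again the hypothetical data frame from the sketch above.

    # Minimal sketch of a spatial "block" partition: records are split into
    # four geographic groups using the median longitude and latitude.
    blocks <- ifelse(occs$longitude <= median(occs$longitude),
                     ifelse(occs$latitude <= median(occs$latitude), 1, 2),
                     ifelse(occs$latitude <= median(occs$latitude), 3, 4))
    table(blocks)   # number of occurrence records assigned to each spatial group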
REFERENCES
Guisan, A., & Zimmermann, N.E. (2000). Predictive habitat distribution models in ecology. Ecological Modelling, 135(2-3), 147-186. DOI: 10.1016/S0304-3800(00)00354-9
Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P., Martínez-Meyer, E., Nakamura, M., & Araújo, M.B. (2011). Evaluating Model Performance and Significance. In: Ecological Niches and Geographic Distributions. Monographs in Population Biology, 49. Princeton, New Jersey: Princeton University Press. DOI: 10.23943/princeton/9780691136868.003.0009