View source: R/biodecrypt.cross.R
biodecrypt.cross | R Documentation |
The function biodecrypt.cross wraps the biodecrypt function to cross-validate known cases, thereby verifying the robustness of the attribution of unknown cases. It requires the same input as biodecrypt (coordinates and a vector of attributions, together with values for distance ratio, buffer and alpha). In addition, it requires a "runs" value defining the number of runs and thus the fraction of test records included in each run. In each run, a randomly selected group of test records (actually identified to a given species) is regarded as unidentified (0 value) and the biodecrypt function is carried out to attribute them. The analysis is repeated as many times as defined in runs (a runs value of 10 performs a ten-fold cross-validation based on an initial selection of ten randomly distributed subsets).
biodecrypt.cross(mat,id,alpha=NULL,ratio=2.5,buffer=90,fraction=0.95, partCount = 10,
checkdist=T, clipToCoast="terrestrial", proj = "+proj=longlat +datum=WGS84",minimum=7,
map=NULL,xlim=NULL,ylim=NULL,main=NULL,runs=10,test=T)
mat |
A matrix for longitude and latitude (in decimal degrees) for all records. |
id |
A vector indicating species membership of each record (in the same order as mat). Identified records are indicated with 1, 2, ..., n, unidentified records with 0. |
alpha |
A vector indicating an initial alpha value for each species. If NULL, the default value of 8 for all species is used. |
ratio |
The minimum ratio between the distance from the second-closest hull and the distance from the closest hull required to allow attribution. Default is 2.5. |
buffer |
A distance buffer from hulls (in km). |
fraction |
The minimum fraction of occurrences that must be included in the polygon. |
partCount |
The maximum number of disjunct polygons that are allowed. |
clipToCoast |
Either "no" (no clipping), "terrestrial" (only the terrestrial part of the range is kept) or "aquatic" (only the non-terrestrial part of the range is kept). |
checkdist |
Logical; if TRUE, cases attributed to a given species based on relative distance from hulls but closer to an identified record of another species are not attributed. |
proj |
The projection information for mat. In this version, the default is the only supported option. |
minimum |
The minimum number of specimens required to build alpha hulls. If the number of identified specimens is lower, convex hulls are calculated to improve procedure stability. |
map |
A map to be plotted during the procedure to show the separation progress. |
xlim |
Longitude boundaries for the map. |
ylim |
Latitude boundaries for the map. |
main |
The title to be plotted on the graph. |
runs |
The number of runs among which the identified cases are randomly assigned as non-attributed (test) records. |
test |
Logical; if TRUE, a biodecrypt analysis on the complete set of records is also carried out to compute NUR. |
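The ratio argument described above can be illustrated with a small base R sketch. This is a simplified illustration, not the package's internal code: the distance matrix d and the helper attribute_by_ratio are hypothetical, the comparison is assumed to be "greater than or equal to", and the effects of buffer, checkdist and alpha are ignored.

```r
# Hypothetical distances (in km) from three unidentified records (rows)
# to the hulls of two species (columns); 0 means the record lies inside the hull.
d <- rbind(c(10, 40),
           c(30, 45),
           c(0, 25))
ratio <- 2.5

# Hypothetical helper: return the index of the closest species if the
# second-closest hull is at least 'ratio' times farther away, otherwise 0.
attribute_by_ratio <- function(dist_row, ratio) {
  ord <- order(dist_row)
  closest <- dist_row[ord[1]]
  second <- dist_row[ord[2]]
  # Inside two hulls at once: treated as ambiguous in this sketch.
  if (second == 0) return(0L)
  if (second / closest >= ratio) ord[1] else 0L
}

res <- apply(d, 1, attribute_by_ratio, ratio = ratio)
res
```

In this toy case the first record (ratio 4) and the third record (inside the hull of species 1 only) are attributed to species 1, while the second record (ratio 1.5) is left unattributed.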
The procedure assigns identified records to the test group (treated as unknown cases) as evenly as possible among runs, both in the total number of test records and in the number of records belonging to each original species. If the number of runs equals the number of records, each identified record is attributed individually in a jackknife procedure. The resulting attribution vector is then compared with the original membership and two values are reported: the percentage of identified cases attributed to a wrong species (Mis-Identified Records, MIR) and the percentage of identified cases not attributed to any species (Non-attributed Identified Records, NIR). Optionally (test=TRUE), the function also calculates the percentage of Non-attributed Unidentified Records (NUR), that is, the fraction of unknown records that could not be attributed to a species after a standard biodecrypt analysis using the parameters provided by the user and the complete set of records.
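The allocation of identified records to runs can be sketched in base R as follows. This is only an illustration of a species-balanced split under simple assumptions, not the code used inside biodecrypt.cross: the vector fold and the dealing scheme are hypothetical, and the sketch balances records within each species without additionally balancing the totals across species.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)                        # reproducible random allocation
id <- c(rep(1, 20), rep(2, 20))    # species membership, as in the package example
id[sample(40, 10)] <- 0            # ten records left unidentified
runs <- 5                          # number of cross-validation runs

# 'fold' (hypothetical name): 0 for records that are never tested,
# 1..runs for the run in which an identified record is masked.
fold <- integer(length(id))
for (sp in setdiff(unique(id), 0)) {
  idx <- which(id == sp)
  # Shuffle the records of this species, then deal them out in turn so that
  # run sizes differ by at most one record within the species.
  fold[idx[sample.int(length(idx))]] <- rep_len(seq_len(runs), length(idx))
}

# In run r, the selected records are regarded as unidentified (0 value).
r <- 1
id_run <- id
id_run[fold == r] <- 0

# Number of test records per species and run.
tab <- table(species = id[id != 0], run = fold[id != 0])
tab
```

Setting runs to the number of identified records in such a scheme would test one record at a time, which corresponds to the jackknife case mentioned above.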
type |
The string "cross", an argument to be passed to biodecrypt.plot. |
NUR |
The percentage of Non-attributed Unidentified Records. |
areas |
The hull areas for all the species (in square kilometres). |
intersections |
The areas of intersections among hulls for each pair of species. |
sympatry |
The fraction of the overlap area compared to the total area of the two hulls. |
table |
The result table of the test (if test=TRUE) with Longitude and Latitude for each occurrence datum, its id after the biodecrypt procedure (id2) and its initial id (id). |
cross |
The result table with the original attribution (original), the attribution obtained after cross validation (predicted) and the classification as MIR or NIR. Longitude and Latitude are also provided. |
MIR |
The percentage of Mis-Identified Records. |
NIR |
The percentage of Non-attributed Identified Records. |
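As a rough illustration of how MIR and NIR relate to the original and predicted columns of the cross table, the base R sketch below computes both percentages from two made-up vectors. It assumes that the denominator is the total number of identified records tested; the vectors and the status variable are hypothetical and are not output of the package.

```r
# Made-up example: original membership and attribution after cross-validation
# (0 means the record could not be attributed to any species).
original  <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
predicted <- c(1, 1, 2, 0, 2, 2, 2, 0, 2, 1)

# Classify each tested record: not attributed (NIR), attributed to the
# wrong species (MIR), or correctly attributed.
status <- ifelse(predicted == 0, "NIR",
                 ifelse(predicted != original, "MIR", "correct"))

# Percentages over all tested identified records (assumed denominator).
MIR <- 100 * sum(status == "MIR") / length(original)
NIR <- 100 * sum(status == "NIR") / length(original)

data.frame(original, predicted, status)
c(MIR = MIR, NIR = NIR)
```

Here two of the ten records are attributed to the wrong species and two are left unattributed, so both MIR and NIR are 20.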
Leonardo Dapporto
Platania L. et al. Assigning occurrence data to cryptic taxa improves climatic niche assessments: biodecrypt, a new tool tested on European butterflies. Global Ecology and Biogeography (2020).
## Not Run
## Create an example for a dataset
#mat<-rbind(cbind(rnorm(n = 20, mean = 1, sd = 4),rnorm(n = 20, mean = 40, sd = 3)),
#cbind(rnorm(n = 20, mean = 7, sd = 5),rnorm(n = 20, mean = 45, sd = 2)))
#id<-c(rep(1,20),rep(2,20))
#id[sample(c(1:40))[1:10]]<-0
#cross<-biodecrypt.cross(mat,id)
#plot(mat,type="n")
#biodecrypt.plot(cross)