checkOut (R Documentation)
This function searches for spatial outliers, i.e. records too far away from the core distribution of a species, based on Mahalanobis distances. Spatial outliers can indicate misidentifications or records obtained from cultivated individuals, although not all cultivated individuals are necessarily spatial outliers (see also the plantR function getCult()).
checkOut( x, lon = "decimalLongitude.new", lat = "decimalLatitude.new", tax.name = "scientificName.new", geo.name = "geo.check", cult.name = "cult.check", n.min = 6, center = "median", geo.patt = "ok_", cult.patt = NA, clas.cut = 3, rob.cut = 16 )
x | A data frame with the species records.
lon | character. Name of the column with the record longitude in decimal degrees. Defaults to "decimalLongitude.new".
lat | character. Name of the column with the record latitude in decimal degrees. Defaults to "decimalLatitude.new".
tax.name | character. Name of the column containing the species name. Defaults to "scientificName.new".
geo.name | character. Name of the column containing the validation of the geographical coordinates. Defaults to "geo.check".
cult.name | character. Name of the column containing the validation of records from cultivated individuals. Defaults to "cult.check".
n.min | numeric. Minimum number of unique coordinates to be used in the calculations. Defaults to 6.
center | character. Metric used to obtain the center of the distribution of coordinates: "mean" or "median". Defaults to "median".
geo.patt | character. Pattern used to search for the classes of geographical validation to be included in the analyses. Defaults to "ok_".
cult.patt | character. Pattern used to search for the classes of validation of cultivated specimens to be included in the analyses. Defaults to NA.
clas.cut | numeric. Threshold distance for outlier detection using classic Mahalanobis distances. Defaults to 3.
rob.cut | numeric. Threshold distance for outlier detection using robust Mahalanobis distances. Defaults to 16.
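To illustrate how pattern arguments such as geo.patt and cult.patt can select records, here is a minimal base R sketch. The helper select_records() is hypothetical and not part of plantR. The validation values are illustrative only, and treating an NA pattern as "no filtering on that column" is an assumption, not documented behaviour of checkOut().

```r
# Hypothetical helper (NOT part of plantR): returns a logical vector marking
# the records whose validation columns match the given patterns.
select_records <- function(x, geo.name = "geo.check", geo.patt = "ok_",
                           cult.name = "cult.check", cult.patt = NA) {
  keep <- rep(TRUE, nrow(x))
  # Assumption: a pattern equal to NA means "do not filter on this column"
  if (!is.na(geo.patt) && geo.name %in% names(x))
    keep <- keep & grepl(geo.patt, x[[geo.name]])  # grepl() gives FALSE for NA
  if (!is.na(cult.patt) && cult.name %in% names(x))
    keep <- keep & grepl(cult.patt, x[[cult.name]])
  keep
}

# Illustrative validation classes (made up for this example)
df <- data.frame(
  geo.check = c("ok_county", "ok_state", "bad_country", NA),
  cult.check = c(NA, "cultivated", NA, NA)
)
select_records(df)                           # geographical filter only
select_records(df, cult.patt = "cultivated") # both filters
```

Here only records with a geographical validation class containing "ok_" are kept; records with missing validation are dropped because grepl() returns FALSE for NA.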
The function searches for spatial outliers using two methods (Liu et al., 2018): the classic and the robust squared Mahalanobis distances (see the help of mahalanobisDist() for details). The two methods can be used separately or combined (see Examples).
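The classic squared Mahalanobis distance can be sketched with base R's stats::mahalanobis(), using either the mean or the median of the coordinates as the center (mirroring the center argument). This is a minimal sketch, not the plantR mahalanobisDist() implementation, whose details may differ; the helper name classic_maha() is hypothetical, and the robust variant (which replaces the covariance with a robust estimate) is not shown.

```r
# Hypothetical helper (NOT part of plantR): classic squared Mahalanobis
# distances of each record from the center of the coordinates.
classic_maha <- function(lon, lat, center = c("median", "mean")) {
  center <- match.arg(center)
  xy <- cbind(lon = lon, lat = lat)
  # center of the distribution: column medians or column means
  ctr <- if (center == "median") apply(xy, 2, median) else colMeans(xy)
  # classic (non-robust) covariance matrix of the coordinates
  stats::mahalanobis(xy, center = ctr, cov = stats::cov(xy))
}

# Nine records on a regular grid plus one distant record
grid <- expand.grid(lon = -43:-41, lat = -23:-21)
lon <- c(grid$lon, -12)
lat <- c(grid$lat, -18)

d <- classic_maha(lon, lat)                       # median center
d_mean <- classic_maha(lon, lat, center = "mean") # mean center
round(d, 2)
```

The distant record gets the largest distance. Note that it also inflates the classic covariance matrix, which is one reason a robust estimate of the covariance is also used.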
To detect outliers, thresholds must be applied to the Mahalanobis distances obtained for each species (arguments clas.cut and rob.cut). Ideally, these thresholds should be species-specific, but this is not always possible. Based on the empirical distribution of some Atlantic Forest species with very different numbers of occurrences and spatial distribution patterns, Lima et al. (2020) noted that occurrences outside the species ranges often had classic and robust Mahalanobis distances above 3 and 16, respectively (the values used here as defaults). For cultivated species, they used more restrictive thresholds of 2.5 and 12, respectively. They also noted that these thresholds are very conservative (i.e. only the most extreme outliers are removed), so some outliers may remain undetected.
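Applying the thresholds to precomputed distances can be sketched as below. The helper flag_outliers() is hypothetical and not part of plantR. Setting a threshold to NULL switches that method off, as in the Examples; when both thresholds are given, this sketch flags a record only if it exceeds both, which is an assumption chosen to keep the flagging conservative and may not match how checkOut() combines the two methods.

```r
# Hypothetical helper (NOT part of plantR): flag records whose distances
# exceed the thresholds. A NULL threshold disables that method.
flag_outliers <- function(clas.dist, rob.dist, clas.cut = 3, rob.cut = 16) {
  if (is.null(clas.cut) && is.null(rob.cut))
    stop("at least one of 'clas.cut' or 'rob.cut' must be provided")
  n <- length(if (is.null(clas.cut)) rob.dist else clas.dist)
  out <- rep(TRUE, n)
  # Assumption: combined methods require BOTH distances above their thresholds
  if (!is.null(clas.cut)) out <- out & clas.dist > clas.cut
  if (!is.null(rob.cut)) out <- out & rob.dist > rob.cut
  out
}

# Made-up distances for four records
clas <- c(0.8, 1.5, 2.7, 3.4)
rob  <- c(1.2, 3.0, 17.5, 40.1)

flag_outliers(clas, rob)                   # defaults: 3 and 16
flag_outliers(clas, rob, 2.5, 12)          # lower thresholds used for cultivated species
flag_outliers(clas, rob, clas.cut = NULL)  # robust distances only
flag_outliers(clas, rob, rob.cut = NULL)   # classic distances only
```

Lowering the thresholds (2.5 and 12) flags more records, which is why they are described as more restrictive.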
The detection of outliers may depend on the number of unique coordinates available, so it is safer when many unique coordinates are available. As a rule of thumb, ten unique coordinates per taxon should help avoid possible problems (undetected true outliers or the detection of false outliers). See Examples.
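Before running the outlier search, it can help to count the unique coordinates available per taxon. The base R sketch below does this with a hypothetical helper, n_unique_coords(), which is not part of plantR; the column names and records are made up for illustration.

```r
# Hypothetical helper (NOT part of plantR): number of unique coordinate
# pairs per taxon. Longitude and latitude are pasted into a single key so
# that repeated coordinates are counted only once.
n_unique_coords <- function(x, lon, lat, tax.name) {
  key <- paste(x[[lon]], x[[lat]])
  sapply(split(key, x[[tax.name]]), function(k) length(unique(k)))
}

# Made-up records: "Sp. A" has 6 records but only 3 unique coordinates
df <- data.frame(
  species = c(rep("Sp. A", 6), rep("Sp. B", 12)),
  lon = c(-42.2, -42.2, -42.3, -42.3, -42.4, -42.4,
          seq(-45, -40, length.out = 12)),
  lat = c(-22.1, -22.1, -22.2, -22.2, -22.3, -22.3,
          seq(-25, -20, length.out = 12))
)

n_uc <- n_unique_coords(df, "lon", "lat", "species")
n_uc
n_uc >= 6   # default n.min of checkOut()
n_uc >= 10  # rule of thumb of ten unique coordinates per taxon
```

Counting unique pairs rather than records matters because duplicated coordinates add no information about the spatial spread of a species.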
The input data frame with a new column indicating the spatial outliers.
Renato A. F. de Lima
Lima, R.A.F. et al. 2020. Defining endemism levels for biodiversity conservation: Tree species in the Atlantic Forest hotspot. Biological Conservation, 252: 108825.
Liu, C., White, M., and Newell, G. 2018. Detecting outliers in species distribution data. Journal of Biogeography, 45(1): 164-176.
checkCoord, getCult, mahalanobisDist
# few data and close coordinates (no outliers)
lon <- c(-42.2,-42.3,-42.4,-42.3,-42.3)
lat <- c(-44.3,-44.2,-44.2,-42.2,-42.2)
df <- data.frame(lon = lon, lat = lat)
checkOut(df, lon = "lon", lat = "lat", n.min = 4)
checkOut(df, lon = "lon", lat = "lat", clas.cut = NULL, n.min = 4) # robust distances only

# some data and one outlier
lon <- c(runif(5, -45, -41), -12.2)
lat <- c(runif(5, -45, -41), -18.2)
df <- data.frame(lon = lon, lat = lat)
checkOut(df, lon = "lon", lat = "lat")
checkOut(df, lon = "lon", lat = "lat", clas.cut = NULL) # robust distances only
checkOut(df, lon = "lon", lat = "lat", rob.cut = NULL)  # classic distances only

# more data and one outlier
lon <- c(runif(9, -45, -41), -12.2)
lat <- c(runif(9, -45, -41), -18.2)
df <- data.frame(lon = lon, lat = lat)
checkOut(df, lon = "lon", lat = "lat")
checkOut(df, lon = "lon", lat = "lat", clas.cut = NULL) # robust distances only
checkOut(df, lon = "lon", lat = "lat", rob.cut = NULL)  # classic distances only