View source: R/accessory_geo.R
mahalanobisDist | R Documentation |
Calculate Squared Mahalanobis Distances
mahalanobisDist(
  lon,
  lat,
  method = NULL,
  n.min = 5,
  digs = 4,
  center = "mean",
  geo = NULL,
  cult = NULL,
  geo.patt = "ok_",
  cult.patt = NA
)
lon | numerical. Longitude in decimal degrees. |
lat | numerical. Latitude in decimal degrees. |
method | character. Type of method desired: 'classic' and/or 'robust' (see Details). |
n.min | numerical. Minimum number of unique coordinates to be used in the calculations. |
digs | numerical. Number of digits to be returned after the decimal point. Defaults to 4. |
center | character. Which metric should be used to obtain the center of the distribution of coordinates: 'mean' or 'median'? |
geo | character. A vector of the same length as lon/lat containing the result from the validation of the geographical coordinates. Defaults to NULL. |
cult | character. A vector of the same length as lon/lat containing the result from the validation of cultivated specimens. Defaults to NULL. |
geo.patt | character. The pattern to be used to search for classes of geographical validation to be included in the analyses. Defaults to "ok_". |
cult.patt | character. The pattern to be used to search for classes of validation of cultivated specimens to be included in the analyses. Defaults to NA. |
Two methods are available to calculate the Mahalanobis distances: the classic method (method = 'classic') and the robust method (method = 'robust'). Both take into account the geographical center of the distribution of coordinates and the spatial covariance between them. They differ in how the covariance matrix of the distribution is defined: the classic method uses an approach based on Pearson's method, while the robust method uses a Minimum Covariance Determinant (MCD) estimator.
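The difference between the two methods can be sketched with base R. This is a minimal illustration and not the internal code of mahalanobisDist(): the classic distances use the sample mean and sample covariance, while the robust distances take the center and covariance from an MCD estimator. Using MASS::cov.rob() as that estimator is an assumption made here for illustration; the estimator actually used by the function may differ.

```r
# Example coordinates from this help page (the last point lies far from the rest)
lon <- c(-42.2, -42.6, -45.3, -42.5, -42.3, -39.0, -12.2)
lat <- c(-44.6, -46.2, -45.4, -42.2, -43.7, -45.0, -8.0)
xy <- cbind(lon, lat)

# Classic: sample mean and sample covariance matrix
d2_classic <- stats::mahalanobis(xy, center = colMeans(xy), cov = stats::cov(xy))

# Robust: center and covariance from an MCD estimator.
# MASS::cov.rob() is an assumption for illustration only.
if (requireNamespace("MASS", quietly = TRUE)) {
  set.seed(1)  # the MCD search may use random subsamples
  mcd <- MASS::cov.rob(xy, method = "mcd")
  d2_robust <- stats::mahalanobis(xy, center = mcd$center, cov = mcd$cov)
}

# Squared distances, rounded as with digs = 4
round(d2_classic, 4)
```

In the classic version the distant point also pulls the mean and inflates the covariance, which is why a robust estimate of both is useful when the aim is to flag spatial outliers.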
The argument n.min controls the minimum number of unique coordinates necessary to calculate the distances. The classic and robust methods need at least 3 and 4 spatially unique coordinates, respectively, to obtain the distances. However, the MCD algorithm of the robust method can run into singularity issues depending on how close the coordinates are to each other. This issue can result in the overestimation of the distances and thus in bad outlier flagging. A minimum of five and ideally 10 unique coordinates should avoid those problems.
If the MCD algorithm runs into singularity issues, the function silently adds some random noise to both coordinates and re-runs the MCD algorithm. This is meant to deal with cases of few coordinates close to each other and in practice should not change the overall result of the detection of spatial outliers.
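The idea of adding noise to escape a singular covariance matrix can be illustrated as follows. This is only a sketch: the noise level and the use of jitter() are assumptions, and the amount and distribution of the noise applied by mahalanobisDist() are not documented here.

```r
set.seed(42)

# Five perfectly collinear coordinates: their covariance matrix is singular
lon <- c(-42.1, -42.2, -42.3, -42.4, -42.5)
lat <- c(-22.1, -22.2, -22.3, -22.4, -22.5)
xy <- cbind(lon, lat)
det_before <- det(stats::cov(xy))

# Hypothetical noise level, tiny relative to the spread of the coordinates
noise <- 1e-3
xy_noisy <- cbind(lon = jitter(lon, amount = noise),
                  lat = jitter(lat, amount = noise))
det_after <- det(stats::cov(xy_noisy))

# Largest displacement introduced by the noise, in decimal degrees
max_shift <- max(abs(xy_noisy - xy))
```

After the noise is added the determinant is no longer effectively zero, so the covariance matrix can be inverted, while each coordinate moves by at most the chosen noise level.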
The presence of problematic coordinates and cultivated specimens can greatly influence the estimation of the geographical center of the distribution of coordinates and the spatial covariance between them. Thus, the arguments geo and cult can be used to flag and remove those cases from the computation of the center and covariance matrix. In both cases, the user can provide the output from the plantR functions checkCoord() and getCult() or a logical TRUE/FALSE vector. By default, if the inputs are the outputs from checkCoord() and getCult(), only the coordinates flagged as 'ok_...' in geo and those not flagged in cult (i.e. NAs) will be used. Users can select different search patterns using the arguments geo.patt and cult.patt. For both input options, the vector must have the same length as the coordinates provided in the arguments lat and lon. By default, the arguments geo and cult are set to NULL, meaning that all coordinates will be used.
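A sketch of how such a selection could work with the default patterns is given below. The class labels in geo and cult are hypothetical values made up for illustration, and the matching logic is an assumption about the behaviour described above, not the code of the function.

```r
# Hypothetical validation results, one entry per coordinate
geo  <- c("ok_county", "ok_state", "no_cannot_check", "ok_county", "ok_country")
cult <- c(NA, NA, NA, "cultivated", NA)

geo.patt  <- "ok_"  # default: keep classes containing "ok_"
cult.patt <- NA     # default: keep records not flagged as cultivated (NA)

keep_geo <- grepl(geo.patt, geo, fixed = TRUE)
keep_cult <- if (is.na(cult.patt)) {
  is.na(cult)
} else {
  grepl(cult.patt, cult, fixed = TRUE)
}

# Records used to estimate the center and the covariance matrix
keep <- keep_geo & keep_cult
which(keep)
```

A logical vector such as keep could also be passed directly, as in the examples on this page where geo = c(rep(TRUE, 6), FALSE).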
The function internally removes spatially duplicated coordinates before calculating the Mahalanobis distances. So, the value in n.min refers to the number of coordinates remaining after the removal of spatially duplicated coordinates.
The function also internally removes any empty or NA values in lon or lat.
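The effect of these two cleaning steps on n.min can be sketched in base R. The exact-match definition of a spatial duplicate used here is an assumption; the function may define duplicates differently.

```r
lon <- c(-42.2, -42.2, -45.3, NA, -42.3, -39.0)
lat <- c(-44.6, -44.6, -45.4, -42.2, -43.7, NA)
n.min <- 5

# Drop records with a missing longitude or latitude
ok <- !is.na(lon) & !is.na(lat)
xy <- cbind(lon = lon[ok], lat = lat[ok])

# Drop spatially duplicated coordinates (exact matches on both columns)
xy_unique <- xy[!duplicated(xy), , drop = FALSE]

# n.min is compared against the number of unique coordinates left
n_unique <- nrow(xy_unique)
enough <- n_unique >= n.min
```

Here six input records shrink to three unique coordinates, so the n.min = 5 threshold is not met even though six values were supplied.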
The input data frame with new column(s) containing the distances obtained using the selected method(s).
Renato A. Ferreira de Lima
uniqueCoord, checkCoord, getCult
lon <- c(-42.2,-42.6,-45.3,-42.5,-42.3,-39.0,-12.2)
lat <- c(-44.6,-46.2,-45.4,-42.2,-43.7,-45.0,-8.0)

## Not run: 
# Assuming that all coordinates are valid
mahalanobisDist(lon, lat, method = "classic", n.min = 1)
mahalanobisDist(lon, lat, method = "robust", n.min = 1)

# Flagging last coordinate as problematic
mahalanobisDist(lon, lat, method = "classic", n.min = 1,
                geo = c(rep(TRUE, 6), FALSE))
mahalanobisDist(lon, lat, method = "robust", n.min = 1,
                geo = c(rep(TRUE, 6), FALSE))

## End(Not run)