acc_multivariate_outlier: Function to calculate and plot Mahalanobis distances

View source: R/acc_multivariate_outlier.R

acc_multivariate_outlier R Documentation

Function to calculate and plot Mahalanobis distances


A standard tool to detect multivariate outliers is the Mahalanobis distance. It helps to assess the plausibility of a measurement given the value of another. Here, the Mahalanobis distance is itself treated as a univariate measure, and the same rules are applied to it as for the identification of univariate outliers:

  • the classical approach from Tukey: 1.5 * IQR from the 1st (Q_{25}) or 3rd (Q_{75}) quartile.

  • the 6 * σ approach, i.e., any Mahalanobis distance outside the interval \bar{x} \pm 3 * σ is considered an outlier.

  • the approach from Hubert for skewed distributions, which is implemented in the R package robustbase.

  • a completely heuristic approach named σ-gap.

For further details, please see the vignette for univariate outliers.
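The first two rules can be sketched in base R as follows. This is a minimal illustration rather than the dataquieR implementation: the helper names `flag_tukey` and `flag_sixsigma` and the toy vector `d` are hypothetical, and the Hubert and σ-gap rules are left out because their details are not given here.

```r
# Hypothetical helpers (not part of dataquieR): each returns 1 for a flagged
# value and 0 otherwise, mirroring the 0/1 coding described for the output.

# Tukey: more than 1.5 * IQR below Q25 or above Q75.
flag_tukey <- function(d) {
  q <- stats::quantile(d, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  as.integer(d < q[1] - 1.5 * iqr | d > q[2] + 1.5 * iqr)
}

# 6 * sigma: outside the interval mean +/- 3 * standard deviation.
flag_sixsigma <- function(d) {
  m <- mean(d, na.rm = TRUE)
  s <- stats::sd(d, na.rm = TRUE)
  as.integer(d < m - 3 * s | d > m + 3 * s)
}

# Toy distances: twenty unremarkable values and one extreme value.
d <- c(rep(c(1, 2, 3, 4), times = 5), 50)
tukey <- flag_tukey(d)
sixsigma <- flag_sixsigma(d)
```

In this toy vector only the last value lies outside both intervals, so both helpers flag it and nothing else.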


  id_vars = NULL,
  n_rules = 4,



variable list the name of the continuous measurement variables


variable optional, an ID variable of the study data. If not specified row numbers are used.


variable attribute the name of the column in the metadata with labels of variables


numeric from=1 to=4. the number of rules that must be violated to classify an observation as an outlier


data.frame the data frame that contains the measurements


data.frame the data frame that contains metadata attributes of study data


a list with:

  • SummaryTable: data.frame underlying the plot

  • SummaryPlot: ggplot2 outlier plot

  • FlaggedStudyData: data.frame containing the original data frame with the additional columns tukey, sixsigma, hubert, and sigmagap. In each of these columns, an observation is coded 0 if the respective rule did not detect an outlier and 1 if it did. This can be used to exclude observations with outliers.
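As a sketch of how the flag columns might be used to exclude observations, the following base R snippet counts the violated rules per row and drops rows that reach a threshold. The data frame `flagged` is a hand-made stand-in for FlaggedStudyData, the column `n_violated` is hypothetical, and treating "at least n_rules violations" as the exclusion criterion is an assumption based on the description of n_rules.

```r
# Hand-made stand-in for FlaggedStudyData (values are illustrative only).
flagged <- data.frame(
  id       = 1:5,
  tukey    = c(0, 1, 1, 0, 1),
  sixsigma = c(0, 0, 1, 0, 1),
  hubert   = c(0, 1, 1, 0, 1),
  sigmagap = c(0, 0, 0, 0, 1)
)

rule_cols <- c("tukey", "sixsigma", "hubert", "sigmagap")

# Number of rules violated by each observation (hypothetical helper column).
flagged$n_violated <- rowSums(flagged[, rule_cols])

# Assumption: an observation is excluded if it violates at least n_rules rules.
n_rules <- 4
cleaned <- flagged[flagged$n_violated < n_rules, ]
```

A lower threshold, such as `n_rules <- 2`, would exclude more observations.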


  • Implementation is restricted to variables of type float

  • Remove missing codes from the study data (if defined in the metadata)

  • The covariance matrix is estimated for all resp_vars

  • The (squared) Mahalanobis distance of each observation is calculated as MD^2_i = (x_i - μ)^T Σ^{-1} (x_i - μ)

  • The four rules mentioned above are applied on this distance for each observation in the study data

  • An output data frame is generated that flags each outlier

  • A parallel coordinate plot indicates respective outliers
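The distance computation in the steps above can be sketched in base R. This is an illustration under assumptions, not the package's code: the toy data frame `x` and its columns are made up, and the sample mean and sample covariance are used as estimates of μ and Σ. `stats::mahalanobis()` returns the squared distance, matching the formula MD^2_i.

```r
# Toy data (illustrative only): six points on a line plus one that deviates.
x <- data.frame(
  height = c(150, 160, 170, 180, 190, 200, 165),
  weight = c(50, 60, 70, 80, 90, 100, 110)
)

# Estimates of the mean vector and covariance matrix.
mu <- colMeans(x)
sigma <- stats::cov(x)

# Squared Mahalanobis distance of each observation.
md2 <- stats::mahalanobis(x, center = mu, cov = sigma)

# The same quantity written out: (x_i - mu)^T Sigma^{-1} (x_i - mu).
centered <- sweep(as.matrix(x), 2, mu)
md2_manual <- rowSums((centered %*% solve(sigma)) * centered)
```

The resulting vector `md2` is what the univariate outlier rules would then be applied to, one value per observation.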

List function.

See Also

Online Documentation

dataquieR documentation built on Aug. 31, 2022, 5:08 p.m.