acc_mahalanobis: Calculate and plot 'Mahalanobis' distances

View source: R/acc_mahalanobis.R

acc_mahalanobis R Documentation

Calculate and plot Mahalanobis distances

Description

A standard tool to calculate Mahalanobis distances. In this approach, the squared Mahalanobis distance is calculated for ordinal variables (treated as continuous) to identify inattentive responses. The function calculates the distance of each observational unit from the sample mean. The greater the distance, the more atypical the responses.

Indicator

Usage

acc_mahalanobis(
  variable_group = NULL,
  study_data,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_cross_item = "cross-item_level",
  label_col = VAR_NAMES,
  meta_data_v2,
  cross_item_level,
  `cross-item_level`,
  mahalanobis_threshold =
    suppressWarnings(as.numeric(getOption("dataquieR.MAHALANOBIS_THRESHOLD",
    dataquieR.MAHALANOBIS_THRESHOLD_default)))
)
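
The default of mahalanobis_threshold is read from an R option, so it can be changed for a whole session. The following is a minimal sketch of how that default expression resolves. The name threshold_default is a hypothetical stand-in for the package-internal dataquieR.MAHALANOBIS_THRESHOLD_default, assumed here to be 0.975 as stated in the argument description; the helper resolve_threshold is likewise illustrative and not part of dataquieR.

```r
# Hypothetical stand-in for the package-internal default constant (assumed 0.975).
threshold_default <- 0.975

# Illustrative helper mirroring the default expression shown in Usage.
resolve_threshold <- function() {
  suppressWarnings(as.numeric(
    getOption("dataquieR.MAHALANOBIS_THRESHOLD", threshold_default)
  ))
}

# Unset the option and remember the previous setting.
old <- options(dataquieR.MAHALANOBIS_THRESHOLD = NULL)
unset_value <- resolve_threshold()      # falls back to the stand-in default

# A session-wide override.
options(dataquieR.MAHALANOBIS_THRESHOLD = 0.99)
override_value <- resolve_threshold()

# A non-numeric setting is coerced to NA; the coercion warning is suppressed.
options(dataquieR.MAHALANOBIS_THRESHOLD = "not a number")
invalid_value <- resolve_threshold()

# Restore the previous setting.
options(old)
```

Passing mahalanobis_threshold explicitly in the call bypasses the option altogether.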

Arguments

variable_group

variable list the names of the variables used to calculate the Mahalanobis distance

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_cross_item

data.frame – Cross-item level metadata

label_col

variable attribute the name of the column in the metadata containing the labels of the variables

meta_data_v2

character path or file name of the workbook-like metadata file; see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

cross_item_level

data.frame alias for meta_data_cross_item

`cross-item_level`

data.frame alias for meta_data_cross_item

mahalanobis_threshold

numeric the confidence level used to define outliers, i.e., the probability of the chi-squared quantile that serves as the cutoff. The default is 0.975.

Value

a list with:

  • SummaryTable: data.frame underlying the plot

  • SummaryData: data.frame underlying the plot with speaking column labels

  • SummaryPlot: ggplot2::ggplot2 Q-Q plot of squared Mahalanobis distances vs. a theoretical chi-squared distribution showing outliers.

  • FlaggedStudyData: data.frame containing the original data of the variables used to calculate the squared Mahalanobis distances, with one additional column holding the squared Mahalanobis distance and a column called MD_outliers that contains 1 if the observational unit is considered a multivariate outlier.
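
As an illustration of what the Q-Q plot in SummaryPlot represents, the sketch below computes the plot coordinates with base R. It is not the package's plotting code: the simulated data, the use of stats::ppoints for the plotting positions, the column names of qq, and the base-graphics plot (swapped in for ggplot2) are all assumptions made for this example.

```r
# Simulated integer item responses (illustrative data only).
set.seed(2)
x <- matrix(sample(1:5, 300, replace = TRUE), ncol = 3)
p <- ncol(x)

# Squared Mahalanobis distances from the sample mean.
md2 <- stats::mahalanobis(x, center = colMeans(x), cov = stats::cov(x))

# Q-Q coordinates: ordered squared distances against chi-squared quantiles
# with p degrees of freedom.
qq <- data.frame(
  theoretical = stats::qchisq(stats::ppoints(length(md2)), df = p),
  observed    = sort(md2)
)

# Mark points beyond the 0.975 chi-squared quantile.
qq$outlier <- qq$observed > stats::qchisq(0.975, df = p)

# Base-graphics rendering, only drawn in an interactive session.
if (interactive()) {
  plot(qq$theoretical, qq$observed,
       col = ifelse(qq$outlier, "red", "black"),
       xlab = "Theoretical chi-squared quantiles",
       ylab = "Squared Mahalanobis distance")
  abline(0, 1)
}
```

Points that lie far above the reference line in the upper tail correspond to observational units with unusually large distances.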

ALGORITHM OF THIS IMPLEMENTATION:

  • Implementation is restricted to variables of type integer

  • Remove missing codes from the study data (if defined in the metadata)

  • The covariance matrix is estimated for all variables from variable_group

  • The squared Mahalanobis distance of each observation is calculated as MD^2_i = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu), where \mu is the sample mean vector and \Sigma the estimated covariance matrix

  • By default, an observation is considered an outlier if its squared distance exceeds the 0.975 quantile of a theoretical chi-squared distribution with degrees of freedom equal to the number of variables used to calculate the Mahalanobis distance (Mayrhofer and Filzmoser, 2023)
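
The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code of acc_mahalanobis: the simulated items, the missing code 99, the restriction to complete cases, and the column name MD are hypothetical choices made for this example. Only MD_outliers is a column name taken from the description of FlaggedStudyData.

```r
# Simulated integer (ordinal) item responses; illustrative data only.
set.seed(1)
n <- 200
items <- data.frame(
  item1 = sample(1:5, n, replace = TRUE),
  item2 = sample(1:5, n, replace = TRUE),
  item3 = sample(1:5, n, replace = TRUE)
)

# Assumed missing code 99 (hypothetical): replace it by NA.
items$item1[1:3] <- 99L
items[items == 99L] <- NA

# Assumption of this sketch: only complete cases enter the calculation.
complete <- items[stats::complete.cases(items), ]

# Sample mean vector and covariance matrix of the variable group.
mu    <- colMeans(complete)
sigma <- stats::cov(complete)

# stats::mahalanobis() returns the squared distances
# (x_i - mu)^T Sigma^{-1} (x_i - mu).
md2 <- stats::mahalanobis(complete, center = mu, cov = sigma)

# Cutoff: chi-squared quantile with df = number of variables.
threshold <- 0.975
cutoff <- stats::qchisq(threshold, df = ncol(complete))

# Flag multivariate outliers (1 = outlier, 0 = not an outlier).
flagged <- cbind(complete,
                 MD = md2,
                 MD_outliers = as.integer(md2 > cutoff))
```

With three variables the cutoff is the 0.975 quantile of a chi-squared distribution with 3 degrees of freedom; adding variables to the group raises the cutoff accordingly.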

See Also

Online Documentation


dataquieR documentation built on May 12, 2026, 1:06 a.m.