acc_mahalanobis: Calculate and plot Mahalanobis distances for social science...

View source: R/acc_mahalanobis.R

acc_mahalanobis    R Documentation

Calculate and plot Mahalanobis distances for social science indices

Description

A standard tool to calculate Mahalanobis distances. In this approach, the Mahalanobis distance is calculated for ordinal variables (treated as continuous) to identify inattentive responses. The function calculates the distance of each observational unit from the sample mean. The greater the distance, the more atypical the responses.

Indicator

Usage

acc_mahalanobis(
  variable_group = NULL,
  label_col = VAR_NAMES,
  study_data,
  item_level = "item_level",
  meta_data = item_level,
  meta_data_v2,
  mahalanobis_threshold =
    suppressWarnings(as.numeric(getOption("dataquieR.MAHALANOBIS_THRESHOLD",
    dataquieR.MAHALANOBIS_THRESHOLD_default)))
)

Arguments

variable_group

variable list the names of the continuous measurement variables forming a group for which multivariate outliers make sense.

label_col

variable attribute the name of the column in the metadata with labels of variables

study_data

data.frame the data frame that contains the measurements

item_level

data.frame the data frame that contains metadata attributes of study data

meta_data

data.frame old name for item_level

meta_data_v2

character path to a workbook-like metadata file, see prep_load_workbook_like_file for details. ALL LOADED DATAFRAMES WILL BE PURGED, using prep_purge_data_frame_cache, if you specify meta_data_v2.

mahalanobis_threshold

numeric TODO: ES

Value

a list with:

  • SummaryTable: data.frame underlying the plot

  • SummaryPlot: ggplot2::ggplot2 outlier plot

  • FlaggedStudyData: data.frame containing the original data frame with the additional columns tukey, 3SD, hubert, and sigmagap. Each observation is coded 0 in a column if that method detected no outlier and 1 if it detected one. This can be used to exclude observations with outliers.

ALGORITHM OF THIS IMPLEMENTATION:

  • Implementation is restricted to variables of type integer

  • Remove missing codes from the study data (if defined in the metadata)

  • The covariance matrix is estimated for all variables from variable_group

  • The squared Mahalanobis distance of each observation is calculated as MD^2_i = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu), where \mu is the sample mean vector and \Sigma is the estimated covariance matrix

  • By default, an observation is considered an outlier if its squared distance exceeds the 0.975 quantile of a chi-square distribution with p degrees of freedom, where p is the number of variables in variable_group (Mayrhofer and Filzmoser, 2023).
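
The steps listed above can be illustrated with a minimal base R sketch. This is not the dataquieR implementation: the data frame items, the missing code 99, and the restriction to complete cases are assumptions made only for this illustration. It uses stats::cov, stats::mahalanobis (which returns squared distances), and stats::qchisq.

```r
# Illustrative sketch only; not the code used by acc_mahalanobis().
set.seed(1)

# Hypothetical ordinal items (values 1 to 5), treated as continuous
items <- data.frame(
  item1 = sample(1:5, 200, replace = TRUE),
  item2 = sample(1:5, 200, replace = TRUE),
  item3 = sample(1:5, 200, replace = TRUE)
)
items$item1[5] <- 99L  # assumed missing code, for illustration

# Replace missing codes by NA (codes would normally come from the metadata)
missing_codes <- c(99)
items[] <- lapply(items, function(x) replace(x, x %in% missing_codes, NA))

# Assumption: only complete cases enter the calculation
complete <- items[stats::complete.cases(items), ]

# Sample mean vector and estimated covariance matrix
mu <- colMeans(complete)
sigma <- stats::cov(complete)

# Squared Mahalanobis distance of each observation from the sample mean
md2 <- stats::mahalanobis(complete, center = mu, cov = sigma)

# Cutoff: 0.975 quantile of a chi-square distribution with p degrees of freedom
p <- ncol(complete)
cutoff <- stats::qchisq(0.975, df = p)

# TRUE marks observations whose squared distance exceeds the cutoff
flagged <- md2 > cutoff
```

Here which(flagged) gives the rows of complete whose squared distance exceeds the cutoff. The 0.975 level is hard-coded in this sketch; how mahalanobis_threshold maps onto it in the package is not shown here.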

See Also

Online Documentation


dataquieR documentation built on Jan. 8, 2026, 5:08 p.m.