Computes Mahalanobis distance for each row of a data frame

Description

This function returns a numeric vector with one element per row of the provided data frame. Each element is the Mahalanobis distance of that row from the data set as a whole.

Usage

maha_dist(data, keep.NA = TRUE, robust = FALSE)

Arguments

data

A data frame

keep.NA

Should every row with missing data remain NA in the output? TRUE by default.

robust

Attempt to compute the Mahalanobis distance using a robust covariance matrix? FALSE by default.

Details

This is useful for finding anomalous observations, row-wise.

Categorical variables in the data frame are converted to numeric values only if they are factors. A character column, for example, must be converted to a factor before it can be used as a component in the distance calculation.
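As a minimal sketch of the conversion described above, the following uses only base R. The data frame `df` and its columns are hypothetical, and the sketch assumes the factor-to-numeric step behaves like base R's `data.matrix()`, which replaces each factor with its integer level codes; the function's internal conversion is not shown in this page.

```r
# Hypothetical data frame with one character column (names are illustrative).
df <- data.frame(
  height = c(150, 160, 170, 180),
  group  = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor so it can be coded numerically.
df$group <- as.factor(df$group)

# data.matrix() replaces each factor column with its integer level codes.
# Assumption: this mirrors how factors end up as numbers in the calculation.
m <- data.matrix(df)
```

Note that integer codes impose an ordering on the levels (here "a" before "b"), so distances involving unordered categories should be interpreted with care.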

Value

A vector of observation-wise Mahalanobis distances.
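To illustrate what an observation-wise distance looks like, here is a rough sketch using base R's `stats::mahalanobis()` rather than `maha_dist` itself. It assumes the column means as the center and the ordinary sample covariance; `stats::mahalanobis()` returns squared distances, and whether `maha_dist` reports the same scale is not stated in this page, so treat the numbers as illustrative only.

```r
# Treat every column of mtcars as a numeric variable.
x <- as.matrix(mtcars)

# Center and spread of the whole data set.
center <- colMeans(x)
covmat <- cov(x)

# Squared Mahalanobis distance of each row from the center (base R).
d2 <- stats::mahalanobis(x, center, covmat)

# The same quantity computed by hand for the first row:
# (row - center)' %*% inverse(cov) %*% (row - center)
diff1 <- x[1, ] - center
d2_manual <- as.numeric(t(diff1) %*% solve(covmat) %*% diff1)
```

The result has one value per row, so unusually large entries point to rows that sit far from the bulk of the data once correlations between columns are taken into account.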

See Also

insist_rows

Examples

library(assertr)             # provides maha_dist and insist_rows

maha_dist(mtcars)

maha_dist(iris, robust=TRUE)


library(magrittr)            # for the pipe operator (%>%)
library(dplyr)               # for the "everything()" function

# using every column from mtcars, compute mahalanobis distance
# for each observation, and ensure that each distance is within 10
# median absolute deviations from the median
mtcars %>%
  insist_rows(maha_dist, within_n_mads(10), everything())
  ## anything here will run