MoE_mahala: Mahalanobis Distance Outlier Detection for Multivariate...
In Keefe-Murphy/MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

MoE_mahala

R Documentation

Mahalanobis Distance Outlier Detection for Multivariate Response

Description

Computes the Mahalanobis distance between the response variable(s) and the fitted values of linear regression models with multivariate or univariate responses.

Usage

MoE_mahala(fit,
           resids,
           squared = FALSE,
           identity = NULL)

Arguments

`fit`	A fitted `lm` model, inheriting either the `"mlm"` or `"lm"` class.
`resids`	The residuals. Can be residuals for observations included in the model, or residuals arising from predictions on unseen data. Must be coercible to a matrix with the number of columns being the number of response variables. Missing values are not allowed.
`squared`	A logical. By default (`FALSE`), the generalized interpoint distance is computed. Set this flag to `TRUE` for the squared value.
`identity`	A logical indicating whether the identity matrix is used in place of the precision matrix in the Mahalanobis distance calculation. Defaults to `FALSE` for multivariate response data but defaults to `TRUE` for univariate response data, where `TRUE` corresponds to the use of the Euclidean distance. Setting `identity=TRUE` with multivariate data may be advisable when the dimensions of the data are such that the covariance matrix cannot be inverted (otherwise, the pseudo-inverse is used under the `FALSE` default).

Value

A vector giving the Mahalanobis distance (or squared Mahalanobis distance) between response(s) and fitted values for each observation.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11634-019-00373-8")}>.

Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, India, 2(1): 49-55.

Examples


data(ais)
hema <- as.matrix(ais[,3:7])
mod  <- lm(hema ~ sex + BMI, data=ais)
res  <- hema - predict(mod)
MoE_mahala(mod, res, squared=TRUE)

data(CO2data)
CO2  <- CO2data$CO2
GNP  <- CO2data$GNP
mod2 <- lm(CO2 ~ GNP, data=CO2data)
pred <- predict(mod2)
res2 <- CO2 - pred
maha <- MoE_mahala(mod2, res2)

# Highlight outlying observations
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
lines(GNP, pred, col="red")
points(GNP, CO2, cex=maha, lwd=2)
text(GNP, CO2, col="blue", 
     labels=replace(as.character(CO2data$country), maha < 1, ""))
     
# Replicate initialisation strategy using 2 randomly chosen components
# Repeat the random initialisation if necessary
# (until 'crit' at convergence is minimised)
G       <- 3L
z       <- sample(seq_len(G), nrow(CO2data), replace=TRUE)
old     <- Inf
crit    <- .Machine$double.xmax
while(crit < old)   {
  Sys.sleep(1)
  old   <- crit
  maha  <- NULL
  plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
  for(g in seq_len(G)) { 
   ind  <- which(z == g)
   mod  <- lm(CO2 ~ GNP, data=CO2data, sub=ind)
   pred <- predict(mod, newdata=CO2data[,"CO2", drop=FALSE])
   maha <- cbind(maha, MoE_mahala(mod, CO2 - pred))
   lines(GNP, pred, col=g + 1L)
  }
  min.M <- rowMins(maha)
  crit  <- sum(min.M)
  z     <- max.col(maha == min.M)
  points(GNP, CO2, cex=min.M, lwd=2, col=z + 1L)
  text(GNP, CO2, col=z + 1L, 
       labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
}
crit

Keefe-Murphy/MoEClust documentation built on June 9, 2025, 9:47 p.m.

Keefe-Murphy/MoEClust index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Keefe-Murphy/MoEClust
Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

MoE_mahala: Mahalanobis Distance Outlier Detection for Multivariate...
In Keefe-Murphy/MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Mahalanobis Distance Outlier Detection for Multivariate Response

Description

Usage

Arguments

Value

Author(s)

References

Examples

Related to MoE_mahala in Keefe-Murphy/MoEClust...

R Package Documentation

Browse R Packages

We want your feedback!

Keefe-Murphy/MoEClust Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

MoE_mahala: Mahalanobis Distance Outlier Detection for Multivariate... In Keefe-Murphy/MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

Mahalanobis Distance Outlier Detection for Multivariate Response

Description

Usage

Arguments

Value

Author(s)

References

Examples

Related to MoE_mahala in Keefe-Murphy/MoEClust...

R Package Documentation

Browse R Packages

We want your feedback!

Keefe-Murphy/MoEClust
Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

MoE_mahala: Mahalanobis Distance Outlier Detection for Multivariate...
In Keefe-Murphy/MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component