MoE_mahala | R Documentation |
Computes the Mahalanobis distance between the response variable(s) and the fitted values of linear regression models with multivariate or univariate responses.
MoE_mahala(fit,
resids,
squared = FALSE,
identity = NULL)
fit |
A fitted |
resids |
The residuals. Can be residuals for observations included in the model, or residuals arising from predictions on unseen data. Must be coercible to a matrix with the number of columns being the number of response variables. Missing values are not allowed. |
squared |
A logical. By default ( |
identity |
A logical indicating whether the identity matrix is used in place of the precision matrix in the Mahalanobis distance calculation. Defaults to |
A vector giving the Mahalanobis distance (or squared Mahalanobis distance) between response(s) and fitted values for each observation.
Keefe Murphy - <keefe.murphy@mu.ie>
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/s11634-019-00373-8")}>.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the National Institute of Sciences, India, 2(1): 49-55.
data(ais)
hema <- as.matrix(ais[,3:7])
mod <- lm(hema ~ sex + BMI, data=ais)
res <- hema - predict(mod)
MoE_mahala(mod, res, squared=TRUE)
data(CO2data)
CO2 <- CO2data$CO2
GNP <- CO2data$GNP
mod2 <- lm(CO2 ~ GNP, data=CO2data)
pred <- predict(mod2)
res2 <- CO2 - pred
maha <- MoE_mahala(mod2, res2)
# Highlight outlying observations
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
lines(GNP, pred, col="red")
points(GNP, CO2, cex=maha, lwd=2)
text(GNP, CO2, col="blue",
labels=replace(as.character(CO2data$country), maha < 1, ""))
# Replicate initialisation strategy using 2 randomly chosen components
# Repeat the random initialisation if necessary
# (until 'crit' at convergence is minimised)
G <- 3L
z <- sample(seq_len(G), nrow(CO2data), replace=TRUE)
old <- Inf
crit <- .Machine$double.xmax
while(crit < old) {
Sys.sleep(1)
old <- crit
maha <- NULL
plot(GNP, CO2, type="n", ylab=expression('CO'[2]))
for(g in seq_len(G)) {
ind <- which(z == g)
mod <- lm(CO2 ~ GNP, data=CO2data, sub=ind)
pred <- predict(mod, newdata=CO2data[,"CO2", drop=FALSE])
maha <- cbind(maha, MoE_mahala(mod, CO2 - pred))
lines(GNP, pred, col=g + 1L)
}
min.M <- rowMins(maha)
crit <- sum(min.M)
z <- max.col(maha == min.M)
points(GNP, CO2, cex=min.M, lwd=2, col=z + 1L)
text(GNP, CO2, col=z + 1L,
labels=replace(as.character(CO2data$country), which(min.M <= 1), ""))
}
crit
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.