odis: Orthogonal distances from a PCA or PLS score space

View source: R/odis.R

odis R Documentation

Description

odis calculates orthogonal distances (OD, also called "X-residuals") for a PCA or PLS model, i.e. the Euclidean distances between the row observations of a data set and their projections onto the score space (see e.g. Hubert et al. 2005, p. 66; Vanden Branden & Hubert 2005; Varmuza & Filzmoser 2009, p. 79).

lodis does the same calculation for each local model (i.e. for each new observation to predict) generated by functions locw, lwplsr, etc.
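As an illustration of the definition above, here is a minimal base-R sketch of an orthogonal distance computed from a PCA score space. It is not the code used by odis: the helper name od_pca and the use of svd for the loadings are assumptions made for this example only. New observations are centered with the reference means and projected with the reference loadings.

```r
# Hypothetical helper (not part of rnirs): orthogonal distances from a
# PCA score space, using base R only.
od_pca <- function(Xr, Xu = NULL, ncomp) {
  Xr <- as.matrix(Xr)
  xmeans <- colMeans(Xr)
  # Loadings (p x ncomp) from the SVD of the centered reference matrix
  P <- svd(sweep(Xr, 2, xmeans))$v[, seq_len(ncomp), drop = FALSE]
  od <- function(X) {
    Xc <- sweep(as.matrix(X), 2, xmeans)   # center with reference means
    E <- Xc - Xc %*% P %*% t(P)            # X-residuals after projection
    sqrt(rowSums(E^2))                     # one distance per row
  }
  list(dr = od(Xr), du = if (!is.null(Xu)) od(Xu))
}

set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20)
res <- od_pca(X[1:15, ], X[16:20, ], ncomp = 2)
res$dr   # ODs of the reference observations
res$du   # ODs of the new observations
```

With ncomp equal to the number of columns (and more rows than columns), the score space spans the whole centered data and all reference distances are numerically zero.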

Usage


odis(fm, Xr, Xu = NULL, 
    ncomp = NULL,
    robust = FALSE, alpha = .01)

lodis(fm, Xr, Xu, 
    ncomp = NULL,
    robust = FALSE, alpha = .01)

Arguments

fm

For odis, output of functions pca, pls or plsr. For lodis, output of functions locw, lwplsr, etc.

Xr

The matrix or data frame of reference (= training) observations that was used for building the preliminary model fm.

Xu

An m x p matrix or data frame of new (= test) observations (Xu is not used in the calculation of the median and MAD used for calculating the standardized distances; see Details).

ncomp

Number of components to consider for the distance calculations. If NULL (default), the maximum number of components is considered.

robust

Logical. If TRUE, the moment estimation of the cutoff (see Details) is robustified. This is recommended in particular after robust PCA or PLS on small data sets containing strong outliers. Defaults to FALSE.

alpha

Type I risk level (significance level) used to define the cutoff for detecting extreme values (see the code).

Details

The cutoff for detecting extreme OD values is computed using a moment estimation of a Chi-squared distribution for the squared distance (see Nomikos & MacGregor 1995, and Pomerantsev 2008).

Column dstand in the output is a "standardized" OD defined as OD / cutoff, where the cutoff is calculated as in Hubert et al. (2005, p. 66). A value dstand > 1 may be considered extreme.
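The moment estimation mentioned in Details can be sketched as follows. This is an illustration under stated assumptions, not the code of odis: the helper name od_cutoff is hypothetical, and the robust branch (median and squared MAD in place of mean and variance) is only one plausible way to robustify the moments. The squared distance is modelled as g times a Chi-squared variable with nu degrees of freedom; matching the mean mu and variance s2 gives nu = 2 * mu^2 / s2 and g = mu / nu.

```r
# Hypothetical helper (not part of rnirs): cutoff for orthogonal distances
# by moment matching of a scaled Chi-squared law on the squared distances.
od_cutoff <- function(d, alpha = 0.01, robust = FALSE) {
  d2 <- d^2
  if (robust) {
    mu <- median(d2)    # assumed robust substitute for the mean
    s2 <- mad(d2)^2     # assumed robust substitute for the variance
  } else {
    mu <- mean(d2)
    s2 <- var(d2)
  }
  nu <- 2 * mu^2 / s2   # matched degrees of freedom
  # Back to the distance scale: sqrt of the (1 - alpha) quantile of g * chi2(nu)
  sqrt(mu / nu * qchisq(1 - alpha, df = nu))
}

set.seed(1)
dr <- sqrt(rchisq(50, df = 4))   # stand-in ODs of reference observations
du <- c(1.5, 6)                  # stand-in ODs of new observations

cutoff <- od_cutoff(dr, alpha = 0.01)   # estimated from reference ODs only
dstand_u <- du / cutoff                 # standardized ODs of new observations
dstand_u > 1                            # TRUE flags a potentially extreme value
```

In this sketch the cutoff depends only on the reference distances, so adding or removing new observations does not change how any observation is standardized.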

Value

A list of outputs (see examples).

References

M. Hubert, P. J. Rousseeuw, K. Vanden Branden (2005). ROBPCA: a new approach to robust principal components analysis. Technometrics, 47, 64-79.

Nomikos, P., MacGregor, J.F., 1995. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 37, 41–59. https://doi.org/10.1080/00401706.1995.10485888

Pomerantsev, A.L., 2008. Acceptance areas for multivariate classification derived by projection methods. Journal of Chemometrics 22, 601–609. https://doi.org/10.1002/cem.1147

K. Vanden Branden, M. Hubert (2005). Robust classification in high dimensions based on the SIMCA method. Chemom. Intell. Lab. Syst., 79, 10-21.

K. Varmuza, P. Filzmoser (2009). Introduction to multivariate statistical analysis in chemometrics. CRC Press, Boca Raton.

Examples


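library(rnirs)

## Simulated data: 8 observations, 6 X-variables, 2 responses
n <- 8
p <- 6
set.seed(1)
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
y1 <- 100 * rnorm(n)
y2 <- 100 * rnorm(n)
Y <- cbind(y1, y2)
set.seed(NULL)   # re-initialize the random number generator

## Reference (training) and new (test) observations
Xr <- X[1:6, ] ; Yr <- Y[1:6, ]
Xu <- X[7:8, ] ; Yu <- Y[7:8, ]

## Preliminary model built on the reference observations
fm <- pls(Xr, Yr, ncomp = 3)
#fm <- plsr(Xr, Yr, Xu, ncomp = 3)

## Orthogonal distances for the reference observations only
odis(fm, Xr)

## Orthogonal distances for the reference and the new observations
odis(fm, Xr, Xu)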


mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.