Using Mahalanobis Distance and PCA for Quality Control

Share:

Description

Compute the Mahalanobis distance of each sample from the center of an N-dimensional principal component space.

Usage

1
mahalanobisQC(spca, N)

Arguments

spca

object of class SamplePCA representing the results of a principal components analysis.

N

integer scalar specifying the number of components to use when assessing QC.

Details

The theory says that, under the null hypothesis that all samples arise from the same multivariate normal distribution, the distance from the center of a D-dimensional principal component space should follow a chi-squared distribution with D degrees of freedom. This theory lets us compute p-values associated with the Mahalanobis distances for each sample. This method can be used for quality control or outlier identification.

Value

Returns a data frame containing two columns, with the rows corresponding to the columns of the original data set on which PCA was performed. First column is the chi-squared statistic, with N degrees of freedom. Second column is the associated p-value.

Author(s)

Kevin R. Coombes krc@silicovore.com

References

Coombes KR, et al.
Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin Chem 2003; 49:1615-23.

Examples

1
2
3
4
5
library(oompaData)
data(lungData)
spca <- SamplePCA(na.omit(lung.dataset))
mc <- mahalanobisQC(spca, 2)
mc[mc$p.value < 0.01,]