The function implements a simple, automatic outlier detection method for high-dimensional data. It treats each class independently and uses a statistically principled threshold for outliers, so it can detect both mislabeled and abnormal samples without reference to other classes.
OutlierPCDist(x, ...)
## Default S3 method:
OutlierPCDist(x, grouping, control, k, explvar, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierPCDist(formula, data, ..., subset, na.action)

formula 
a formula with no response variable, referring only to numeric variables. 
data 
an optional data frame (or similar: see

subset 
an optional vector used to select rows (observations) of the
data matrix 
na.action 
a function which indicates what should happen
when the data contain 
... 
arguments passed to or from other methods. 
x 
a matrix or data frame. 
grouping 
grouping variable: a factor specifying the class for each observation. 
control 
a control object (S4) for one of the available control classes,
e.g. 
k 
number of components to select for PCA. If missing, the number of components will be calculated automatically.
explvar 
Minimal explained variance to be used for calculation of
the number of components in PCA. If 
trace 
whether to print intermediate results. Default is FALSE.
If the data set consists of two or more classes (specified by the grouping variable grouping), the proposed method iterates through the classes present in the data, separates each class from the rest and identifies the outliers relative to this class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
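The class-by-class iteration can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper flag_by_class is hypothetical, and classical Mahalanobis distances on the PCA scores with a chi-square cutoff are used here only as a stand-in for whatever distance and threshold OutlierPCDist applies internally. The 0/1 coding (0 for a flagged observation) is likewise an assumption of the sketch.

```r
# Hypothetical helper, for illustration only. The distance and the cutoff
# below are stand-ins and are NOT taken from OutlierPCDist itself.
flag_by_class <- function(x, grouping, k = 2, level = 0.975) {
  x <- as.matrix(x)
  grouping <- as.factor(grouping)
  flag <- rep(1L, nrow(x))                  # assumed coding: 1 = regular, 0 = outlier
  for (cl in levels(grouping)) {
    idx <- which(grouping == cl)            # rows of this class only
    pc <- prcomp(x[idx, , drop = FALSE])    # classical PCA fitted on this class alone
    kk <- min(k, ncol(pc$x))                # never ask for more components than exist
    scores <- pc$x[, seq_len(kk), drop = FALSE]
    # squared Mahalanobis distance of each sample to its own class centre
    d2 <- mahalanobis(scores, colMeans(scores), cov(scores))
    flag[idx] <- as.integer(d2 <= qchisq(level, df = kk))
  }
  flag
}

# Simulated two-class data with one grossly abnormal sample in class "a"
set.seed(1)
x <- rbind(matrix(rnorm(100), 50, 2),
           matrix(rnorm(100, mean = 5), 50, 2))
x[1, ] <- c(10, -10)
grouping <- factor(rep(c("a", "b"), each = 50))

flag <- flag_by_class(x, grouping, k = 2)
table(grouping, flag)
```

Because each class is fitted on its own rows, a sample is judged only against the class it is labeled with, which is why a mislabeled sample and an abnormal one are handled by the same rule.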
The first step of the algorithm is dimensionality reduction using (classical) PCA. The number of components to select can be provided by the user but if missing, the number of components will be calculated either using the provided minimal explained variance or by the automatic dimensionality selection using profile likelihood, as proposed by Zhu and Ghodsi.
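The two ways of choosing the number of components can be sketched in base R as follows. Both helpers, select_k_explvar and select_k_profile, are hypothetical names introduced for illustration and are not functions of the package. The profile likelihood version follows the idea of Zhu and Ghodsi (split the ordered eigenvalues into two groups, model each group as normal with its own mean and a common variance, and keep the split with the highest likelihood), simplified here so that the split leaving the second group empty is not considered.

```r
# Hypothetical helpers, for illustration only.

# Smallest number of components whose cumulative share of variance
# reaches the requested minimum.
select_k_explvar <- function(ev, explvar) {
  ev <- sort(ev, decreasing = TRUE)
  which(cumsum(ev) / sum(ev) >= explvar)[1]
}

# Profile likelihood selection on the scree plot (simplified sketch).
select_k_profile <- function(ev) {
  ev <- sort(ev, decreasing = TRUE)
  p <- length(ev)
  stopifnot(p >= 3)
  loglik <- sapply(seq_len(p - 1), function(q) {
    g1 <- ev[seq_len(q)]          # candidate "large" eigenvalues
    g2 <- ev[(q + 1):p]           # candidate "small" eigenvalues
    # pooled variance estimate shared by the two groups
    ss <- sum((g1 - mean(g1))^2) + sum((g2 - mean(g2))^2)
    sigma <- sqrt(ss / (p - 2))
    sum(dnorm(g1, mean(g1), sigma, log = TRUE)) +
      sum(dnorm(g2, mean(g2), sigma, log = TRUE))
  })
  which.max(loglik)
}

# Toy eigenvalues with a clear gap after the third one
ev <- c(10, 9, 8, 0.5, 0.4, 0.3, 0.2)
k_profile <- select_k_profile(ev)
k_explvar <- select_k_explvar(ev, explvar = 0.9)
```

On real data the eigenvalues could be taken from classical PCA, for example prcomp(x)$sdev^2.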
An S4 object of class OutlierPCDist, which is a subclass of the virtual class Outlier.
Valentin Todorov
A.D. Shieh and Y.S. Hung (2009), Detecting Outlier Samples in Microarray Data, Statistical Applications in Genetics and Molecular Biology Vol. 8.
M. Zhu and A. Ghodsi (2006), Automatic dimensionality selection from the scree plot via the use of profile likelihood, Computational Statistics & Data Analysis, Vol. 51, 918-930.
P. Filzmoser & V. Todorov (2012), Robust tools for the imperfect world, To appear.
library(rrcovHD)  # assumed to provide OutlierPCDist and, via rrcov, the hemophilia data
data(hemophilia)
obj <- OutlierPCDist(gr~., data=hemophilia)
obj
getDistance(obj) # returns an array of distances
getClassLabels(obj, 1) # returns an array of indices for a given class
getCutoff(obj) # returns an array of cutoff values (for each class, usually equal)
getFlag(obj) # returns a 0/1 array of flags
plot(obj, class=2) # standard plot function
