The function implements a computationally fast procedure for identifying outliers that is particularly effective in high dimensions. The algorithm uses simple properties of principal components to identify outliers in the transformed space, which leads to significant computational advantages for high-dimensional data. The approach requires considerably less computational time than existing methods for outlier detection and is suitable for very large data sets. It can also handle the situation commonly found in certain biological applications, in which the number of dimensions is several orders of magnitude larger than the number of observations.
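To make the idea of working in the principal component space concrete, here is a minimal base R sketch. It is a deliberately simplified stand-in and not the PCOut algorithm of Filzmoser, Maronna and Werner (2008): the simulated data, the median/MAD scaling, the Euclidean distance on rescaled scores, and the cutoff rule are all assumptions made for illustration.

```r
# Simplified illustration only, NOT the actual PCOut algorithm.
set.seed(1)
n <- 50; p <- 500                        # many more dimensions than observations
x <- matrix(rnorm(n * p), n, p)
x[1:3, ] <- x[1:3, ] + 4                 # plant three shifted observations

# Robustly standardise each column with median and MAD
z <- sweep(x, 2, apply(x, 2, median))
z <- sweep(z, 2, apply(x, 2, mad), "/")

# Rotate to principal components; at most min(n, p) components exist
pc <- prcomp(z, center = FALSE, scale. = FALSE)
scores <- pc$x[, pc$sdev > 1e-8, drop = FALSE]   # drop numerically null components

# Robustly rescale the scores and use the Euclidean norm as a distance
s <- scale(scores,
           center = apply(scores, 2, median),
           scale  = apply(scores, 2, mad))
d <- sqrt(rowSums(s^2))

# Hypothetical cutoff: median plus three MADs of the distances
cutoff  <- median(d) + 3 * mad(d)
flagged <- which(d > cutoff)
flagged
```

Because the rotation yields at most min(n, p) components, all later steps operate on an n-by-n problem at most, which is where the computational saving for p much larger than n comes from.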
OutlierPCOut(x, ...)
## Default S3 method:
OutlierPCOut(x, grouping, explvar=0.99, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierPCOut(formula, data, ..., subset, na.action)

formula 
a formula with no response variable, referring only to numeric variables. 
data 
an optional data frame (or similar: see

subset 
an optional vector used to select rows (observations) of the
data matrix 
na.action 
a function which indicates what should happen
when the data contain 
... 
arguments passed to or from other methods. 
x 
a matrix or data frame. 
grouping 
grouping variable: a factor specifying the class for each observation. 
explvar 
a numeric value between 0 and 1 indicating how much variance should be covered by the robust PCs (default to 0.99) 
trace 
whether to print intermediate results. Default is trace=FALSE.
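As a rough illustration of what the explvar argument controls, the sketch below picks the smallest number of components whose cumulative share of variance reaches the threshold. It uses classical prcomp on simulated data as a stand-in; the function itself refers to robust PCs, so the data and the classical PCA here are assumptions, not the package's internals.

```r
# Illustration of an explained-variance threshold using classical PCA
# (a stand-in; the documented function works with robust PCs).
set.seed(42)
# Simulated data: three dominant directions plus seven low-variance ones
x <- matrix(rnorm(100 * 10), 100, 10) %*% diag(c(10, 5, 2, rep(0.1, 7)))

pc <- prcomp(x)
cumvar <- cumsum(pc$sdev^2) / sum(pc$sdev^2)   # cumulative proportion of variance

explvar <- 0.99
k <- which(cumvar >= explvar)[1]               # smallest number of PCs reaching explvar
k
round(cumvar, 4)
```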
If the data set consists of two or more classes (specified by the grouping variable grouping), the proposed method iterates through the classes present in the data, separates each class from the rest, and identifies the outliers relative to this class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
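The class-wise iteration can be sketched in base R as follows. The per-class detector used here (norm of median/MAD standardised coordinates with a fixed cutoff of 5) is a hypothetical placeholder for the actual procedure, and the simulated data and the name is_outlier are likewise assumptions for illustration.

```r
# Sketch of class-wise outlier detection with a placeholder detector.
set.seed(7)
x <- rbind(matrix(rnorm(40 * 2), 40, 2),             # class "a" centred at (0, 0)
           matrix(rnorm(40 * 2, mean = 10), 40, 2))  # class "b" centred at (10, 10)
grouping <- factor(rep(c("a", "b"), each = 40))

x[1, ]  <- c(10, 10)   # "mislabeled": labelled "a" but located in class "b"
x[41, ] <- c(10, 30)   # "abnormal": far from both classes

is_outlier <- logical(nrow(x))
for (cl in levels(grouping)) {
  idx <- which(grouping == cl)
  xc  <- x[idx, , drop = FALSE]                      # separate this class from the rest
  z   <- scale(xc,
               center = apply(xc, 2, median),
               scale  = apply(xc, 2, mad))
  d   <- sqrt(rowSums(z^2))                          # distance relative to this class only
  is_outlier[idx] <- d > 5                           # hypothetical fixed cutoff
}
which(is_outlier)
```

Both planted observations are judged only against their own labelled class, so the mislabeled point and the abnormal point are caught by the same rule.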
An S4 object of class OutlierPCOut, which is a subclass of the virtual class Outlier.
Valentin Todorov
P. Filzmoser, R. Maronna and M. Werner (2008), Outlier identification in high dimensions, Computational Statistics & Data Analysis, Vol. 52, 1694–1711.
P. Filzmoser & V. Todorov (2012), Robust tools for the imperfect world, To appear.
library(rrcovHD) # assumed: the package providing OutlierPCOut; hemophilia comes from rrcov
data(hemophilia)
obj <- OutlierPCOut(gr~., data=hemophilia)
obj
getDistance(obj) # returns an array of distances
getClassLabels(obj, 1) # returns an array of indices for a given class
getCutoff(obj) # returns an array of cutoff values (for each class, usually equal)
getFlag(obj) # returns a 0/1 array of flags
plot(obj, class=2) # standard plot function
