OutlierPCOut | R Documentation
The function implements a computationally fast procedure for identifying outliers that is particularly effective in high dimensions. The algorithm uses simple properties of principal components to identify outliers in the transformed space, which gives it a significant computational advantage on high-dimensional data. It requires considerably less computation time than existing outlier detection methods and is suitable for very large data sets. It can also handle the situation, common in certain biological applications, in which the number of dimensions is several orders of magnitude larger than the number of observations.
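The idea of detecting outliers in a transformed principal-component space can be illustrated with a short base R sketch. It is a simplification under stated assumptions, not the procedure implemented by OutlierPCOut: PCOut, as described by Filzmoser, Maronna and Werner (2008), additionally weights the components and combines a location criterion with a scatter criterion, whereas this sketch simply uses robustly standardized PC scores and a chi-square cutoff. The helper `pc_space_outliers` and its `prob` argument are hypothetical; `explvar` mirrors the argument of the same name.

```r
# Simplified sketch of outlier detection in a robust PC space.
# NOT the full PCOut algorithm; the helper name is hypothetical.
pc_space_outliers <- function(x, explvar = 0.99, prob = 0.975) {
  x <- as.matrix(x)
  # 1. Robustly scale each variable by its median and MAD
  med <- apply(x, 2, median)
  s <- apply(x, 2, mad)
  z <- sweep(sweep(x, 2, med, "-"), 2, s, "/")
  # 2. PCA on the scaled data; prcomp() uses an SVD, so p > n is fine
  pc <- prcomp(z, center = TRUE, scale. = FALSE)
  # 3. Keep the smallest number of PCs explaining at least `explvar`
  #    of the total variance
  prop <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
  k <- which(prop >= explvar)[1]
  scores <- pc$x[, seq_len(k), drop = FALSE]
  # 4. Robustly standardize the retained scores (median / MAD)
  scores <- sweep(sweep(scores, 2, apply(scores, 2, median), "-"),
                  2, apply(scores, 2, mad), "/")
  d2 <- rowSums(scores^2)
  # 5. Flag observations whose squared distance exceeds a
  #    chi-square quantile with k degrees of freedom
  list(k = k, distance = sqrt(d2), outlier = d2 > qchisq(prob, df = k))
}

# Simulated high-dimensional data: 30 observations, 2000 variables,
# with observation 1 shifted by +5 in every variable
set.seed(1)
x <- matrix(rnorm(30 * 2000), nrow = 30)
x[1, ] <- x[1, ] + 5
res <- pc_space_outliers(x)
res$k
which(res$outlier)
```

Because `prcomp()` works through a singular value decomposition, the sketch runs even though the number of variables (2000) far exceeds the number of observations (30).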
OutlierPCOut(x, ...)
## Default S3 method:
OutlierPCOut(x, grouping, explvar=0.99, trace=FALSE, ...)
## S3 method for class 'formula'
OutlierPCOut(formula, data, ..., subset, na.action)
formula | a formula referring only to numeric variables on the right-hand side; the left-hand side, as in gr ~ . in the examples, specifies the grouping variable. |
data | an optional data frame (or similar: see model.frame) containing the variables in the formula. |
subset | an optional vector used to select rows (observations) of the data matrix. |
na.action | a function which indicates what should happen when the data contain NAs. |
... | arguments passed to or from other methods. |
x | a matrix or data frame. |
grouping | grouping variable: a factor specifying the class for each observation. |
explvar | a numeric value between 0 and 1 indicating how much of the variance should be covered by the robust PCs (default 0.99). |
trace | whether to print intermediate results. Default is trace = FALSE. |
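The explvar argument can be read as a threshold on the cumulative proportion of variance. A minimal sketch of one such rule, assuming the components are sorted by decreasing variance, keeps the smallest number of components whose cumulative share reaches explvar. The helper `n_components` is hypothetical, and the rule used internally by OutlierPCOut (for example, whether the comparison is strict) may differ.

```r
# Hypothetical helper: number of components needed to reach `explvar`
n_components <- function(variances, explvar = 0.99) {
  stopifnot(explvar > 0, explvar <= 1)
  # Cumulative share of total variance (components in decreasing order)
  prop <- cumsum(variances) / sum(variances)
  # Smallest k whose cumulative share reaches the threshold
  which(prop >= explvar)[1]
}

# Component variances with cumulative shares 0.50, 0.80, 0.95, 0.99, 1.00
v <- c(50, 30, 15, 4, 1)
n_components(v, 0.90)  # 3
n_components(v, 0.99)  # 4
```

With a fitted `prcomp()` object `pc`, the component variances would be `pc$sdev^2`.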
If the data set consists of two or more classes (specified by the grouping variable grouping), the proposed method iterates through the classes present in the data, separates each class from the rest and identifies the outliers relative to this class. Both types of outliers, the mislabeled and the abnormal samples, are thus treated in a homogeneous way.
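The per-class strategy can be sketched in base R as a loop over the levels of the grouping factor. This is a deliberately simplified, hypothetical helper (`flag_by_class`): within each class it uses coordinatewise medians and MADs with a chi-square cutoff instead of the principal-component procedure that OutlierPCOut applies, so it only illustrates how an observation assigned to the wrong class ends up flagged relative to that class.

```r
# Hypothetical helper: flag outliers separately within each class
flag_by_class <- function(x, grouping, prob = 0.975) {
  x <- as.matrix(x)
  grouping <- as.factor(grouping)
  flag <- logical(nrow(x))
  for (g in levels(grouping)) {
    # Separate the current class from the rest
    idx <- which(grouping == g)
    xg <- x[idx, , drop = FALSE]
    # Robust standardization using this class's own median and MAD
    z <- sweep(sweep(xg, 2, apply(xg, 2, median), "-"),
               2, apply(xg, 2, mad), "/")
    d2 <- rowSums(z^2)
    # Flag members far from their own class
    flag[idx] <- d2 > qchisq(prob, df = ncol(x))
  }
  flag
}

# Two well-separated classes in 2D, 25 observations each
set.seed(42)
x <- rbind(matrix(rnorm(50), ncol = 2),              # class "a" near (0, 0)
           matrix(rnorm(50, mean = 10), ncol = 2))   # class "b" near (10, 10)
grouping <- factor(rep(c("a", "b"), each = 25))
grouping[26] <- "a"  # mislabel the first observation of class "b"
flags <- flag_by_class(x, grouping)
which(flags)
```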
An S4 object of class OutlierPCOut, which is a subclass of the virtual class Outlier.
Valentin Todorov valentin.todorov@chello.at
Filzmoser P, Maronna R and Werner M (2008). Outlier identification in high dimensions. Computational Statistics & Data Analysis, 52, 1694–1711.
Filzmoser P and Todorov V (2013). Robust tools for the imperfect world. Information Sciences, 245, 4–20. doi:10.1016/j.ins.2012.10.017.
OutlierPCOut, Outlier
library(rrcov)  # provides OutlierPCOut and the hemophilia data set
data(hemophilia)
obj <- OutlierPCOut(gr ~ ., data = hemophilia)
obj
getDistance(obj)        # returns an array of distances
getClassLabels(obj, 1)  # returns an array of indices for a given class
getCutoff(obj)          # returns an array of cutoff values (for each class, usually equal)
getFlag(obj)            # returns a 0/1 array of flags
plot(obj, class = 2)    # standard plot function