RaPKod is a kernel method for detecting outliers in a given dataset on the basis of a reference set of non-outliers. To do so, it maps a tested observation into a kernel space (through a 'feature map') and then projects it onto a random low-dimensional subspace of this kernel space. Since the distribution of this projection is known when the observation is a non-outlier, RaPKod can control the probability of a false alarm error (i.e. labelling a non-outlier as an outlier).
X |
either a data frame or an n x d matrix (if given.kern=FALSE), otherwise an n x n kernel matrix (if given.kern=TRUE). In the former case, a Gaussian kernel is used by default. |
given.kern |
If FALSE (default), each row of X is an observation. Otherwise X is a kernel matrix (in this case, gamma and p must be user-specified). |
ref.n |
the size of the reference non-outlier dataset. Must be smaller than n. |
gamma |
the hyperparameter of the Gaussian kernel k(x, y) = exp( - gamma * || x - y ||^2). Set automatically by the program if not specified and given.kern=FALSE. |
p |
the number of dimensions of the projection made in the kernel space. Set automatically by the program if not specified and given.kern=FALSE. |
alpha |
the prescribed probability of false alarm error. |
use.tested.inlier |
If TRUE, each tested observation that is labelled as a non-outlier is appended to the reference dataset of non-outliers (the 'oldest' reference non-outlier is discarded). Set to FALSE by default. |
lowrank |
if lowrank="No" (default), the full kernel matrix is used. Otherwise, a low-rank approximation of the kernel matrix is computed: if "Nyst", it is approximated through the Nyström method; if "RKS", it is approximated by Random Kitchen Sinks (in this case, X must be a dataset matrix, not a kernel matrix). |
r.lowrk |
if lowrank="Nyst" or "RKS", specifies the (low) rank of the approximated kernel matrix. |
K1 |
universal constant used in the heuristic formula of the optimal parameter gamma. |
K2 |
universal constant used in the heuristic formula of the optimal parameter p. |
If given.kern = FALSE, X is a dataset matrix whose first ref.n rows correspond to the reference dataset of non-outliers. The remaining (n - ref.n) observations are tested one by one by RaPKod to determine whether they are outliers.
If given.kern = TRUE, X must be an n x n Gram matrix. The kernel used to compute this Gram matrix should be of the form k(x, y) = K(gamma * || x - y ||^2) where K is a positive function. Note that in this case the parameters gamma and p must be specified by the user.
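When a precomputed kernel matrix is supplied (given.kern = TRUE), a Gaussian Gram matrix of the required form can be built in base R. The following is a minimal sketch and not part of the package: the toy data, the value of gamma, and the values of ref.n and p in the commented call are illustrative assumptions.

```r
# Toy data (assumption): 60 observations in 4 dimensions.
set.seed(1)
X <- matrix(rnorm(60 * 4), nrow = 60, ncol = 4)

# Illustrative hyperparameter; with given.kern = TRUE it must be chosen by the user.
gamma <- 0.5

# Squared Euclidean distances between all pairs of rows.
D2 <- as.matrix(dist(X))^2

# Gaussian Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
K <- exp(-gamma * D2)

# K is n x n and symmetric, with ones on the diagonal.
dim(K)

# Hypothetical call with a kernel matrix (ref.n, gamma and p are example values):
# result <- rapkod(K, given.kern = TRUE, ref.n = 30, gamma = gamma, p = 5)
```

The rows and columns of K keep the row order of X, so the first ref.n rows and columns still refer to the reference non-outliers.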
stats |
a vector of length (n - ref.n) containing the test statistics for each tested observation. |
flag |
a vector of length (n - ref.n) indicating which observations have been labelled as an outlier (TRUE in this case). |
pv |
a vector of length (n - ref.n) containing p-values for each tested observation. |
gamma |
the optimal value of gamma determined by the program (or the value provided by the user if it was user-specified). |
p |
the optimal value of p determined by the program (or the value provided by the user if it was user-specified). |
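Because stats, flag and pv only cover the (n - ref.n) tested observations, their positions are offset by ref.n relative to the rows of X. Below is a small sketch of mapping flagged positions back to row indices, assuming the returned vectors follow the row order of the tested observations. The `result` list is a hand-made stand-in with made-up values, not actual rapkod output.

```r
ref.n <- 3

# Stand-in for a rapkod() return value with 4 tested observations (made-up values).
result <- list(flag = c(FALSE, TRUE, FALSE, TRUE),
               pv   = c(0.40, 0.01, 0.75, 0.03))

# Positions among the tested observations that were labelled as outliers.
tested.idx <- which(result$flag)

# Corresponding row indices in X (the first ref.n rows are the reference set).
outlier.rows <- ref.n + tested.idx
outlier.rows
```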
data(iris)
##Define data frame with non-outliers
inliers = iris[sample(which(iris$Species!="setosa"), 100, replace=FALSE),
-which(names(iris)=="Species")]
##Define data frame with outliers
outliers = iris[which(iris$Species=="setosa"),-which(names(iris)=="Species")]
X = rbind(inliers, outliers)
ref.n = 50
result <- rapkod(X, ref.n = ref.n, use.tested.inlier = FALSE, alpha = 0.05)
##False alarm error ratio obtained on tested non-outliers (should be close to 0.05)
mean(result$pv[1:(nrow(inliers)-ref.n)]<0.05, na.rm = TRUE)
##Missed detection error ratio obtained on tested outliers (should be close to 0)
mean(result$pv[-(1:(nrow(inliers)-ref.n))]>0.05, na.rm = TRUE)