# rapkod: RaPKod: Random Projections Kernel Outlier Detection

In RaPKod: Random Projection Kernel Outlier Detector

## Description

RaPKod is a kernel method for detecting outliers in a dataset, using a reference set of non-outliers. It maps each tested observation into a kernel space (through a feature map) and then projects it onto a random low-dimensional subspace of that kernel space. Because the distribution of this projection is known when the observation is a non-outlier, RaPKod can control the probability of a false alarm (i.e. labelling a non-outlier as an outlier).

## Usage

```
rapkod(X, given.kern = FALSE, ref.n = NULL, gamma = NULL, p = NULL, alpha = 0.05,
       use.tested.inlier = FALSE, lowrank = "No", r.lowrk = ceiling(sqrt(nrow(X))),
       K1 = 6, K2 = 50)
```

## Arguments

`X` either a data frame or an n x d matrix (if given.kern=FALSE), otherwise an n x n kernel matrix (if given.kern=TRUE). In the former case, a Gaussian kernel is used by default.

`given.kern` If FALSE (default), each row of X is an observation. Otherwise X is a kernel matrix (in this case, gamma and p must be user-specified).

`ref.n` the size of the reference non-outlier dataset. Must be smaller than n.

`gamma` the hyperparameter of the Gaussian kernel k(x, y) = exp( - gamma * || x - y ||^2). Set automatically by the program if not specified and given.kern=FALSE.

`p` the number of dimensions of the projection made in the kernel space. Set automatically by the program if not specified and given.kern=FALSE.

`alpha` the prescribed probability of false alarm error.

`use.tested.inlier` If TRUE, each tested observation that is labelled as a non-outlier is appended to the reference dataset of non-outliers (the 'oldest' reference non-outlier is discarded). Set to FALSE by default.

`lowrank` if lowrank="No" (default), the full kernel matrix is used. Otherwise, a low-rank approximation of the kernel matrix is computed: if "Nyst", it is approximated through Nystrom method; if "RKS", it is approximated by random Kitchen Sinks (in this case, X must be a dataset matrix, not a kernel matrix).

`r.lowrk` if lowrank="Nyst" or "RKS", specifies the (low) rank of the approximated kernel matrix.

`K1` universal constant used in the heuristic formula of the optimal parameter gamma.

`K2` universal constant used in the heuristic formula of the optimal parameter p.

## Details

If given.kern = FALSE, X is a dataset matrix whose first ref.n rows correspond to the reference dataset of non-outliers. The remaining (n - ref.n) observations are tested one by one by RaPKod to determine whether or not they are outliers.
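The row layout described above can be prepared as in the following minimal sketch, written in base R. The data frames `reference` and `tested` are hypothetical names with simulated values, and the `rapkod()` call is left commented out because it requires the package to be installed.

```r
# Hypothetical data: 'reference' holds known non-outliers and 'tested' holds
# the observations to be checked. Both names and values are illustrative.
set.seed(1)
reference <- data.frame(a = rnorm(20), b = rnorm(20))
tested    <- data.frame(a = rnorm(5, mean = 4), b = rnorm(5, mean = 4))

# Reference rows must come first; ref.n is the number of reference rows.
X     <- rbind(reference, tested)
ref.n <- nrow(reference)

# Rows of X that would be tested one by one: ref.n + 1, ..., nrow(X)
tested.rows <- seq(ref.n + 1, nrow(X))

# With the RaPKod package installed, the call would then be:
# result <- rapkod(X, ref.n = ref.n)
```

Keeping `ref.n` tied to `nrow(reference)` avoids a mismatch between the declared reference size and the actual row order of `X`.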

If given.kern = TRUE, X must be an n x n Gram matrix. The kernel used to compute this Gram matrix should be of the form k(x, y) = K(gamma * || x - y ||^2), where K is a positive function. Note that in this case the parameters gamma and p must be specified by the user.
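As an illustration of a kernel of the required form, the Gaussian kernel corresponds to K(t) = exp(-t). The sketch below builds its Gram matrix in base R; `gaussian.gram` is a hypothetical helper (not part of the package), and the values of `gamma` and `p` are arbitrary choices made only for the example.

```r
# Hypothetical helper: Gaussian Gram matrix with entries
# K[i, j] = exp(-gamma * ||x_i - x_j||^2)
gaussian.gram <- function(X, gamma) {
  X <- as.matrix(X)
  # dist() returns Euclidean distances; square them for the kernel
  sq.dist <- as.matrix(dist(X))^2
  exp(-gamma * sq.dist)
}

set.seed(1)
X <- matrix(rnorm(30 * 2), ncol = 2)  # 30 simulated observations in 2 dimensions
K <- gaussian.gram(X, gamma = 0.5)    # gamma chosen arbitrarily for illustration

# With the RaPKod package installed, gamma and p must be passed explicitly
# (ref.n = 20 and p = 3 are arbitrary illustrative values):
# result <- rapkod(K, given.kern = TRUE, ref.n = 20, gamma = 0.5, p = 3)
```

The rows and columns of `K` follow the row order of `X`, so the reference observations should still occupy the first `ref.n` positions.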

## Value

`stats` a vector of length (n - ref.n) containing the test statistics for each tested observation.

`flag` a vector of length (n - ref.n) indicating which observations have been labelled as an outlier (TRUE in this case).

`pv` a vector of length (n - ref.n) containing p-values for each tested observation.

`gamma` the optimal value of gamma determined by the program (or the value provided by the user if it was user-specified).

`p` the optimal value of p determined by the program (or the value provided by the user if it was user-specified).
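Since the returned vectors cover only the tested observations, it can help to map them back to rows of X. The sketch below assumes that element i of each vector refers to row ref.n + i of X, which is consistent with the row layout described in Details but is not stated explicitly. The `result` list is a mock object with made-up values, used only to show the indexing.

```r
# Mock output with the same shape as the documented return value;
# the numbers are made up purely to illustrate indexing.
ref.n  <- 4
result <- list(pv   = c(0.40, 0.01, 0.75),
               flag = c(FALSE, TRUE, FALSE))

# Assumption: element i of each vector refers to row ref.n + i of X
tested.rows  <- ref.n + seq_along(result$flag)

# Rows of X that were labelled as outliers
outlier.rows <- tested.rows[result$flag]
outlier.rows
```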

## See Also

`od.opt.param`

## Examples

```
library(RaPKod)
data(iris)

## Define data frame with non-outliers
inliers = iris[sample(which(iris$Species != "setosa"), 100, replace = FALSE),
               -which(names(iris) == "Species")]

## Define data frame with outliers
outliers = iris[which(iris$Species == "setosa"), -which(names(iris) == "Species")]

## Reference rows (the first ref.n inliers) come first, tested rows follow
X = rbind(inliers, outliers)
ref.n = 50

result <- rapkod(X, ref.n = ref.n, use.tested.inlier = FALSE, alpha = 0.05)

## False alarm error ratio obtained on tested non-outliers (should be close to 0.05)
mean(result$pv[1:(nrow(inliers) - ref.n)] < 0.05, na.rm = TRUE)

## Missed detection error ratio obtained on tested outliers (should be close to 0)
mean(result$pv[-(1:(nrow(inliers) - ref.n))] > 0.05, na.rm = TRUE)
```

RaPKod documentation built on May 2, 2019, 5:58 a.m.