
RaPKod is a kernel method for detecting outliers in a dataset on the basis of a reference set of non-outliers. To do so, it maps each tested observation into a kernel space (through a feature map) and then projects it onto a random low-dimensional subspace of this kernel space. Since the distribution of this projection is known when the observation is a non-outlier, RaPKod can control the probability of false alarm error (i.e. the probability of labelling a non-outlier as an outlier).
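The idea described above can be illustrated with a simplified base-R sketch. It is not RaPKod's actual statistic: it assumes random Fourier features as a stand-in for the Gaussian feature map, draws p random projection directions, and treats the projected reference points as approximately Gaussian, so that a Mahalanobis distance can be compared with a chi-squared distribution with p degrees of freedom. The helper names `rff_features` and `projection_pvalues`, as well as the values gamma = 0.5 and p = 3, are hypothetical choices made for illustration.

```r
# Simplified sketch of the random-projection idea (NOT RaPKod's exact statistic).
# Assumptions: random Fourier features approximate the Gaussian feature map, and
# projected reference points are roughly Gaussian, so distances follow chi^2(p).

# Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
# frequencies drawn from N(0, 2 * gamma * I), phases from U(0, 2 * pi).
rff_features <- function(X, gamma, D, W = NULL, b = NULL) {
  X <- as.matrix(X)
  if (is.null(W)) W <- matrix(rnorm(ncol(X) * D, sd = sqrt(2 * gamma)), ncol(X), D)
  if (is.null(b)) b <- runif(D, 0, 2 * pi)
  Z <- sqrt(2 / D) * cos(sweep(X %*% W, 2, b, "+"))
  list(Z = Z, W = W, b = b)
}

# Hypothetical helper: one p-value per row of `test`, computed against `ref`.
projection_pvalues <- function(ref, test, gamma, p, D = 500) {
  f_ref <- rff_features(ref, gamma, D)
  f_test <- rff_features(test, gamma, D, f_ref$W, f_ref$b)  # same feature map
  U <- matrix(rnorm(D * p), D, p)          # spans a random p-dimensional subspace
  P_ref <- f_ref$Z %*% U                   # projected reference points
  P_test <- f_test$Z %*% U                 # projected tested points
  stat <- mahalanobis(P_test, colMeans(P_ref), cov(P_ref))
  pchisq(stat, df = p, lower.tail = FALSE) # large distance -> small p-value
}

set.seed(1)
data(iris)
feats <- as.matrix(iris[, 1:4])
inl <- feats[sample(which(iris$Species != "setosa")), ]  # shuffled non-outliers
ref <- inl[1:50, ]
test <- rbind(inl[51:100, ], feats[iris$Species == "setosa", ])
pv <- projection_pvalues(ref, test, gamma = 0.5, p = 3)
mean(pv[1:50] < 0.05)     # empirical false alarm rate on tested non-outliers
mean(pv[51:100] >= 0.05)  # empirical missed detection rate on setosa
```

Because the Mahalanobis distance is invariant to invertible linear maps within the subspace, the random directions in `U` do not need to be orthonormal.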


`X` |
if given.kern=FALSE, either a data frame or an n x d matrix with one observation per row; if given.kern=TRUE, an n x n kernel matrix. In the former case, a Gaussian kernel is used by default. |

`given.kern` |
If FALSE (default), each row of X is an observation. Otherwise X is a kernel matrix (in this case, gamma and p must be user-specified). |

`ref.n` |
the size of the reference non-outlier dataset, made of the first ref.n observations of X. Must be smaller than n. |

`gamma` |
the hyperparameter of the Gaussian kernel. Set automatically by the program if not specified and given.kern=FALSE. |

`p` |
the number of dimensions of the projection made in the kernel space. Set automatically by the program if not specified and given.kern=FALSE. |

`alpha` |
the prescribed probability of false alarm error. |

`use.tested.inlier` |
If TRUE, each tested observation that is labelled as a non-outlier is appended to the reference dataset of non-outliers (the 'oldest' reference non-outlier is discarded). Set to FALSE by default. |

`lowrank` |
if lowrank="No" (default), the full kernel matrix is used. Otherwise, a low-rank approximation of the kernel matrix is computed: if "Nyst", through the Nyström method; if "RKS", through Random Kitchen Sinks (in this case, X must be a dataset matrix, not a kernel matrix). |

`r.lowrk` |
if lowrank="Nyst" or "RKS", specifies the (low) rank of the approximated kernel matrix. |

`K1` |
universal constant used in the heuristic formula for the optimal value of gamma. |

`K2` |
universal constant used in the heuristic formula for the optimal value of p. |
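For intuition about the lowrank options, the following base-R sketch builds both standard approximations of a Gaussian Gram matrix k(x, y) = exp(-gamma * ||x - y||^2). It illustrates the general techniques, not RaPKod's internal code; the helper names and the choices of n, gamma and r (playing the role of r.lowrk) are hypothetical. Random Kitchen Sinks draws its features from the raw coordinates of X, which is why that option needs a dataset matrix rather than a kernel matrix.

```r
# Illustrative sketch of Nystrom and Random Kitchen Sinks (RKS) approximations
# (not RaPKod's internal code). Kernel: k(x, y) = exp(-gamma * ||x - y||^2).

gauss_gram <- function(A, B, gamma) {
  # squared Euclidean distances between rows of A and rows of B
  d2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  exp(-gamma * pmax(d2, 0))
}

# Nystrom: K ~= C W^+ C^T, using r randomly chosen landmark rows.
nystrom_approx <- function(X, gamma, r) {
  idx <- sample(nrow(X), r)
  C <- gauss_gram(X, X[idx, , drop = FALSE], gamma)   # n x r
  W <- C[idx, , drop = FALSE]                          # r x r landmark block
  e <- eigen(W, symmetric = TRUE)
  keep <- e$values > 1e-8 * max(e$values)              # pseudo-inverse of W
  V <- e$vectors[, keep, drop = FALSE]
  W_pinv <- V %*% (t(V) / e$values[keep])
  list(K = C %*% W_pinv %*% t(C), idx = idx)
}

# RKS: K ~= Z Z^T with r random Fourier features computed from raw coordinates.
rks_approx <- function(X, gamma, r) {
  W <- matrix(rnorm(ncol(X) * r, sd = sqrt(2 * gamma)), ncol(X), r)
  b <- runif(r, 0, 2 * pi)
  Z <- sqrt(2 / r) * cos(sweep(X %*% W, 2, b, "+"))
  tcrossprod(Z)
}

set.seed(42)
X <- matrix(rnorm(1000 * 3), 1000, 3)
K <- gauss_gram(X, X, gamma = 0.5)
nys <- nystrom_approx(X, gamma = 0.5, r = 50)
K_rks <- rks_approx(X, gamma = 0.5, r = 300)
c(nystrom = mean(abs(nys$K - K)), rks = mean(abs(K_rks - K)))  # mean absolute errors
```

Nyström reproduces the kernel exactly on the landmark rows and columns, while the RKS error is a Monte Carlo error that shrinks roughly like 1/sqrt(r).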

If given.kern = FALSE, X is a dataset matrix whose first ref.n rows correspond to the reference dataset of non-outliers. The (n - ref.n) remaining observations are tested one by one by RaPKod to determine whether or not they are outliers.
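The one-by-one testing scheme, together with the reference update performed when use.tested.inlier = TRUE, can be sketched as a loop over the tested rows. This is a hypothetical illustration: `sequential_test` and `maha_pvalue` are made-up names, and the Mahalanobis p-value computed in input space merely stands in for RaPKod's kernel-based statistic.

```r
# Hypothetical sketch of sequential testing with a sliding reference window.
# `pvalue_fn(ref, x)` is a stand-in for the actual RaPKod test statistic.
sequential_test <- function(X, ref.n, alpha, pvalue_fn, use.tested.inlier = FALSE) {
  X <- as.matrix(X)
  ref <- X[seq_len(ref.n), , drop = FALSE]  # first ref.n rows: reference set
  tested <- seq.int(ref.n + 1, nrow(X))     # remaining n - ref.n rows
  pv <- numeric(length(tested))
  for (j in seq_along(tested)) {
    x <- X[tested[j], ]
    pv[j] <- pvalue_fn(ref, x)
    if (use.tested.inlier && pv[j] >= alpha) {
      # labelled a non-outlier: discard the oldest reference row, append x
      ref <- rbind(ref[-1, , drop = FALSE], x)
    }
  }
  list(pv = pv, flag = pv < alpha, ref = ref)
}

# Stand-in p-value (an assumption): Mahalanobis distance in input space vs chi^2(d).
maha_pvalue <- function(ref, x) {
  pchisq(mahalanobis(x, colMeans(ref), cov(ref)), df = ncol(ref), lower.tail = FALSE)
}

set.seed(7)
X <- rbind(matrix(rnorm(150 * 2), 150, 2),           # non-outliers
           matrix(rnorm(10 * 2, mean = 8), 10, 2))   # outliers (last 10 rows)
res <- sequential_test(X, ref.n = 50, alpha = 0.05, maha_pvalue, use.tested.inlier = TRUE)
res0 <- sequential_test(X, ref.n = 50, alpha = 0.05, maha_pvalue)
```

The window keeps exactly ref.n rows, and observations labelled as outliers never enter the reference set.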

If given.kern = TRUE, X must be an n x n Gram matrix. The kernel used to compute this Gram matrix should be of the form *k(x, y) = K(gamma * || x - y ||^2)*, where *K* is a positive function. Note also that in this case the parameters gamma and p must be specified by the user.
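When given.kern = TRUE, a Gram matrix of the required form can be built in base R from squared Euclidean distances. In the sketch below, `kernel_gram` is a hypothetical helper, K(t) = exp(-t) gives the Gaussian kernel, and K(t) = 1/sqrt(1 + t) (inverse multiquadric) is shown as another positive choice; the values gamma = 0.5 and p = 5 are arbitrary. The rapkod call only runs if the RaPKod package is installed, and it is passed the same gamma that was used to build the matrix.

```r
# Hypothetical helper: Gram matrix with entries K(gamma * ||x_i - x_j||^2).
kernel_gram <- function(X, gamma, K = function(t) exp(-t)) {
  d2 <- as.matrix(dist(as.matrix(X)))^2  # squared Euclidean distances
  K(gamma * d2)
}

data(iris)
feats <- as.matrix(iris[, 1:4])
G <- kernel_gram(feats, gamma = 0.5)                                       # Gaussian
G_imq <- kernel_gram(feats, gamma = 0.5, K = function(t) 1 / sqrt(1 + t))  # inverse multiquadric

# Rows/columns of G follow the row order of `feats`, so the first ref.n rows
# act as the reference set; here the 50 setosa rows serve as reference.
if (requireNamespace("RaPKod", quietly = TRUE)) {
  res <- RaPKod::rapkod(G, given.kern = TRUE, ref.n = 50, gamma = 0.5, p = 5, alpha = 0.05)
}
```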

`stats` |
a vector of length (n - ref.n) containing the test statistic of each tested observation. |

`flag` |
a logical vector of length (n - ref.n) indicating which tested observations have been labelled as outliers (TRUE for an outlier). |

`pv` |
a vector of length (n - ref.n) containing the p-value of each tested observation. |

`gamma` |
the optimal value of gamma determined by the program (or the value provided by the user, if specified). |

`p` |
the optimal value of p determined by the program (or the value provided by the user, if specified). |
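When ground-truth labels are known, as in the iris example, the returned p-values can be turned into empirical error rates. The helper `error_rates` below is hypothetical (not part of RaPKod) and treats an observation as flagged when pv < alpha. The short simulation also shows why calibrated p-values control the false alarm rate: if the p-values of non-outliers are uniform on [0, 1], a fraction close to alpha of them falls below alpha.

```r
# Hypothetical helper (not part of RaPKod): empirical error rates from p-values.
error_rates <- function(pv, is_outlier, alpha = 0.05) {
  c(false_alarm = mean(pv[!is_outlier] < alpha, na.rm = TRUE),       # non-outliers flagged
    missed_detection = mean(pv[is_outlier] >= alpha, na.rm = TRUE))  # outliers not flagged
}

# Uniform p-values under the null: the fraction below alpha is close to alpha.
set.seed(3)
null_pv <- runif(1e5)
mean(null_pv < 0.05)

rates <- error_rates(c(0.01, 0.2, 0.5, 0.03, 0.001, 0.6),
                     c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE))
rates  # both rates equal 1/3 here
```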

```r
library(RaPKod)
data(iris)
set.seed(1)  # make the random choice of non-outliers reproducible
## Define data frame with non-outliers (versicolor and virginica)
inliers <- iris[sample(which(iris$Species != "setosa"), 100, replace = FALSE),
                -which(names(iris) == "Species")]
## Define data frame with outliers (setosa)
outliers <- iris[which(iris$Species == "setosa"), -which(names(iris) == "Species")]
X <- rbind(inliers, outliers)
ref.n <- 50
result <- rapkod(X, ref.n = ref.n, use.tested.inlier = FALSE, alpha = 0.05)
## Positions of the tested non-outliers among the (n - ref.n) tested observations
tested.inliers <- 1:(nrow(inliers) - ref.n)
## False alarm error ratio obtained on tested non-outliers (should be close to alpha = 0.05)
mean(result$pv[tested.inliers] < 0.05, na.rm = TRUE)
## Missed detection error ratio obtained on tested outliers (should be close to 0)
mean(result$pv[-tested.inliers] > 0.05, na.rm = TRUE)
```
