# od.opt.param: Optimal Parameter Values In RaPKod: Random Projection Kernel Outlier Detector

## Description

Uses a heuristic formula to set optimal values for gamma and p.

## Usage

```r
od.opt.param(X, K1 = 6, K2 = 50, which.estim = "Gauss", RATIO = 0.1,
             randomize = TRUE, sub.n = floor(nrow(X)))
```

## Arguments

| Argument | Description |
|---|---|
| `X` | a data frame or an n x d matrix. |
| `K1` | universal constant used in the heuristic formula of the optimal parameter gamma. |
| `K2` | universal constant used in the heuristic formula of the optimal parameter p. |
| `which.estim` | specifies the estimation method of the parameters: either "Gauss" (default) or "general". |
| `RATIO` | optional parameter used in the estimation method "Gauss". |
| `randomize` | optional parameter used in the estimation method "general". |
| `sub.n` | optional parameter used in the estimation method "general" if randomize=TRUE. |

## Details

This function uses a heuristic formula to determine the optimal values of the parameters gamma and p when a Gaussian kernel is used. The formula has the form gamma = K1 * |f|_2^{2/(d+2)} * n^{1/(d+2)} and p = ceil(K2 * |f|_2^{2/(d+2)} * n^{2/(d+2)}), where n and d are the numbers of rows and columns of X, |f|_2 is the L2-norm of the density function f of the non-outliers, and ceil(x) denotes the smallest integer greater than or equal to x.
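To make the formula concrete, here is a minimal sketch that evaluates it for a given value of |f|_2^{2/(d+2)}. The helper `heuristic_params` and its argument `f2_pw` are hypothetical names introduced for illustration; this is not the package's source code, only a direct transcription of the expressions above with the default constants from the Usage section.

```r
# Hypothetical helper (not part of RaPKod): evaluates the heuristic
# formulas for gamma and p exactly as written in the Details section.
# f2_pw stands for |f|_2^{2/(d+2)}, n for the sample size, d for the dimension.
heuristic_params <- function(f2_pw, n, d, K1 = 6, K2 = 50) {
  gamma_opt <- K1 * f2_pw * n^(1 / (d + 2))
  p_opt <- ceiling(K2 * f2_pw * n^(2 / (d + 2)))
  list(gamma.opt = gamma_opt, p.opt = p_opt)
}

# Illustrative input values only
res <- heuristic_params(f2_pw = 0.5, n = 100, d = 2)
res$gamma.opt
res$p.opt
```

Both values grow with n, but the exponents shrink as d increases, so the sample size has less influence in high dimensions.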

Two methods are proposed to estimate |f|_2 and are specified by the argument which.estim: "Gauss" and "general".

If which.estim="Gauss", the estimation is done as though f were a Gaussian density, which yields |f|_2^{2/(d+2)} = (4*pi)^{-0.5}*exp(0.5*mean(log(1/ev))), where ev are the eigenvalues of the covariance of the non-outlier distribution. Note that the eigenvalues smaller than ev[1]*RATIO (where ev[1] is the largest eigenvalue) are discarded to avoid numerical issues.
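The "Gauss" estimate can be sketched in a few lines of base R. The function name `gauss_f2_pw` is hypothetical, and the use of the sample covariance `cov(X)` to obtain the eigenvalues is an assumption; the package may compute them differently.

```r
# Hypothetical helper (not RaPKod's code): Gaussian-based estimate of
# |f|_2^{2/(d+2)} following the expression in the Details section.
gauss_f2_pw <- function(X, RATIO = 0.1) {
  # Assumption: eigenvalues are taken from the sample covariance matrix.
  # eigen() returns them in decreasing order, so ev[1] is the largest.
  ev <- eigen(cov(as.matrix(X)), symmetric = TRUE, only.values = TRUE)$values
  # Discard eigenvalues smaller than ev[1] * RATIO
  ev <- ev[ev >= ev[1] * RATIO]
  (4 * pi)^(-0.5) * exp(0.5 * mean(log(1 / ev)))
}

set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)  # simulated data, identity covariance
est <- gauss_f2_pw(X)
est
```

With an identity covariance all eigenvalues are close to 1, so the estimate should be close to (4*pi)^{-0.5}.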

If which.estim="general", |f|_2 is estimated without any assumption on f. However, this method may fail in very high dimensions because of the curse of dimensionality, since it relies on an estimation of the derivative of F at 0, where F is the cdf of the pairwise distance between two non-outliers. In addition, to shorten the computation time, the optional argument 'randomize' can be set to TRUE, so that only a subset of size sub.n of the data is used to estimate the cdf F.
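The idea behind the "general" method can be illustrated with a rough sketch. It is not the estimator implemented in RaPKod. It relies on the small-radius approximation F(r) ≈ |f|_2^2 * V_d * r^d, where V_d is the volume of the unit ball in dimension d, and reads |f|_2^2 off the empirical cdf at a small radius. The function name `general_f2_sq` and the quantile level `q` are hypothetical choices made for this example.

```r
# Illustrative sketch only (not RaPKod's estimator): estimates |f|_2^2 from
# the empirical cdf of pairwise distances evaluated at a small radius r.
general_f2_sq <- function(X, randomize = TRUE, sub.n = nrow(X), q = 0.05) {
  X <- as.matrix(X)
  # Optionally work on a random subset of sub.n rows to save time
  if (randomize && sub.n < nrow(X)) {
    X <- X[sample(nrow(X), sub.n), , drop = FALSE]
  }
  d <- ncol(X)
  dd <- as.vector(dist(X))        # pairwise Euclidean distances
  r <- unname(quantile(dd, q))    # small radius; q is a hypothetical tuning choice
  F_r <- mean(dd <= r)            # empirical cdf F evaluated at r
  unit_ball <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball in R^d
  F_r / (unit_ball * r^d)         # small-r approximation solved for |f|_2^2
}

set.seed(1)
X <- matrix(rnorm(2000 * 2), ncol = 2)  # simulated standard Gaussian data, d = 2
est <- general_f2_sq(X, randomize = TRUE, sub.n = 500)
est
```

For a standard Gaussian in dimension 2, |f|_2^2 equals 1/(4*pi), which gives a reference value for the estimate. The factor r^d also shows why the approach degrades in high dimensions: very few pairs fall within a small radius, so the empirical cdf near 0 becomes unreliable.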

## Value

| Value | Description |
|---|---|
| `gamma.opt` | optimal value for gamma. |
| `p.opt` | optimal value for p. |
| `est.f2.pw` | estimation of |f|_2^{2/(d+2)}. |

## See Also

`rapkod`
## Examples

```r
library(RaPKod)
data(iris)

## Define data frame with non-outliers
inliers <- iris[sample(which(iris$Species != "setosa"), 100, replace = FALSE),
                -which(names(iris) == "Species")]

param <- od.opt.param(inliers)

# display optimal gamma
param$gamma.opt

# display optimal p
param$p.opt
```