Uses a heuristic formula to set optimal values for gamma and p.
od.opt.param(X, K1 = 6, K2 = 50, which.estim = "Gauss", RATIO = 0.1,
             randomize = TRUE, sub.n = floor(nrow(X)))
X | a data frame or an n x d matrix.
K1 | universal constant used in the heuristic formula for the optimal parameter gamma.
K2 | universal constant used in the heuristic formula for the optimal parameter p.
which.estim | method used to estimate the parameters: either "Gauss" (default) or "general".
RATIO | optional parameter used by the "Gauss" estimation method: covariance eigenvalues smaller than RATIO times the largest eigenvalue are discarded.
randomize | optional parameter used by the "general" estimation method: if TRUE, only a subset of the data is used to estimate the cdf F.
sub.n | optional parameter used by the "general" estimation method when randomize = TRUE: the size of the subset.
This function uses a heuristic formula to determine the optimal values of the parameters gamma and p when a Gaussian kernel is used. The formula is gamma = K1 * |f|_2^{2/(d+2)} * n^{1/(d+2)} and p = ceil(K2 * |f|_2^{2/(d+2)} * n^{2/(d+2)}), where n is the number of observations, d is the dimension, |f|_2 is the L2-norm of the density f of the non-outliers, and ceil(x) denotes the smallest integer greater than or equal to x.
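As a rough illustration of the formula above, the following base R sketch computes gamma and p from a given value of |f|_2^{2/(d+2)}. The helper name heuristic_params and its argument f2.pw are hypothetical and are not part of the package; the defaults K1 = 6 and K2 = 50 are the ones shown in the usage.

```r
# Hypothetical helper (not part of the package): applies the heuristic
# formulas for gamma and p, given an estimate f2.pw of |f|_2^{2/(d+2)}.
heuristic_params <- function(n, d, f2.pw, K1 = 6, K2 = 50) {
  gamma <- K1 * f2.pw * n^(1 / (d + 2))
  p     <- ceiling(K2 * f2.pw * n^(2 / (d + 2)))
  list(gamma.opt = gamma, p.opt = p)
}

# Illustrative values only: n = 1000 observations, d = 2, f2.pw = 0.5
res <- heuristic_params(n = 1000, d = 2, f2.pw = 0.5)
```

Both quantities grow with n, but p grows faster (exponent 2/(d+2) against 1/(d+2)), and both growth rates slow down as the dimension d increases.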
Two methods are proposed to estimate |f|_2 and are specified by the argument which.estim: "Gauss" and "general".
If which.estim="Gauss", the estimation is done as though f were a Gaussian density, which yields |f|_2^{2/(d+2)} = (4*pi)^{-0.5}*exp(0.5*mean(log(1/ev))), where ev are the eigenvalues of the covariance matrix of the non-outlier distribution. Eigenvalues smaller than ev[1]*RATIO (where ev[1] is the largest eigenvalue) are discarded to avoid numerical issues.
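The expression for the "Gauss" method can be evaluated directly in base R. The sketch below is not the package's implementation: the helper name gauss_f2_pw is hypothetical, and it assumes that the plain sample covariance of X stands in for the covariance of the non-outlier distribution, which the package may estimate differently.

```r
# Hypothetical helper (not the package's code): evaluates the documented
# Gaussian expression from the eigenvalues of the sample covariance of X.
gauss_f2_pw <- function(X, RATIO = 0.1) {
  X  <- as.matrix(X)
  # eigen() returns the eigenvalues in decreasing order, so ev[1] is the largest
  ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  # Discard eigenvalues smaller than ev[1] * RATIO
  ev <- ev[ev >= ev[1] * RATIO]
  (4 * pi)^(-0.5) * exp(0.5 * mean(log(1 / ev)))
}

set.seed(1)
X <- matrix(rnorm(200 * 3), ncol = 3)  # simulated data, for illustration only
est <- gauss_f2_pw(X)
```

With a larger RATIO, more of the small eigenvalues are dropped, so the estimate depends only on the dominant directions of the covariance.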
If which.estim="general", |f|_2 is estimated without any assumption on f. However, this method may fail in very high dimensions because of the curse of dimensionality, since it relies on estimating the derivative of F at 0, where F is the cdf of the distance between two non-outliers. To shorten the computation time, the optional argument randomize can be set to TRUE, so that only a subset of size sub.n of the data is used to estimate F.
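To make the role of randomize and sub.n more concrete, here is a minimal base R sketch of the subsampling step only: it builds the empirical cdf of pairwise Euclidean distances on a random subset of rows. The helper name pairwise_dist_ecdf is hypothetical, the use of the Euclidean distance is an assumption, and the step that turns F into an estimate of |f|_2 is deliberately left out because it is not specified here.

```r
# Hypothetical helper (not part of the package): empirical cdf of pairwise
# Euclidean distances, optionally computed on a random subset of sub.n rows.
pairwise_dist_ecdf <- function(X, randomize = TRUE, sub.n = nrow(X)) {
  X <- as.matrix(X)
  if (randomize) {
    # Draw sub.n distinct row indices without replacement
    X <- X[sample(nrow(X), sub.n), , drop = FALSE]
  }
  # dist() returns all pairwise distances between the rows of X
  ecdf(as.vector(dist(X)))
}

set.seed(1)
X <- matrix(rnorm(100 * 2), ncol = 2)  # simulated data, for illustration only
F.hat <- pairwise_dist_ecdf(X, randomize = TRUE, sub.n = 50)
F.hat(0.1)  # fraction of pairwise distances not exceeding 0.1
```

Since the number of pairwise distances grows quadratically with the number of rows, halving the subset size cuts the work roughly by a factor of four.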
gamma.opt | optimal value for gamma.
p.opt | optimal value for p.
est.f2.pw | estimate of |f|_2^{2/(d+2)}.