kfs | R Documentation |
Kernel feature significance for 1- to 6-dimensional data.
kfs(x, H, h, deriv.order=2, gridsize, gridtype, xmin, xmax, supp=3.7,
eval.points, binned, bgridsize, positive=FALSE, adj.positive, w,
verbose=FALSE, signif.level=0.05)
x |
matrix of data values |
H , h |
bandwidth matrix/scalar bandwidth. If these are missing, a plug-in selector with deriv.order=2 is used by default. |
deriv.order |
derivative order (scalar) |
gridsize |
vector of number of grid points |
gridtype |
not yet implemented |
xmin , xmax |
vector of minimum/maximum values for grid |
supp |
effective support for standard normal |
eval.points |
vector or matrix of points at which estimate is evaluated |
binned |
flag for binned estimation |
bgridsize |
vector of binning grid sizes |
positive |
flag if 1-d data are positive. Default is FALSE. |
adj.positive |
adjustment applied to positive 1-d data |
w |
vector of weights. Default is a vector of all ones. |
verbose |
flag to print out progress information. Default is FALSE. |
signif.level |
overall level of significance for hypothesis tests. Default is 0.05. |
Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. Only the latter is currently implemented; the regions it identifies are also known as significant modal regions.
The hypothesis test at a grid point \bold{x} is H_0(\bold{x}): \mathsf{H} f(\bold{x}) < 0, i.e. the density Hessian matrix \mathsf{H} f(\bold{x}) is negative definite.
The p-values are computed for each \bold{x} using the fact that the test statistic is approximately chi-squared distributed with d(d+1)/2 degrees of freedom. We then use a Hochberg-type simultaneous testing procedure, based on the ordered p-values, to control the overall level of significance at signif.level. If H_0(\bold{x}) is rejected then \bold{x} belongs to a significant modal region.
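The testing step described above can be sketched in base R. This is only an illustration under stated assumptions: `hochberg_signif` is a hypothetical helper, not a function in the ks package; the test statistics are made-up numbers; and the standard Hochberg step-up adjustment from `p.adjust` stands in for the "Hochberg-type" procedure, whose exact form inside kfs may differ.

```r
# Hypothetical helper (not part of ks): chi-squared p-values followed by
# a standard Hochberg adjustment, as a stand-in for the procedure in kfs.
hochberg_signif <- function(stat, d, signif.level = 0.05) {
  df <- d * (d + 1) / 2                                  # d.f. stated in the text
  pval <- pchisq(stat, df = df, lower.tail = FALSE)      # upper-tail p-values
  padj <- p.adjust(pval, method = "hochberg")            # simultaneous testing
  list(df = df, pval = pval, signif = padj <= signif.level)
}

# Made-up test statistics at five grid points, for 2-d data
stat <- c(0.5, 2.1, 9.8, 25.3, 40.0)
res <- hochberg_signif(stat, d = 2)
res$df       # degrees of freedom used
res$signif   # TRUE where the null hypothesis is rejected
```

Grid points flagged TRUE correspond to those that would be assigned to a significant modal region under these assumptions.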
The computations are based on kdde(x, deriv.order=2) so kfs inherits its behaviour from kdde.
If the bandwidth H is missing, then the default bandwidth is the plug-in selector Hpi(,deriv.order=2). Likewise for missing h.
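As a rough illustration of supplying the bandwidth explicitly rather than relying on the default, the sketch below computes a 1-d plug-in bandwidth and passes it to kfs. It assumes that "likewise for missing h" refers to the scalar selector hpi with deriv.order=2, and it reuses the geyser data from the example on this page. The result may not match the default exactly, since other selector settings used internally are not stated here.

```r
library(ks)
data(geyser, package = "MASS")
x <- geyser$duration

# Scalar plug-in bandwidth targeted at the second derivative
# (assumed to be the 1-d analogue of Hpi(, deriv.order=2))
h2 <- hpi(x, deriv.order = 2)

# Feature significance with the bandwidth supplied explicitly
fs.explicit <- kfs(x, h = h2, binned = TRUE)
```

For multivariate data the same pattern would use Hpi(x, deriv.order=2) and the H argument.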
The effective support, binning, grid size, grid range and positive parameters are the same as for kde.
This function is similar to the featureSignif function in the feature package, except that it accepts unconstrained bandwidth matrices.
A kernel feature significance estimate is an object of class kfs which is a list with fields
x |
data points - same as input |
eval.points |
vector or list of points at which the estimate is evaluated |
estimate |
binary matrix for significant feature at eval.points |
h |
scalar bandwidth (1-d only) |
H |
bandwidth matrix |
gridtype |
"linear" |
gridded |
flag for estimation on a grid |
binned |
flag for binned estimation |
names |
variable names |
w |
vector of weights |
deriv.order |
derivative order (scalar) |
deriv.ind |
matrix where each row is a vector of partial derivative indices. |
This is the same structure as a kdde object, except that estimate is a binary matrix rather than real-valued.
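The returned object can be inspected like any list. The following sketch uses the 1-d geyser example and assumes that, in the 1-d case, estimate and eval.points are vectors of the same length, so that one can index the other.

```r
library(ks)
data(geyser, package = "MASS")
geyser.fs <- kfs(geyser$duration, binned = TRUE)

class(geyser.fs)    # class of the returned object
names(geyser.fs)    # fields of the list

# Proportion of grid points flagged as significant
mean(geyser.fs$estimate)

# Evaluation points lying in a significant modal region
# (assumes estimate and eval.points align element by element in 1-d)
sig.pts <- geyser.fs$eval.points[as.logical(geyser.fs$estimate)]
head(sig.pts)
```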
Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.
Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.
Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.
kdde, plot.kfs
library(ks)
data(geyser, package="MASS")
geyser.fs <- kfs(geyser$duration, binned=TRUE)
plot(geyser.fs, xlab="duration")
## see example in ? plot.kfs