featureSignif: Feature significance for kernel density estimation


View source: R/featureSignif.R

Description

Identify significant features of kernel density estimates of 1- to 4-dimensional data.

Usage

featureSignif(x, bw, gridsize, scaleData=FALSE, addSignifGrad=TRUE,
   addSignifCurv=TRUE, signifLevel=0.05)  

Arguments

x

data matrix

bw

vector of bandwidth(s)

gridsize

vector of estimation grid sizes

scaleData

flag for scaling the data, i.e. transforming each dimension to unit variance

addSignifGrad

flag for computing significant gradient regions

addSignifCurv

flag for computing significant curvature regions

signifLevel

significance level

Details

Feature significance is based on significance testing of the gradient (first derivative) and curvature (second derivative) of a kernel density estimate. This was developed for 1-d data by Chaudhuri & Marron (1999), for 2-d data by Godtliebsen, Marron & Chaudhuri (2002), and for 3-d and 4-d data by Duong, Cowling, Koch & Wand (2008).

The test statistic for gradient testing at a point x is

W(x) = || hat{grad f}(x; H)||^2

where hat{grad f}(x; H) is the kernel estimate of the gradient of f(x) with bandwidth H, and ||.|| is the Euclidean norm. W(x) is approximately chi-squared distributed with d degrees of freedom, where d is the dimension of the data.
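
As a rough illustration of the chi-squared reference distribution, the following base-R sketch compares a made-up value of W(x) with the critical value for d = 2. The value of `W` is hypothetical and is not produced by featureSignif; the sketch treats a single point in isolation and ignores any multiple-testing adjustment.

```r
# Hypothetical gradient test at a single point, bivariate data (d = 2).
d <- 2                 # dimension of the data
W <- 7.3               # made-up value of the test statistic W(x)
signifLevel <- 0.05    # same default as in featureSignif()

# Critical value and p-value from the chi-squared distribution with d df
crit <- qchisq(1 - signifLevel, df = d)
pval <- pchisq(W, df = d, lower.tail = FALSE)

# Is the gradient significant at this point (no multiplicity adjustment)?
signifGrad <- W > crit
```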

The analogous test statistic for the curvature is

W2(x) = ||vech hat{curv f}(x; H)||^2

where hat{curv f}(x; H) is the kernel estimate of the curvature of f(x), and vech is the vector-half operator. W2(x) is approximately chi-squared distributed with d(d+1)/2 degrees of freedom.
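
The vector-half operator and the resulting degrees of freedom can be sketched in base R as follows. The helper `vech` and the matrix `Hess` are illustrative assumptions made here; they are not objects exported by the package.

```r
# vech stacks the lower triangle (including the diagonal) of a symmetric
# matrix column by column. Hypothetical helper, defined here for illustration.
vech <- function(A) A[lower.tri(A, diag = TRUE)]

# Made-up 2 x 2 symmetric matrix standing in for an estimated curvature matrix
Hess <- matrix(c(-1.2,  0.3,
                  0.3, -0.8), nrow = 2, byrow = TRUE)
v <- vech(Hess)             # elements (1,1), (2,1), (2,2)

# Degrees of freedom d(d+1)/2 for d = 1, ..., 4
d <- 1:4
dfCurv <- d * (d + 1) / 2

# For d = 2, the length of vech(Hess) matches the degrees of freedom
length(v) == dfCurv[2]
```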

Since this is a situation with many dependent hypothesis tests, we use the Hochberg multiple comparison testing procedure to control the overall level of significance. See Hochberg (1988) and Duong, Cowling, Koch & Wand (2008).
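
To give a feel for the Hochberg procedure, the sketch below applies `p.adjust` from the base stats package to a handful of made-up p-values. This is only an illustration of the adjustment itself; how featureSignif applies it internally is not shown here and may differ.

```r
# Made-up p-values from tests at five grid points
p <- c(0.001, 0.012, 0.03, 0.04, 0.20)
signifLevel <- 0.05

# Hochberg-adjusted p-values (stats::p.adjust)
pAdj <- p.adjust(p, method = "hochberg")

# Reject the null hypothesis where the adjusted p-value is at most signifLevel
reject <- pAdj <= signifLevel
```

With these values only the two smallest p-values remain significant after adjustment, whereas four of the five fall below 0.05 when unadjusted.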

Value

Returns an object of class fs, which is a list with the following fields:

x

data matrix

names

name labels used for plotting

bw

vector of bandwidths

fhat

kernel density estimate on a grid

grad

logical grid for significant gradient

curv

logical grid for significant curvature

gradData

logical vector for significant gradient data points

gradDataPoints

significant gradient data points

curvData

logical vector for significant curvature data points

curvDataPoints

significant curvature data points
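
A minimal sketch of inspecting the returned object, assuming the feature package is installed and using the earthquake data shipped with it. The field names are those listed above; the exact shape of each field is not documented here, so the calls below only summarise them.

```r
library(feature)

# Univariate fit on the transformed depth variable
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)

names(fs)                 # field names of the returned list
sum(fs$gradData)          # number of data points with significant gradient
head(fs$curvDataPoints)   # first few data points with significant curvature
```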

References

Chaudhuri, P. & Marron, J.S. (1999) SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807-823.

Duong, T., Cowling, A., Koch, I. & Wand, M.P. (2008) Feature significance for multivariate kernel density estimation. Computational Statistics and Data Analysis, 52, 4225-4242.

Hochberg, Y. (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800-802.

Godtliebsen, F., Marron, J.S. & Chaudhuri, P. (2002) Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics, 11, 1-22.

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.

See Also

featureSignifGUI, plot.fs

Examples

## Univariate example
data(earthquake)
eq3 <- -log10(-earthquake[,3])
fs <- featureSignif(eq3, bw=0.1)
plot(fs, addSignifGradRegion=TRUE)

## Bivariate example
library(MASS)
data(geyser)
fs <- featureSignif(geyser)
plot(fs, addKDE=FALSE, addData=TRUE)  ## data only
plot(fs, addKDE=TRUE)                 ## KDE plot only
plot(fs, addSignifGradRegion=TRUE)    
plot(fs, addKDE=FALSE, addSignifCurvRegion=TRUE)
plot(fs, addSignifCurvData=TRUE, curvCol="cyan")


feature documentation built on Feb. 10, 2021, 9:06 a.m.