Generalized cross-validation bandwidth selection in nonparametric regression models

Description

From a sample {(Y_i, t_i): i=1,...,n}, this routine computes an optimal bandwidth for estimating m in the regression model

Y_i= m(t_i) + ε_i.

The regression function, m, is a smooth but unknown function. The optimal bandwidth is selected by means of the generalized cross-validation procedure. Kernel smoothing is used.

Usage

1
2
np.gcv(data = data, h.seq=NULL, num.h = 50, estimator = "NW", 
kernel = "quadratic")

Arguments

data

data[, 1] contains the values of the response variable, Y;

data[, 2] contains the values of the explanatory variable, t.

h.seq

sequence of considered bandwidths in the GCV function. If NULL (the default), num.h equidistant values between zero and a quarter of the range of t_i are considered.

num.h

number of values used to build the sequence of considered bandwidths. If h.seq is not NULL, num.h=length(h.seq). Otherwise, the default is 50.

estimator

allows us the choice between “NW” (Nadaraya-Watson) or “LLP” (Local Linear Polynomial). The default is “NW”.

kernel

allows us the choice between “gaussian”, “quadratic” (Epanechnikov kernel), “triweight” or “uniform” kernel. The default is “quadratic”.

Details

See Craven and Wahba (1979) and Rice (1984).

Value

h.opt

selected value for the bandwidth.

GCV.opt

minimum value of the GCV function.

GCV

vector containing the values of the GCV function for each considered bandwidth.

h.seq

sequence of considered bandwidths in the GCV function.

Author(s)

German Aneiros Perez ganeiros@udc.es

Ana Lopez Cheda ana.lopez.cheda@udc.es

References

Craven, P. and Wahba, G. (1979) Smoothing noisy data with spline functions. Numer. Math. 31, 377-403.

Rice, J. (1984) Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215-1230.

See Also

Other related functions are: np.est, np.cv, plrm.est, plrm.gcv and plrm.cv.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# EXAMPLE 1: REAL DATA
data <- matrix(10,120,2)
data(barnacles1)
barnacles1 <- as.matrix(barnacles1)
data[,1] <- barnacles1[,1]
data <- diff(data, 12)
data[,2] <- 1:nrow(data)

aux <- np.gcv(data)
aux$h.opt
plot(aux$h.seq, aux$GCV, xlab="h", ylab="GCV", type="l")



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

epsilon <- rnorm(n, 0, 0.01)
y <-  f + epsilon
data_ind <- matrix(c(y,t),nrow=100)

# We apply the function
a <-np.gcv(data_ind)
a$GCV.opt

GCV <- a$GCV
h <- a$h.seq
plot(h, GCV, type="l")

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.