Semiparametric estimates for the unknown components of the regression function in PLR models

Share:

Description

This routine computes estimates for β and m(newt_j) (j=1,...,J) from a sample {(Y_i, X_{i1}, ..., X_{ip}, t_i)}: i=1,...,n, where:

β = (β_1,...,β_p)

is an unknown vector parameter,

m(.)

is a smooth but unknown function and

Y_i= X_{i1}*β_1 +...+ X_{ip}*β_p + m(t_i) + ε_i.

The random errors, ε_i, are allowed to be time series. Kernel smoothing, combined with ordinary least squares estimation, is used.

Usage

1
2
plrm.est(data = data, b = NULL, h = NULL, newt = NULL, estimator = "NW", 
kernel = "quadratic")

Arguments

data

data[, 1] contains the values of the response variable, Y;

data[, 2:(p+1)] contains the values of the "linear" explanatory variables,

X_1, ..., X_p;

data[, p+2] contains the values of the "nonparametric" explanatory variable, t.

b

bandwidth for estimating the parametric part of the model. If both b and h are NULL (the default), it is selected by means of the cross-validation procedure (fixing b=h); if b is NULL (the default) but h is not NULL, b=h is considered.

h

(b,h) is the pair of bandwidths for estimating the nonparametric part of the model. If both b and h are NULL (the default), it is selected by means of the cross-validation procedure (fixing b=h); if b is NULL (the default) but h is not NULL, b=h is considered; if h is NULL (the default) but b is not NULL, h=b is considered.

newt

values of the "nonparametric" explanatory variable where the estimator of m is evaluated. If NULL (the default), the considered values will be the values of data[,p+2].

estimator

allows us the choice between “NW” (Nadaraya-Watson) or “LLP” (Local Linear Polynomial). The default is “NW”.

kernel

allows us the choice between “gaussian”, “quadratic” (Epanechnikov kernel), “triweight” or “uniform” kernel. The default is “quadratic”.

Details

Expressions for the estimators of β and m can be seen in page 52 in Aneiros-Perez et al. (2004).

Value

A list containing:

beta

a vector containing the estimate of β.

m.t

a vector containing the estimator of the non-parametric part, m, evaluated in the design points.

m.newt

a vector containing the estimator of the non-parametric part, m, evaluated in newt.

residuals

a vector containing the residuals: Y - X*beta - m.t.

fitted.values

the values obtained from the expression: X*beta + m.t

b

the considered bandwidth for estimating β.

h

(b,h) is the pair of bandwidths considered for estimating m.

Author(s)

German Aneiros Perez ganeiros@udc.es

Ana Lopez Cheda ana.lopez.cheda@udc.es

References

Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.

Hardle, W., Liang, H. and Gao, J. (2000) Partially Linear Models. Physica-Verlag.

Speckman, P. (1988) Kernel smoothing in partial linear models. J. R. Statist. Soc. B 50, 413-436.

See Also

Other related functions are: plrm.beta, plrm.gcv, plrm.cv, np.est, np.gcv and np.cv.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
# EXAMPLE 1: REAL DATA
data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))

b.h <- plrm.gcv(data)$bh.opt
ajuste <- plrm.est(data=data, b=b.h[1], h=b.h[2])
ajuste$beta
plot(data[,4], ajuste$m, type="l", xlab="t", ylab="m(t)")

plot(data[,1], ajuste$fitted.values, xlab="y", ylab="y.hat", main="y.hat vs y")
abline(0,1)

mean(ajuste$residuals^2)/var(data[,1])



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

x <- matrix(rnorm(200,0,1), nrow=n)
sum <- x%*%beta
epsilon <- rnorm(n, 0, 0.01)
y <-  sum + f + epsilon
data_ind <- matrix(c(y,x,t),nrow=100)

# We estimate the components of the PLR model
# (CV bandwidth)
a <- plrm.est(data_ind)

a$beta

est <- a$m.t
plot(t, est, type="l", lty=2, ylab="")
points(t, 0.25*t*(1-t), type="l")
legend(x="topleft", legend = c("m", "m hat"), col=c("black", "black"), lty=c(1,2))


## Example 2b: dependent data
# We generate the data
x <- matrix(rnorm(200,0,1), nrow=n)
sum <- x%*%beta
epsilon <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n)
y <-  sum + f + epsilon
data_dep <- matrix(c(y,x,t),nrow=100)

# We estimate the components of the PLR model
# (CV bandwidth)
h <- plrm.cv(data_dep, ln.0=2)$bh.opt[3,1]
a <- plrm.est(data_dep, h=h)

a$beta

est <- a$m.t
plot(t, est, type="l", lty=2, ylab="")
points(t, 0.25*t*(1-t), type="l")
legend(x="topleft", legend = c("m", "m hat"), col=c("black", "black"), lty=c(1,2))

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.