plrm.ancova: Semiparametric analysis of covariance (based on PLR models) In PLRModels: Statistical inference in partial linear regression models

Description

From samples {(Y_{ki}, X_{ki1}, ..., X_{kip}, t_i): i=1,...,n}, k=1,...,L, this routine tests the null hypotheses H0: β_1 = ... = β_L and H0: m_1 = ... = m_L, where:

β_k = (β_{k1},...,β_{kp})

is an unknown vector parameter;

m_k(.)

is a smooth but unknown function and

Y_{ki}= X_{ki1}*β_{k1} +...+ X_{kip}*β_{kp} + m(t_i) + ε_{ki}.

Fixed equally spaced design is considered for the "nonparametric" explanatory variable, t, and the random errors, ε_{ki}, are allowed to be time series. The test statistic used for testing H0: β_1 = ...= β_L derives from the asymptotic normality of an estimator of β_k (k=1,...,L) based on both ordinary least squares and kernel smoothing (this result giving a χ^2-test). The test statistic used for testing H0: m_1 = ...= m_L derives from a Cramer-von-Mises-type functional based on different distances between nonparametric estimators of m_k (k=1,...,L).

Usage

 ```1 2 3 4 5``` ```plrm.ancova(data = data, t = t, b.seq = NULL, h.seq = NULL, w = NULL, estimator = "NW", kernel = "quadratic", time.series = FALSE, Var.Cov.eps = NULL, Tau.eps = NULL, b0 = NULL, h0 = NULL, lag.max = 50, p.max = 3, q.max = 3, ic = "BIC", num.lb = 10, alpha = 0.05) ```

Arguments

 `data` `data[, 1, k]` contains the values of the response variable, Y_k, for each model k (k=1, ..., L); `data[, 2:(p+1), k]` contains the values of the "linear" explanatory variables, X_{k1}, ..., X_{kp}, for each model k (k=1, ..., L). `t` contains the values of the "nonparametric" explanatory (common) variable, t, for each model k (k=1, ..., L). `b.seq` the statistic test for H0: β_1 = ... = β_L is performed using each bandwidth in the vector `b.seq`. If `NULL` (the default) but `h.seq` is not `NULL`, it takes `b.seq=h.seq`. If both `b.seq` and `h.seq` are `NULL`, 10 equidistant values between zero and a quarter of the range of {t_i} are considered. `h.seq` the statistic test for H0: m_1 = ... = m_L is performed using each pair of bandwidths (`b.seq[j], h.seq[j]`). If `NULL` (the default) but `b.seq` is not `NULL`, it takes `h.seq=b.seq`. If both `b.seq` and `h.seq` are `NULL`, 10 equidistant values between zero and a quarter of the range of {t_i} are considered for both `b.seq` and `h.seq`. `w` support interval of the weigth function in the test statistic for H0: m_1 = ... = m_L. If `NULL` (the default), (q_{0.1}, q_{0.9}) is considered, where q_p denotes the quantile of order p of {t_i}. `estimator` allows us the choice between “NW” (Nadaraya-Watson) or “LLP” (Local Linear Polynomial). The default is “NW”. `kernel` allows us the choice between “gaussian”, “quadratic” (Epanechnikov kernel), “triweight” or “uniform” kernel. The default is “quadratic”. `time.series` it denotes whether the data are independent (FALSE) or if data is a time series (TRUE). The default is FALSE. `Var.Cov.eps` `Var.Cov.eps[, , k]` contains the `n x n` matrix of variances-covariances associated to the random errors of the regression model k (k=1, ..., L). If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information criterium) to the residuals from the fitted regression model and, then, it obtains the var-cov matrix of such ARMA model. `Tau.eps` `Tau.eps[k]` contains the sum of autocovariances associated to the random errors of the regression model k (k=1, ..., L). If NULL (the default), the function tries to estimate it: it fits an ARMA model (selected according to an information criterium) to the residuals from the fitted regression model and, then, it obtains the sum of the autocovariances of such ARMA model. `b0` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, `b0` contains the pilot bandwidth for the estimator of β_k (k=1, ..., L) used for obtaining the residuals to construct the default for `Var.Cov.eps` and/or `Tau.eps`. If `NULL` (the default) but `h0` is not `NULL`, it takes `b0=h0`. If both `b0` and `h0` are `NULL`, a quarter of the range of {t_i} is considered. `h0` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, (`b0, h0`) contains the pair of pilot bandwidths for the estimator of m_k (k=1, ..., L) used for obtaining the residuals to construct the default for `Var.Cov.eps` and/or `Tau.eps`. If `NULL` (the default) but `b0` is not `NULL`, it takes `h0=b0`. If both `b0` and `h0` are `NULL`, a quarter of the range of {t_i} is considered for both `b0` and `h0`. `lag.max` if `Tau.eps=NULL`, `lag.max` contains the maximum delay used to construct the default for `Tau.eps`. The default is 50. `p.max` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=`p.max` and 0<=q<=`q.max`. The default is 3. `q.max` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, the ARMA models are selected between the models ARMA(p,q) with 0<=p<=`p.max` and 0<=q<=`q.max`. The default is 3. `ic` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, `ic` contains the information criterion used to suggest the ARMA models. It allows us to choose between: "AIC", "AICC" or "BIC" (the default). `num.lb` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, it checks the suitability of the selected ARMA models according to the Ljung-Box test and the t-test. It uses up to `num.lb` delays in the Ljung-Box test. The default is 10. `alpha` if `Var.Cov.eps=NULL` and/or `Tau.eps=NULL`, `alpha` contains the significance level which the ARMA models are checked. The default is 0.05.

Details

A weight function (specifically, the indicator function 1_{[w[1] , w[2]]}) is introduced in the test statistic for testing H0: m_1 = ... = m_L to allow elimination (or at least significant reduction) of boundary effects from the estimate of m_k(t_i).

If `Var.Cov.eps=NULL` and the routine is not able to suggest an approximation for `Var.Cov.eps`, it warns the user with a message saying that the model could be not appropriate and then it shows the results. In order to construct `Var.Cov.eps`, the procedure suggested in Aneiros-Perez and Vieu (2013) can be followed.

If `Tau.eps=NULL` and the routine is not able to suggest an approximation for `Tau.eps`, it warns the user with a message saying that the model could be not appropriate and then it shows the results. In order to construct `Tau.eps`, the procedures suggested in Aneiros-Perez (2008) can be followed.

Expressions for the implemented statistic tests can be seen in (15) and (16) in Aneiros-Perez (2008).

Value

A list with two dataframes:

 `parametric.test` a dataframe containing the bandwidths, the statistics and the p-values when one tests H0: β_1 = ...= β_L. `nonparametric.test` a dataframe containing the bandwidths b and h, the statistics, the normalised statistics and the p-values when one tests H0: m_1 = ...= m_L.

Moreover, if `data` is a time series and `Tau.eps` or `Var.Cov.eps` are not especified:

 `pv.Box.test` p-values of the Ljung-Box test for the model fitted to the residuals. `pv.t.test` p-values of the t.test for the model fitted to the residuals. `ar.ma` ARMA orders for the model fitted to the residuals.

Author(s)

German Aneiros Perez [email protected]

Ana Lopez Cheda [email protected]

References

Aneiros-Perez, G. (2008) Semi-parametric analysis of covariance under dependence conditions within each group. Aust. N. Z. J. Stat. 50, 97-123.

Aneiros-Perez, G. and Vieu, P. (2013) Testing linearity in semi-parametric functional data analysis. Comput. Stat. 28, 413-434.

Other related functions are `plrm.est`, `par.ancova` and `np.ancova`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78``` ```# EXAMPLE 1: REAL DATA data(barnacles1) data <- as.matrix(barnacles1) data <- diff(data, 12) data <- cbind(data,1:nrow(data)) data(barnacles2) data2 <- as.matrix(barnacles2) data2 <- diff(data2, 12) data2 <- cbind(data2,1:nrow(data2)) data3 <- array(0, c(nrow(data),ncol(data)-1,2)) data3[,,1] <- data[,-4] data3[,,2] <- data2[,-4] t <- data[,4] plrm.ancova(data=data3, t=t) # EXAMPLE 2: SIMULATED DATA ## Example 2a: dependent data - true null hypotheses set.seed(1234) # We generate the data n <- 100 t <- ((1:n)-0.5)/n beta <- c(0.05, 0.01) m1 <- function(t) {0.25*t*(1-t)} f <- m1(t) x1 <- matrix(rnorm(200,0,1), nrow=n) sum1 <- x1%*%beta epsilon1 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n) y1 <- sum1 + f + epsilon1 data1 <- cbind(y1,x1) x2 <- matrix(rnorm(200,1,2), nrow=n) sum2 <- x2%*%beta epsilon2 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n) y2 <- sum2 + f + epsilon2 data2 <- cbind(y2,x2) data_eq <- array(c(data1,data2), c(n,3,2)) # We apply the tests plrm.ancova(data=data_eq, t=t, time.series=TRUE) ## Example 2b: dependent data - false null hypotheses set.seed(1234) # We generate the data n <- 100 t <- ((1:n)-0.5)/n m3 <- function(t) {0.25*t*(1-t)} m4 <- function(t) {0.25*t*(1-t)*0.75} beta3 <- c(0.05, 0.01) beta4 <- c(0.05, 0.02) x3 <- matrix(rnorm(200,0,1), nrow=n) sum3 <- x3%*%beta3 f3 <- m3(t) epsilon3 <- arima.sim(list(order = c(1,0,0), ar=0.7), sd = 0.01, n = n) y3 <- sum3 + f3 + epsilon3 data3 <- cbind(y3,x3) x4 <- matrix(rnorm(200,1,2), nrow=n) sum4 <- x4%*%beta4 f4 <- m4(t) epsilon4 <- arima.sim(list(order = c(0,0,1), ma=0.5), sd = 0.02, n = n) y4 <- sum4 + f4 + epsilon4 data4 <- cbind(y4,x4) data_neq <- array(c(data3,data4), c(n,3,2)) # We apply the tests plrm.ancova(data=data_neq, t=t, time.series=TRUE) ```