Performs variable selection using hypothesis tests of covariates in high-dimensional one-way ANOVA for a completely nonparametric regression model.

```
npvarselec(X, Y, method = "backward", p = 7, degree.pol = 0,
           kernel.type = "epanech", bandwidth = "CV", gridsize = 10,
           dim.red = c(1, 10))
```

`X` |
matrix of observations, with rows corresponding to data points and columns corresponding to covariates. |

`Y` |
vector of observed responses. |

`method` |
algorithm used for variable selection; options are "backward", "forward" and "forward2". |

`p` |
size of the window W_i. See npmodelcheck for details. |

`degree.pol` |
degree of the polynomial to be used in the local fit. |

`kernel.type` |
kernel type, options are "box", "trun.normal", "gaussian", "epanech", |

`bandwidth` |
bandwidth for the local polynomial fit at each step of the elimination (or selection). Options are: "CV" for leave-one-out cross validation with criterion of minimum MSE to select a unique bandwidth that will be used for all dimensions; "GCV" for Generalized Cross Validation to select a unique bandwidth that will be used for all dimensions; "CV2" for leave-one-out cross validation for each covariate; and "GCV2" for GCV for each covariate. See localpoly.reg. |

`gridsize` |
number of possible bandwidths to be searched in cross-validation. |

`dim.red` |
vector whose first element is 1 for Sliced Inverse Regression (SIR) or 2 for Supervised Principal Components (SPC); the second element should be the number of slices (if SIR) or the number of principal components (if SPC). If 0, no dimension reduction is performed. This is used to moderate the curse of dimensionality in the local polynomial estimation at each step of the elimination (or selection). See npmodelcheck for details. |
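To make the `bandwidth = "CV"` and `gridsize` arguments more concrete, here is a minimal sketch of leave-one-out cross-validation for a local constant fit (the `degree.pol = 0` case) with an Epanechnikov kernel and a single covariate. It is not the package's implementation: the helper names (`epanech`, `loo_cv_mse`, `select_bandwidth`) and the choice of candidate grid are assumptions made for illustration only.

```r
# Epanechnikov kernel, supported on [-1, 1].
epanech <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Leave-one-out MSE of a local constant (Nadaraya-Watson) fit at bandwidth h.
loo_cv_mse <- function(x, y, h) {
  n <- length(x)
  err <- numeric(n)
  for (i in seq_len(n)) {
    w <- epanech((x[-i] - x[i]) / h)
    if (sum(w) == 0) return(Inf)  # no neighbours within h: bandwidth unusable
    err[i] <- y[i] - sum(w * y[-i]) / sum(w)
  }
  mean(err^2)
}

# Search `gridsize` candidate bandwidths and keep the one with minimum MSE.
# The grid limits below are an assumption, not taken from the package.
select_bandwidth <- function(x, y, gridsize = 10) {
  r <- diff(range(x))
  grid <- seq(r / 20, r / 2, length.out = gridsize)
  mse <- vapply(grid, function(h) loo_cv_mse(x, y, h), numeric(1))
  list(bandwidth = grid[which.min(mse)], grid = grid, mse = mse)
}

set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
bw <- select_bandwidth(x, y, gridsize = 10)
print(bw$bandwidth)
```

A larger `gridsize` gives a finer search at the cost of one full leave-one-out pass per candidate bandwidth.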

Backward elimination is done by removing, at each step, the least significant covariate in the model if its p-value, obtained from the test npmodelcheck, is not significant according to False Discovery Rate (FDR) corrections (Benjamini and Yekutieli, 2001). The procedure continues until all remaining covariates have significant p-values based on FDR.
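The backward loop can be sketched in base R as follows. This is only an illustration of the control flow: the p-values come from a stand-in linear-model t-test (`pvals_lm`, a hypothetical helper) rather than from npmodelcheck, and the 0.05 level and the use of `p.adjust(method = "BY")` for the Benjamini-Yekutieli correction are assumptions.

```r
# Stand-in for the nonparametric test (assumption): p-values of the
# covariates in `active`, taken from an ordinary linear fit.
pvals_lm <- function(X, Y, active) {
  fit <- lm(Y ~ X[, active, drop = FALSE])
  p <- summary(fit)$coefficients[-1, 4]  # drop the intercept row
  setNames(as.numeric(p), active)
}

backward_fdr <- function(X, Y, pval_fun = pvals_lm, alpha = 0.05) {
  active <- seq_len(ncol(X))
  while (length(active) > 0) {
    p <- pval_fun(X, Y, active)
    # Stop once every remaining covariate is significant after BY correction.
    if (all(p.adjust(p, method = "BY") <= alpha)) break
    # Otherwise remove the least significant covariate and test again.
    active <- active[-which.max(p)]
  }
  p_final <- if (length(active) > 0) pval_fun(X, Y, active) else numeric(0)
  list(selected = active, p_values = p_final)
}

set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
Y <- 2 * X[, 1] - 3 * X[, 3] + rnorm(200)
res <- backward_fdr(X, Y)
print(res$selected)
```

Because the Benjamini-Yekutieli adjustment is monotone in the raw p-values, checking the covariate with the largest raw p-value is equivalent to checking whether all adjusted p-values are below the level.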

Forward selection is done by adding to the model, at each step, the covariate with the smallest p-value (when tested together with all covariates already in the model), provided that, once it is added, every covariate in the model is significant according to FDR corrections.

Forward2 selection proceeds as follows: at each step, denote by Z = (Z_1, ..., Z_q) the covariates in the model and by W = (W_1, ..., W_r) the covariates not in the model (note that (Z, W) = X). Let p_j, j = 1, ..., r, be the maximum of the set of q+1 p-values obtained from testing each of the covariates (Z_1, ..., Z_q, W_j). Add to the model the covariate corresponding to the smallest p_j as long as, when added, all the p-values of the covariates in the model are significant according to FDR corrections.
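The two forward procedures described above differ only in how a candidate covariate is scored, which the following base R sketch tries to make explicit. As an assumption for illustration, the p-values come from a linear-model t-test (`pvals_lm`, a hypothetical stand-in for npmodelcheck), the FDR correction uses `p.adjust(method = "BY")`, and the level is fixed at 0.05.

```r
# Stand-in for the nonparametric test (assumption): p-values of the
# covariates in `active`, in the same order, from an ordinary linear fit.
pvals_lm <- function(X, Y, active) {
  fit <- lm(Y ~ X[, active, drop = FALSE])
  as.numeric(summary(fit)$coefficients[-1, 4])  # drop the intercept row
}

forward_fdr <- function(X, Y, variant = c("forward", "forward2"), alpha = 0.05) {
  variant <- match.arg(variant)
  active <- integer(0)  # Z: covariates currently in the model
  repeat {
    candidates <- setdiff(seq_len(ncol(X)), active)  # W: covariates not in the model
    if (length(candidates) == 0) break
    score <- vapply(candidates, function(j) {
      p <- pvals_lm(X, Y, c(active, j))
      # "forward": p-value of the candidate W_j itself;
      # "forward2": maximum of the q+1 p-values of (Z_1, ..., Z_q, W_j).
      if (variant == "forward") p[length(p)] else max(p)
    }, numeric(1))
    best <- candidates[which.min(score)]
    p_new <- pvals_lm(X, Y, c(active, best))
    # Add the best candidate only if all covariates stay significant under FDR.
    if (any(p.adjust(p_new, method = "BY") > alpha)) break
    active <- c(active, best)
  }
  p_final <- if (length(active) > 0) pvals_lm(X, Y, active) else numeric(0)
  list(selected = active, p_values = p_final)
}

set.seed(2)
X <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
Y <- 2 * X[, 2] - 1.5 * X[, 5] + rnorm(200)
res_forward  <- forward_fdr(X, Y, variant = "forward")
res_forward2 <- forward_fdr(X, Y, variant = "forward2")
print(res_forward$selected)
print(res_forward2$selected)
```

In the first step the model is empty, so both variants score a candidate by its single p-value and pick the same covariate; they can diverge from the second step onward.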

See also details of npmodelcheck.

`selected` |
selected covariates |

`p_values` |
p-values of the tests of the selected covariates |
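A call and a look at the two returned components might go as follows. The simulated data are made up for illustration, and the package name `NonpModelCheck` is an assumption (it is not stated on this page), so the call is guarded and skipped if that package is not installed. The argument values are the defaults from the usage shown above.

```r
set.seed(42)
n <- 60
X <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
# Hypothetical data: only covariates 2 and 4 influence the response.
Y <- X[, 2]^2 + sin(pi * X[, 4]) + rnorm(n, sd = 0.5)

# Package name assumed; nothing is run if it is unavailable.
if (requireNamespace("NonpModelCheck", quietly = TRUE)) {
  fit <- NonpModelCheck::npvarselec(X, Y, method = "backward", p = 7,
                                    degree.pol = 0, kernel.type = "epanech",
                                    bandwidth = "CV", gridsize = 10,
                                    dim.red = c(1, 10))
  print(fit$selected)  # selected covariates
  print(fit$p_values)  # p-values of the tests of the selected covariates
}
```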

Adriano Zanin Zambom <adriano.zambom@gmail.com>

Zambom, A. Z. and Akritas, M. G. (2012). Nonparametric Model Checking and Variable Selection. arXiv:1205.6761.

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165-1188.

`npmodelcheck, localpoly.reg, group.npvarselec`
