Multiple testing procedure for ordered variable selection

Description

Performs multiple hypotheses testing for ordered variable selection.

Usage

mht.order(data,Y,ordre,var_nonselect,alpha,IT,sigma,showresult)

Arguments

data

Input matrix of dimension n * p; each of the n rows is an observation vector of p variables. The intercept should be included in the first column as (1,...,1). If not, it is added.

Y

Response variable of length n.

ordre

Vector from which the variables are to be ordered; it can be a partial order. If absent, data is considered to be already ordered. Default is (1,2,...,p).

var_nonselect

Number of variables that do not undergo feature selection. They have to be in the first columns of data. Default is 1, meaning that selection is not performed on the intercept.

alpha

A user-supplied type I error sequence. Default is alpha=(0.1,0.05).

IT

Number of simulations in the calculation of the quantile. Default is 10000.

sigma

Value of the variance if it is known; 0 otherwise. Default is 0.

showresult

Logical value. If TRUE, shows the value of the statistics and the estimated quantile at each step of the procedure. Default is TRUE.
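
As an illustration of the input layout described above, the following base R sketch builds a matrix with the intercept (1,...,1) in the first column and a response of matching length. It does not call mht.order; the names n, p, x, y, has_intercept and design are hypothetical and chosen only for this example.

```r
set.seed(42)                       # reproducible illustration
n <- 100                           # number of observations (assumed)
p <- 5                             # number of predictors (assumed)
x <- matrix(rnorm(n * p), n, p)    # predictors without an intercept column
y <- as.vector(x %*% c(2, 2, 0, 0, 0) + rnorm(n))

# Put the intercept (1,...,1) in the first column unless it is already there.
has_intercept <- all(x[, 1] == 1)
design <- if (has_intercept) x else cbind(1, x)

# Basic consistency checks: one response value per row, intercept first.
stopifnot(nrow(design) == length(y), all(design[, 1] == 1))
dim(design)
```

With this layout, the default var_nonselect=1 corresponds to leaving the first column (the intercept) out of the selection.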

Details

The details of the procedure are in 'Multiple hypotheses testing for variable selection; F. Rohart 2011'. If showresult=TRUE, the statistics and the quantile are printed at each step of the algorithm. If the statistic is greater than the quantile, the test is rejected (it takes the value 1). The procedure stops when the null hypothesis is accepted (all alternative hypotheses are 0).
The statistics used to test the null hypotheses differ depending on whether the variance sigma is known.
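
The stopping rule can be pictured with a simplified base R sketch. It assumes a single test per step and uses made-up numbers; statistic, quantile_est, rejected, first_accept and n_rejected are hypothetical names, and nothing here reproduces the internals or the output of mht.order.

```r
# Hypothetical statistics and estimated quantiles for five ordered steps
# (invented values, for illustration only).
statistic    <- c(12.4, 8.1, 5.3, 1.2, 6.0)
quantile_est <- c(4.0, 3.8, 3.5, 3.3, 3.1)

# A test is rejected (takes the value 1) when the statistic exceeds the quantile.
rejected <- as.integer(statistic > quantile_est)

# The procedure stops at the first accepted null hypothesis;
# later steps are not examined, even if their statistic is large.
first_accept <- match(0L, rejected)
n_rejected <- if (is.na(first_accept)) length(rejected) else first_accept - 1L

rejected     # 1 1 1 0 1
n_rejected   # 3
```

In this toy run the fourth null hypothesis is accepted, so the procedure would stop after three rejections and the fifth step would never be reached.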

Value

A 'mht.order' object is returned for which the methods predict, refit and plot are available.

data

A list containing:

  • X - The scaled matrix used in the algorithm, the first column being (1,...,1).

  • Y - The input response vector.

  • means.X - Vector of means of the input data matrix.

  • sigma.X - Vector of variances of the input data matrix.

coefficients

Matrix of the estimated coefficients. Each row concerns a specific user level alpha.

residuals

Matrix of the residuals. Each row concerns a specific user level alpha.

relevant_var

Set of the relevant variables. Each row concerns a specific user level alpha.

fitted.values

Matrix of the fitted values. Each column concerns a specific user level alpha.

kchap

Vector containing the length of the estimated set of relevant variables, for each value of alpha.

quantile

The estimated type I error to be used in the second step of the procedure in order to have a test of level alpha. Each column stands for one test. See F. Rohart (2011) for details.

call

The call that has been used.

References

Baraud et al. (2002). Adaptive tests of linear hypotheses by model selection.
F. Rohart (2011). Multiple hypotheses testing for variable selection.

See Also

predict.mht.order, refit.mht.order, plot.mht.order

Examples

## Not run: 
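library(mht)

# Simulate 100 observations of 20 variables
x=matrix(rnorm(100*20),100,20)
# Only the first five variables have a non-zero effect
beta=c(rep(2,5),rep(0,15))
y=x%*%beta+rnorm(100)

# Ordered variable selection at levels 0.1 and 0.05
mod.order=mht.order(x,y,ordre=5:1,alpha=c(0.1,0.05))
mod.order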

## End(Not run)