Performs multiple hypotheses testing in a linear model.
data |
Input matrix of dimension n * p; each of the n rows is an observation vector of p variables. The intercept should be included in the first column as (1,...,1). If not, it is added. |
Y |
Response variable of length n. |
var_nonselect |
Number of variables that don't undergo feature selection. They have to be in the first columns of |
alpha |
A user supplied type I error sequence. Default is (0.1,0.05). |
sigma |
Value of the variance if it is known; 0 otherwise. Default is 0. |
maxordre |
Number of variables to be ordered. Default is min(n/2-1,p/2-1). |
ordre |
Several possible algorithms to order the variables, ordre=c("bolasso","pval","pval_hd","FR"). "bolasso" uses the dyadic algorithm with the Bolasso technique |
m |
Number of bootstrap iterations of the Lasso. Only used if the algorithm is set to "bolasso". Default is m=100. |
show |
Vector of logical values, show=(showordre,showtest,showresult). Default is (1,0,1). If showordre==TRUE, show the ordered variables at each step of the algorithm. If showtest==TRUE, show the number of regularization parameters tested, to track the progress of the dyadic algorithm; only used if the algorithm is set to "bolasso". If showresult==TRUE, show the value of the statistics and the estimated quantile at each step of the procedure. |
IT |
Number of simulations for the calculation of the quantile. Default is 1000. |
maxq |
Maximum number of multiple hypotheses tests to perform. Default is log(min(n,p)-1,2). |
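The intercept convention for `data` can be checked before the call. The following is a minimal base R sketch, assuming that "intercept column" means a first column made only of ones; `add_intercept` is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper (not part of the package): make sure the first
# column of the design matrix is the intercept (1, ..., 1).
add_intercept <- function(X) {
  X <- as.matrix(X)
  has_intercept <- ncol(X) > 0 && all(X[, 1] == 1)
  if (has_intercept) X else cbind(1, X)
}

set.seed(1)
n <- 50
p <- 10
X_raw <- matrix(rnorm(n * p), nrow = n, ncol = p)  # n observations of p variables
X <- add_intercept(X_raw)                          # prepends the column of ones

dim(X)            # n rows, p + 1 columns
all(X[, 1] == 1)  # the first column is now the intercept
```

With the intercept in the first column, `var_nonselect = 1` would correspond to leaving only the intercept out of the selection.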
mht is a two-step procedure that performs variable selection in a high dimensional linear model. The first step orders the variables taking into account the vector of observations Y. The second step finds a cut-off between the relevant variables (high rank) and the irrelevant ones (low rank) through multiple hypotheses testing.
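To make the two-step idea concrete, here is a small base R sketch of the ordering step only. It ranks variables by the p-value of a marginal correlation test, which is a simple stand-in chosen for illustration and is not claimed to match any of the package's ordering algorithms. The cut-off `k` is fixed by hand; the package estimates it through multiple hypotheses testing, which this sketch does not reproduce.

```r
set.seed(42)
n <- 100
p <- 8
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- c(3, 2, 0, 0, 0, 0, 0, 0)   # only the first two variables carry signal
Y <- drop(X %*% beta + rnorm(n))

# Step 1 (illustrative stand-in): rank variables by the p-value of a
# marginal correlation test with Y; a smaller p-value means a higher rank.
pvals <- apply(X, 2, function(x) cor.test(x, Y)$p.value)
ordre <- order(pvals)

# Step 2 is NOT implemented here: the cut-off is an assumption set by hand,
# whereas the procedure described above estimates it by testing.
k <- 2
candidate_relevant <- sort(ordre[seq_len(k)])

ordre
candidate_relevant
```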
The input maxordre should be chosen with care: the more variables there are to order, the harder it is for the algorithm to tell which noisy variable is more important than another noisy variable. It is advised to limit maxordre to p/2 or n/2 if they are large. The parameter maxq can be useful for large values of n; it is advised to limit it to 5-6 in order to minimize computational time (for the calculation of the quantile).
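The default formulas quoted in the argument list can be evaluated directly to see when a cap is worth setting. This is a sketch in base R; the sizes `n` and `p` and the cap value 5 (one choice within the advised 5-6 range) are assumptions for illustration.

```r
n <- 200
p <- 1000

# Defaults as stated in the argument descriptions.
maxordre_default <- min(n / 2 - 1, p / 2 - 1)
maxq_default <- log(min(n, p) - 1, 2)

# Hypothetical cap following the advice above.
maxq_capped <- min(floor(maxq_default), 5)

maxordre_default
maxq_default
maxq_capped
```

Here the default maxq exceeds the advised range, so passing the capped value as `maxq` would reduce the number of quantile computations.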
A 'mht' object is returned for which the methods refit, predict and plot are available.
data |
A list containing:
|
coefficients |
Matrix of the estimated coefficients. Each row concerns a specific user level |
residuals |
Matrix of the residuals. Each row concerns a specific user level |
relevant_var |
Set of the relevant variables. Each row concerns a specific user level |
fitted.values |
Matrix of the fitted values, each column concerns a specific user level |
ordre |
Order obtained on the |
ordrebeta |
The full order on all the variables. |
kchap |
Vector containing the length of the estimated set of relevant variables, for each value of
quantile |
The estimated quantiles used in the second step of the procedure. |
call |
The call that produced this object. |
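The components listed above can be inspected on a fitted object. The sketch below simulates a small data set and, only if the package is installed, calls the function with argument names taken from the argument list above. The simulated sizes, the reduced `m`, and the assumption that the call succeeds with these settings are illustrative choices, not values from the package documentation.

```r
set.seed(123)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- c(rep(2, 5), rep(0, p - 5))   # five relevant variables, fifteen noise
Y <- drop(X %*% beta + rnorm(n))

# Guarded call: skipped when the package is not available.
if (requireNamespace("mht", quietly = TRUE)) {
  fit <- mht::mht(data = X, Y = Y, alpha = c(0.1, 0.05),
                  ordre = "bolasso", m = 50, IT = 1000)
  print(fit$relevant_var)   # selected variables, per user level
  print(fit$kchap)          # number of selected variables, per user level
  print(fit$coefficients)   # estimated coefficients, per user level
}
```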
Multiple hypotheses testing for variable selection; F. Rohart 2011
predict.mht, refit.mht, plot.mht