The data is randomly split into two halves with respect to the
observations, and variable selection using Lasso is performed on one half.
The second half and the selected variables are later used for testing by
the function test_only_hierarchy. This is repeated multiple times.
a matrix or list of matrices for multiple data sets. The matrix or matrices have to be of type numeric and are required to have column names / variable names. The rows and the columns represent the observations and the variables, respectively.
a vector, a matrix with one column, or list of the aforementioned
objects for multiple data sets. The vector, vectors, matrix, or matrices
have to be of type numeric. For
a matrix or list of matrices of control variables.
number of sample splits.
proportion of variables to be selected by Lasso in the multi-sample splitting step.
a logical value indicating whether the variables should be standardized.
a character string naming a family of the error distribution;
type of parallel computation to be used. See the 'Details' section.
number of processes to be run in parallel.
an optional parallel or snow cluster used if
a logical value indicating whether the function should
check the input. This argument is used to call
A given data set with nobs observations is randomly split into two halves
with respect to the observations, and nobs * proportion.select variables
are selected using Lasso (implemented in glmnet) on one half. Control
variables are not penalized if supplied using the argument clvar. This is
repeated B times for each data set if multiple data sets are supplied.
Those splits (i.e. the second halves of observations) and the corresponding
selected variables are used to perform hierarchical testing by the function
test_only_hierarchy.
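
The split-and-select loop described above can be sketched in base R. This is
an illustration under stated assumptions, not the package's implementation:
the helpers select_vars and one_split are hypothetical, the data are
simulated, the value of proportion.select is arbitrary, and a simple
marginal-correlation ranking stands in for the Lasso step so that the sketch
needs no packages beyond base R. The row-per-split layout of the two result
matrices is likewise an assumption.

```r
set.seed(1)
nobs <- 40
nvar <- 30
x <- matrix(rnorm(nobs * nvar), nobs, nvar,
            dimnames = list(NULL, paste0("V", seq_len(nvar))))
y <- x[, 1] - x[, 2] + rnorm(nobs)

B <- 5                     # number of sample splits
proportion.select <- 0.25  # arbitrary value chosen for this sketch
n.select <- floor(nobs * proportion.select)

# Placeholder selector (NOT Lasso): rank the variables by their absolute
# marginal correlation with y on the selection half and keep the top k.
select_vars <- function(x, y, k) {
  score <- abs(drop(cor(x, y)))
  colnames(x)[order(score, decreasing = TRUE)[seq_len(k)]]
}

# One sample split: select on the first half, keep the second for testing.
one_split <- function(b) {
  first.half <- sample.int(nobs, nobs %/% 2)
  second.half <- setdiff(seq_len(nobs), first.half)
  list(test.obs = second.half,
       sel.vars = select_vars(x[first.half, ], y[first.half], n.select))
}

splits <- lapply(seq_len(B), one_split)

# Collect the results with one row per split (assumed layout).
test.obs <- t(sapply(splits, `[[`, "test.obs"))
sel.vars <- t(sapply(splits, `[[`, "sel.vars"))
dim(test.obs)
dim(sel.vars)
```

Only the observations in test.obs and the variables in sel.vars of a given
split would be passed on to the testing step, so selection and testing never
use the same observations.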
The multi-sample split step can be run in parallel across the different
sample splits (B corresponds to the number of sample splits) by specifying
the arguments parallel and ncpus. A cluster can optionally be supplied if
parallel = "snow". There are three possibilities to set the argument
parallel: parallel = "no" for serial evaluation (default),
parallel = "multicore" for parallel evaluation using forking, and
parallel = "snow" for parallel evaluation using a parallel socket cluster.
It is recommended to select RNGkind("L'Ecuyer-CMRG") and set a seed to
ensure that the parallel computing of the package hierinf is reproducible.
This way each processor gets a different substream of the pseudo random
number generator stream, which makes the results reproducible as long as
the relevant arguments (such as ncpus) remain unchanged. See the vignette
or the reference for more details.
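
The reproducibility recommendation can be illustrated with the base R
package parallel. The following is a minimal sketch, not code from hierinf:
half_split and draw_splits are hypothetical helpers, and drawing a random
half of the observations stands in for the work done in one sample split.
It uses forking, which corresponds to the "multicore" setting and is not
available on Windows for more than one process.

```r
library(parallel)

# Hypothetical stand-in for one sample split: draw the indices of a
# random half of n observations.
half_split <- function(b, n) sort(sample.int(n, n %/% 2))

# Hypothetical wrapper: B splits, forked over ncpus processes.
draw_splits <- function(n, B, ncpus, seed) {
  # Generator that supports independent substreams per process.
  # Note: this changes the RNG kind for the rest of the session.
  RNGkind("L'Ecuyer-CMRG")
  set.seed(seed)
  mclapply(seq_len(B), half_split, n = n, mc.cores = ncpus)
}

run1 <- draw_splits(n = 20, B = 6, ncpus = 2, seed = 3)
run2 <- draw_splits(n = 20, B = 6, ncpus = 2, seed = 3)

# Same seed and same number of processes: the splits should coincide.
identical(run1, run2)
```

Changing ncpus between runs changes how the splits are assigned to the
substreams, which is why the number of processes has to stay fixed for the
results to match.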
The returned value is an object consisting of a list with as many elements
as there are data sets. Each element (corresponding to a data set) contains
a list with two matrices. The first matrix contains the indices of the
second half of observations (which were not used to select the variables).
The second matrix contains the column names / variable names of the
selected variables.
Renaux, C. et al. (2018), Hierarchical inference for genome-wide association studies: a view on methodology with software. (arXiv:1805.02988)
Meinshausen, N., Meier, L. and Bühlmann, P. (2009), P-values for high-dimensional regression, Journal of the American Statistical Association 104, 1671-1681.