hit: Hierarchical Inference Testing

View source: R/hit.R

Description

Hierarchical inference testing for linear models with high-dimensional and/or correlated covariates by repeated sample splitting.

Usage

hit(x, y, hierarchy, family = "gaussian", B = 50, p.samp1 = 0.5,
  nfolds = 10, overall.lambda = FALSE, lambda.opt = "lambda.1se",
  alpha = 1, gamma = seq(0.05, 0.99, length.out = 100), max.p.esti = 1,
  mc.cores = 1L, trace = FALSE, ...)

Arguments

x

Design matrix of dimension n * p, without intercept. Variables not part of the dendrogram are added to the H0-model; see Details below.

y

Quantitative response variable of length n.

hierarchy

Object of class as.hierarchy. Must include all variables of x that are to be tested.

family

Family of the response variable distribution. Either y is "gaussian" or "poisson", in which case y must be a vector, or it is "binomial" distributed, in which case y is a vector of zeros and ones, a factor with two levels, or a two-column matrix of counts or proportions. The second column is treated as the target class. For a factor, the last level in alphabetical order is the target class. For "binomial", if y is given as a vector, it is coerced into a factor.

B

Number of sample-splits.

p.samp1

Fraction of the data used for the LASSO. The hierarchical ANOVA testing uses the remaining fraction, 1 - p.samp1.

nfolds

Number of folds (default is 10). See cv.glmnet for more details.

overall.lambda

Logical. If TRUE, lambda is estimated once; if FALSE, lambda is estimated for each sample split.

lambda.opt

Criterion for selecting the optimal lambda of the cross-validated LASSO. Either "lambda.1se" (default) or "lambda.min". See cv.glmnet for more details.

alpha

A single value in the range of 0 to 1 for the elastic net mixing parameter.

gamma

Vector of gamma-values.

max.p.esti

Maximum alpha level. All p-values above this value are set to one. Small max.p.esti values reduce computing time.

mc.cores

Number of cores for parallelising. Theoretical maximum is 'B'. For details see mclapply.

trace

If TRUE, the current status of the program is printed.

...

Additional arguments for cv.glmnet.
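
How p.samp1, nfolds, alpha and lambda.opt work together can be sketched for a single sample split. This is only an illustration built on glmnet::cv.glmnet, not the internal code of hit; the simulated data and the names idx1, idx2, cvfit and sel are assumptions made for the example.

```r
library(glmnet)

set.seed(1)
n <- 80
p <- 20
x <- matrix(rnorm(n * p), nrow = n)
colnames(x) <- paste0("x", 1:p)
y <- drop(x[, 1:2] %*% c(2, -2)) + rnorm(n)

# Split the samples: a fraction p.samp1 for the LASSO, the rest for testing
p.samp1 <- 0.5
idx1 <- sample(n, floor(n * p.samp1))
idx2 <- setdiff(seq_len(n), idx1)

# Cross-validated LASSO (alpha = 1) on the first part of the data
cvfit <- cv.glmnet(x[idx1, ], y[idx1], family = "gaussian",
                   nfolds = 10, alpha = 1)

# Variables with non-zero coefficients at the lambda named by lambda.opt
lambda.opt <- "lambda.1se"
beta <- as.matrix(coef(cvfit, s = lambda.opt))
sel <- setdiff(rownames(beta)[beta[, 1] != 0], "(Intercept)")
sel
```

In such a scheme the variables in sel would then be carried over to the second part of the data (idx2) for testing. Switching lambda.opt to "lambda.min" uses a smaller or equal lambda, which can only apply less shrinkage.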

Details

The H0-model contains variables that are not tested, such as experimental-design variables. These variables are not penalised in the LASSO model selection and are always included in the reduced ANOVA model.
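
One way to keep a variable unpenalised in a LASSO fit is the penalty.factor argument of glmnet. The sketch below shows that mechanism on simulated data; whether hit uses exactly this route internally is an assumption, and the column name "block" is a made-up stand-in for an experimental-design variable.

```r
library(glmnet)

set.seed(2)
n <- 60
x <- matrix(rnorm(n * 10), nrow = n)
colnames(x) <- c("block", paste0("x", 1:9))
y <- drop(x[, c(1, 4)] %*% c(1, 2)) + rnorm(n)

# Penalty factor 0 for the H0 variable, 1 for all variables to be tested
pf <- ifelse(colnames(x) == "block", 0, 1)
fit <- glmnet(x, y, alpha = 1, penalty.factor = pf)

# Coefficient of the unpenalised variable along the lambda path
b <- as.matrix(coef(fit))["block", ]
range(b)
```

Because its penalty factor is zero, "block" is never shrunk towards zero by the LASSO penalty, while the remaining columns enter or leave the model as lambda changes.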

References

Mandozzi, J. and Buehlmann, P. (2013). Hierarchical testing in the high-dimensional setting with correlated variables. To appear in the Journal of the American Statistical Association. Preprint arXiv:1312.5556

Examples

# Simulation:
set.seed(123)
n <- 80
p <- 82
## x with correlated columns
corMat <- toeplitz((p:1/p)^2)
corMatQ <- chol(corMat)
x <- matrix(rnorm(n * p), nrow = n) %*% corMatQ
colnames(x) <- paste0("x", 1:p)
## y
mu <- x[, c(5, 10, 72)] %*% c(2, -2, 2)
y <-  rnorm(n, mu)
## clustering of the columns of x
hc <- hclust(dist(t(x)))

# HIT with AF
out <- hit(x, y, hc)
summary(out)
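
To see which groups of variables the hierarchy contains, the dendrogram can be cut at a chosen level with base R's cutree. The sketch repeats the simulation of x so that it runs on its own; the choice of 5 clusters is arbitrary and serves only as an illustration.

```r
set.seed(123)
n <- 80
p <- 82
corMat <- toeplitz((p:1/p)^2)
x <- matrix(rnorm(n * p), nrow = n) %*% chol(corMat)
colnames(x) <- paste0("x", 1:p)
hc <- hclust(dist(t(x)))

# Cut the dendrogram into 5 clusters (arbitrary number, for illustration)
groups <- cutree(hc, k = 5)

# List the variable names belonging to each cluster
split(names(groups), groups)
```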

hit documentation built on May 2, 2019, 10:15 a.m.