Calculate p-values and confidence intervals based on the multi-splitting approach

Usage

multi.split(x, y, B = 100, fraction = 0.5, ci = TRUE, ci.level = 0.95,
            model.selector = lasso.cv,
            classical.fit = lm.pval, classical.ci = lm.ci,
            parallel = FALSE, ncores = getOption("mc.cores", 2L),
            gamma = seq(ceiling(0.05 * B) / B, 1 - 1 / B, by = 1 / B),
            args.model.selector = NULL, args.classical.fit = NULL,
            args.classical.ci = NULL,
            return.nonaggr = FALSE, return.selmodels = FALSE,
            repeat.max = 20,
            verbose = FALSE)

Arguments

x
    numeric design matrix (without intercept).
y
    numeric response vector.
B
    the number of sample-splits, a positive integer.
fraction
    a number in (0,1), the fraction of data used at each sample split for the model selection process. The remaining data is used for calculating the p-values.
ci
    logical indicating if a confidence interval should be calculated for each parameter.
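ci.level
    (if ci is TRUE) the confidence level of the intervals, a number in (0,1); the default is 0.95.
model.selector
    a function that performs the model selection; the default is lasso.cv.
classical.fit
    a function that calculates the (classical) p-values; the default is lm.pval.
classical.ci
    a function that calculates the (classical) confidence intervals; the default is lm.ci.
parallel
    logical indicating whether parallelization should be used.
ncores
    number of cores used for parallelization; the default is getOption("mc.cores", 2L).
gamma
    vector of gamma-values. If gamma is a scalar, the value Q_j is calculated instead of P_j (see Meinshausen et al. in the references below).
args.model.selector
    named list of further arguments passed to model.selector.
args.classical.fit
    named list of further arguments passed to classical.fit.
args.classical.ci
    named list of further arguments passed to classical.ci.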
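return.nonaggr
    logical indicating whether the non-aggregated p-values should be returned.
return.selmodels
    logical indicating whether the selected models should be returned.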
repeat.max
    positive integer indicating the maximal number of split trials. Should not matter in regular cases, but necessary to prevent infinite loops in borderline cases.
verbose
    should information be printed out while computing? (logical).
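
How fraction, B and gamma interact can be sketched in base R, following the aggregation rule of Meinshausen et al. (2009). This is a simplified illustration under stated assumptions, not the code behind multi.split: the screening step screen_by_cor, the helpers one_split and aggregate_pvals, and the choice of quantile(type = 1) as the empirical quantile are all made up for the example.

```r
set.seed(1)
n <- 40; p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- 2 * x[, 1] + 2.5 * x[, 2] + rnorm(n)

B <- 50
fraction <- 0.5
# Same formula as the default of the gamma argument
gamma <- seq(ceiling(0.05 * B) / B, 1 - 1 / B, by = 1 / B)

# Hypothetical stand-in for a model selector: keep the q columns with
# the largest absolute marginal correlation with y.
screen_by_cor <- function(x, y, q = 3) {
  score <- abs(drop(cor(x, y)))
  sort(order(score, decreasing = TRUE)[seq_len(q)])
}

# One sample split: select on one part, compute p-values on the rest.
one_split <- function(x, y, fraction) {
  n <- nrow(x)
  left <- sample(n, floor(fraction * n))      # used for model selection
  right <- setdiff(seq_len(n), left)          # used for the p-values
  sel <- screen_by_cor(x[left, , drop = FALSE], y[left])
  fit <- lm(y[right] ~ x[right, sel, drop = FALSE])
  p_raw <- coef(summary(fit))[-1, "Pr(>|t|)"]
  pvals <- rep(1, ncol(x))                    # unselected variables get p-value 1
  pvals[sel] <- pmin(1, p_raw * length(sel))  # Bonferroni within the selected set
  pvals
}

# B x p matrix of per-split p-values
pvals_split <- t(replicate(B, one_split(x, y, fraction)))

# Aggregate the B p-values of one variable over the gamma grid:
#   Q_j(gamma) = min(1, q_gamma(P_j^(b) / gamma))
#   P_j        = min(1, (1 - log(gamma_min)) * min over gamma of Q_j(gamma))
aggregate_pvals <- function(pv, gamma) {
  q_gamma <- vapply(gamma, function(g)
    min(1, quantile(pv / g, probs = g, type = 1, names = FALSE)),
    numeric(1))
  if (length(gamma) == 1L) return(q_gamma)    # scalar gamma: Q_j itself
  min(1, (1 - log(min(gamma))) * min(q_gamma))
}

pval_agg <- apply(pvals_split, 2, aggregate_pvals, gamma = gamma)
round(pval_agg, 4)
```

With a scalar gamma the helper returns Q_j directly, which mirrors the behaviour described for the gamma argument.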

Value

pval.corr
    vector of multiple-testing-corrected p-values.
gamma.min
    value of gamma at which the minimal p-value was attained.
clusterGroupTest
    function to perform groupwise tests based on hierarchical clustering. You can provide either a distance matrix and a clustering method, or the output of a hierarchical clustering.
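
To illustrate the two kinds of input mentioned for clusterGroupTest, the base R sketch below builds a distance matrix between variables and, alternatively, a ready-made hierarchical clustering. The correlation-based distance, the "average" linkage and the use of hclust() from the stats package are assumptions chosen for the example. How these objects are handed to clusterGroupTest is not shown, because its arguments are not listed on this page.

```r
set.seed(1)
x <- matrix(rnorm(40 * 8), nrow = 40, ncol = 8)
x[, 2] <- x[, 1] + 0.1 * rnorm(40)   # make columns 1 and 2 strongly correlated

# Option 1: a distance matrix between variables plus a clustering method
d <- as.dist(1 - abs(cor(x)))        # small distance = highly correlated
method <- "average"

# Option 2: the output of a hierarchical clustering
hh <- hclust(d, method = method)

# Cutting the tree shows which variables would be tested as one group
groups <- cutree(hh, k = 4)
split(seq_len(ncol(x)), groups)
```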

Author(s)

Lukas Meier, Ruben Dezeure, Jacopo Mandozzi

References

Meinshausen, N., Meier, L. and Bühlmann, P. (2009) P-values for high-dimensional regression. Journal of the American Statistical Association 104, 1671–1681.
Mandozzi, J. and Bühlmann, P. (2015) A sequential rejection testing method for high-dimensional regression with correlated variables. To appear in the International Journal of Biostatistics. Preprint arXiv:1502.03300

See Also

lasso.cv, lasso.firstq; lm.pval, lm.ci.

Examples

library(hdi) # provides multi.split, lasso.cv, lasso.firstq, lm.pval, lm.ci

n <- 40 # a bit small, to keep example "fast"
p <- 256
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[,1] * 2 + x[,2] * 2.5 + rnorm(n)

## Multi-splitting with lasso.firstq as model selector function
## 'q' must be specified
fit.multi <- multi.split(x, y, model.selector = lasso.firstq,
                         args.model.selector = list(q = 10))
fit.multi
head(fit.multi$pval.corr, 10) ## the first 10 p-values
ci. <- confint(fit.multi)
head(ci.) # the first 6
stopifnot(all.equal(ci.,
                    with(fit.multi, cbind(lci, uci)), check.attributes=FALSE))

## Use default 'lasso.cv' (slower!!) -- incl cluster group testing:
system.time(fit.m2 <- multi.split(x, y, return.selmodels = TRUE))# 9 sec (on "i7")
head(fit.m2$pval.corr) ## the first 6 p-values
head(confint(fit.m2)) ## the first 6 95% conf.intervals

## Now do clustergroup testing
clGTst <- fit.m2$clusterGroupTest
names(envGT <- environment(clGTst))# about 14
if(!interactive()) # if you are curious (and advanced):
  print(ls.str(envGT), max = 0)
stopifnot(identical(clGTst, envGT$clusterGroupTest))
ccc <- clGTst()
str(ccc)
ccc$hh # the clustering
has.1.or.2 <- sapply(ccc$clusters,
                     function(j.set) any(c(1,2) %in% j.set))
ccc$pval[ has.1.or.2] ## all very small
ccc$pval[!has.1.or.2] ## all 1
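
The first example swaps in lasso.firstq as the model selector. A user-written selector could look like the following sketch, assuming (this is not stated on this page) that a selector is called with the design matrix and the response and returns the indices of the selected columns. The function name select_top_cor and its argument q are made up for the illustration.

```r
# Hypothetical selector: keep the q columns of x with the largest
# absolute marginal correlation with y and return their indices.
select_top_cor <- function(x, y, q = 5, ...) {
  score <- abs(drop(cor(x, y)))
  q <- min(q, ncol(x))
  sort(order(score, decreasing = TRUE)[seq_len(q)])
}

set.seed(1)
n <- 40; p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x[, 1] * 2 + x[, 2] * 2.5 + rnorm(n)
sel <- select_top_cor(x, y, q = 5)
sel

# If the assumption about the interface holds, the selector could be
# passed on in the same way as lasso.firstq:
# multi.split(x, y, model.selector = select_top_cor,
#             args.model.selector = list(q = 5))
```

Marginal screening is used here only because it needs no extra packages; it is not meant as a recommendation over the lasso-based selectors.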