iboost: Valid Inference for Model-based Gradient Boosting Models


Description

Computes selective p-values (and confidence intervals) for mboost objects. Currently, iboost supports Gaussian family models (L2-Boosting) with linear, group and spline base-learners.

Usage

iboost(obj, method = c("unifsamp", "impsamp", "analytic", "slice",
  "linesearch", "normalsamp", "normaladjsamp"), vars = NULL,
  varForSampling = NULL, B = 1000, alpha = 0.05, ncore = 1,
  refit.mboost = NULL, Ups = NULL, checkBL = TRUE, vT = NULL,
  computeCI = TRUE, returnSamples = FALSE, which = NULL, ...)

Arguments

obj

mboost object

method

character; if possible, choose custom method. See details.

vars

numeric vector; one or more values for the variance used in the linear model (preferably the true variance or an estimate from a consistent estimator). If NULL, the empirical response variance is used, which results in rather conservative inference.

varForSampling

variance used for generating new samples. Defaults to the first entry of vars if not given.

B

numeric; number of samples drawn for inference.

alpha

numeric; significance level for the p-values. Selective intervals are computed at level 1 - alpha.

ncore

numeric; number of cores to use (via mclapply)

refit.mboost

function; needed if obj was created by a direct call to mboost_fit. In this case, refit.mboost should be a function of the response that refits exactly the model given by obj to that response vector.

Ups

list of residual matrices as produced by getUpsilons.

checkBL

logical; if TRUE, checks whether the model contains only linear, group and spline base-learners (the types currently supported).

vT

list of test vectors as produced by getTestvector.

computeCI

logical; whether or not to compute selective confidence intervals.

returnSamples

logical; whether to return only the samples produced (default is FALSE). By default, p-values (and intervals) are calculated from the samples using format_iboost_res.

which

numeric; restricts inference to the selected base-learners.

...

Further arguments passed to the sampling method.
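
The description of vars notes that the default (the empirical response variance) is conservative. The following minimal sketch, in base R on simulated data, illustrates why and shows one possible alternative. The residual variance of a linear model is used here purely as an assumption for illustration; it is a sensible estimate only if that linear model is correctly specified.

```r
set.seed(1)

# Simulated data with a known noise variance of 1 (illustrative only)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 - x2 + rnorm(n, sd = 1)

# Default used when vars = NULL: the empirical response variance.
# It contains signal as well as noise, so it overestimates the error variance.
var_response <- var(y)

# One possible alternative (an assumption, not part of iboost):
# the residual variance of a full linear model.
fit_lm    <- lm(y ~ x1 + x2)
var_resid <- summary(fit_lm)$sigma^2

# Several candidate variances can be collected in one vector
vars <- c(var_response, var_resid)
print(round(vars, 2))
```

Passing such a vector as vars should, according to the Value section, yield one iboost object per variance.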

Details

iboost provides inference for L2-Boosting models fitted with mboost using linear, group or spline base-learners. The inference is based on Ruegamer and Greven (2018) when method = "unifsamp", Yang et al. (2016) when method = "impsamp", Tibshirani et al. (2016) when method = "analytic", and Loftus and Taylor (2015) when method = "slice"; the methods "normalsamp" and "normaladjsamp" are two variations of the "unifsamp" approach. Only the methods "impsamp" and "slice" can be used for testing group effects or whole spline functions.
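
The mapping in the paragraph above can be written down as a small lookup in base R. This is only a sketch: check_method is a hypothetical helper, not part of iboost, and "linesearch" is left out because the Details section does not describe it.

```r
# Reference behind each method, as listed in the Details section
method_reference <- c(
  unifsamp      = "Ruegamer and Greven (2018)",
  impsamp       = "Yang et al. (2016)",
  analytic      = "Tibshirani et al. (2016)",
  slice         = "Loftus and Taylor (2015)",
  normalsamp    = "variation of unifsamp",
  normaladjsamp = "variation of unifsamp"
)

# Methods that can test group effects or whole spline functions
group_capable <- c("impsamp", "slice")

# Hypothetical helper: fail early if the method cannot test a group effect
check_method <- function(method, group_test = FALSE) {
  method <- match.arg(method, names(method_reference))
  if (group_test && !(method %in% group_capable)) {
    stop("method '", method, "' cannot test group or spline effects; ",
         "use one of: ", paste(group_capable, collapse = ", "))
  }
  method
}

check_method("slice", group_test = TRUE)  # returns "slice"
```

Calling check_method("analytic", group_test = TRUE) stops with an error, which mirrors the restriction stated above.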

Value

Returns an object of class iboost or, if length(vars)>1, a list of iboost objects for each variance. An iboost object is a list containing the following items
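
Because the return type depends on length(vars), downstream code may want to treat both cases uniformly. A minimal sketch, assuming only what is stated above (a single result carries class iboost, multiple results come as a plain list); as_iboost_list is a hypothetical helper and the objects below are mock stand-ins, not real results.

```r
# Hypothetical helper: always return a list of iboost objects
as_iboost_list <- function(res) {
  if (inherits(res, "iboost")) list(res) else res
}

# Mock objects standing in for real results (illustration only)
single <- structure(list(), class = "iboost")
multi  <- list(single, single)

length(as_iboost_list(single))  # 1
length(as_iboost_list(multi))   # 2
```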

References

Ruegamer, D. and Greven, S. (2018), Valid Inference for L2-Boosting, arXiv e-prints arXiv:1805.01852.

Yang, F., Barber, R. F., Jain, P. and Lafferty, J. (2016), Selective inference for group-sparse linear models, Advances in Neural Information Processing Systems, pp. 2469-2477.

Tibshirani, R. J., Taylor, J., Lockhart, R. and Tibshirani, R. (2016), Exact post-selection inference for sequential regression procedures, Journal of the American Statistical Association 111(514), 600-620.

Loftus, J. R. and Taylor, J. E. (2015), Selective inference in regression models with groups of variables, arXiv e-prints arXiv:1511.01478.

Examples

if(require("mboost")){

set.seed(0)

n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- rnorm(n)
eta <- 3 * sin(x1) + x2^2
y <- scale(eta + rnorm(n), scale = FALSE)

spline1 <- bbs(x1, knots = 20, df = 4)
knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 4)
spline3 <- bbs(x3, knots = 20, df = 4)

data <- data.frame(y=y, x1=x1, x2=x2, x3=x3)

mod1 <- mboost(y ~ spline1 + spline2 + spline3,
               control = boost_control(mstop = 73), offset = 0,
               data = data)

# calculate p-values and intervals for model with 
# fixed stopping iteration:
# this is done with only B = 100 samples for
# demonstrative purposes and should be increased 
# for actual research questions
res <- iboost(mod1, method = "impsamp", B = 100)

# do the same with cross-validation
## Not run: 

fixFolds <- cv(weights = model.weights(mod1),
type = "kfold", B = 10)
cvr <- cvrisk(mod1, folds = fixFolds, papply = lapply)
modf <- mod1[mstop(cvr)]

# define corresponding refit function
# NOTE: blList is not defined in the original example; it is assumed
# here to be the list of base-learners used to fit mod1
blList <- list(spline1, spline2, spline3)

modFun <- function(y){

  mod <- mboost_fit(response = y,
                    blg = blList,
                    offset = 0,
                    control = boost_control(mstop = 73))
  cvr <- cvrisk(mod, folds = fixFolds, papply = lapply)
  return(mod[mstop(cvr)])
}

# this will take a while
(res <- iboost(modf, refit.mboost = modFun, method = "impsamp", B = 1000))


## End(Not run)
}
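
The example uses B = 100 only for demonstration. A rough way to see why more samples matter is the standard error of a plain Monte Carlo p-value estimate. The sketch below assumes B independent, equally weighted draws, which is a simplification: with weighted samples (as in importance sampling) the effective sample size is typically smaller, so the error is likely larger than shown. mc_se is a hypothetical helper, not part of iboost.

```r
# Approximate standard error of a Monte Carlo p-value estimate,
# assuming B independent, equally weighted draws (a simplification)
mc_se <- function(p, B) sqrt(p * (1 - p) / B)

p <- 0.05
round(mc_se(p, B = 100), 4)   # 0.0218
round(mc_se(p, B = 1000), 4)  # 0.0069
```

Under this simplification, a p-value near 0.05 estimated from 100 draws has a standard error of about 0.02, which is too coarse to separate it reliably from the threshold.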

davidruegamer/iboost documentation built on May 14, 2019, 3:10 a.m.