iboost: Valid Inference for Model-based Gradient Boosting Models
In davidruegamer/iboost: Inference for model-based boosting


Computes selective p-values (and confidence intervals) for mboost objects. Currently, iboost supports Gaussian family models (L2-Boosting) with linear, group and spline base-learners.

iboost(obj, method = c("unifsamp", "impsamp", "analytic", "slice",
  "linesearch", "normalsamp", "normaladjsamp"), vars = NULL,
  varForSampling = NULL, B = 1000, alpha = 0.05, ncore = 1,
  refit.mboost = NULL, Ups = NULL, checkBL = TRUE, vT = NULL,
  computeCI = TRUE, returnSamples = FALSE, which = NULL, ...)

`obj`	mboost object
`method`	character; the inference method to use. If possible, choose the method explicitly. See Details.
`vars`	numeric; a single value or a vector of values for the variance used in the linear model (preferably the true variance or an estimate from a consistent estimator). If NULL, the empirical response variance is used, which results in rather conservative inference.
`varForSampling`	variance used to generate new samples. Defaults to the first entry of `vars` if not given.
`B`	numeric; number of samples drawn for inference.
`alpha`	numeric; significance level for the p-values; the selective intervals have level `1 - alpha`.
`ncore`	numeric; number of cores to use (via `mclapply`)
`refit.mboost`	function; needed if `obj` was created by a direct call to `mboost_fit`. In this case, `refit.mboost` should be a function of the response vector that refits exactly the model given by `obj` to that response.
`Ups`	list of residual matrices as produced by `getUpsilons`.
`checkBL`	logical; if `TRUE`, checks whether the model includes only linear, group and spline base-learners (the types currently supported).
`vT`	list of test vectors as produced by `getTestvector`.
`computeCI`	logical; whether or not to compute selective confidence intervals
`returnSamples`	logical; whether to return only the samples produced (default `FALSE`). By default, p-values (and intervals) are calculated from the samples using `format_iboost_res`.
`which`	numeric; restricts inference to the selected base-learners.
`...`	Further arguments passed to the sampling method.

iboost provides inference for L2-Boosting models fitted with mboost using linear, group or spline base-learners. The approach depends on method: unifsamp follows Ruegamer and Greven (2018), impsamp follows Yang et al. (2016), analytic follows Tibshirani et al. (2016) and slice follows Loftus and Taylor (2015); normalsamp and normaladjsamp are two variations of the unifsamp approach. Only the methods impsamp and slice can be used for testing group effects or whole spline functions.

Returns an object of class iboost or, if length(vars) > 1, a list of iboost objects, one for each variance. An iboost object is a list containing the following items:

dist: a list obtained by the sampling procedure, including rB, the sampled values; logvals, logical values indicating whether the corresponding rB yields a model congruent with the initial model fit; obsval, the value actually observed in the initial model fit; and the corresponding weights of the importance sampling procedure.
method: name of the method used
alpha: alpha level used for the confidence interval limits
vT: the test vector(s)
yorg: the original response values
resDF: a data.frame consisting of the lower and upper confidence interval limits, the observed value mean, the calculated p-value pval and the truncation limits of the effect lowtrunc and uptrunc.
var: the variance used for inference calculation
dur: total duration of sampling in seconds
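
To make the role of the dist components more concrete, the following base R sketch shows one common way a weighted selective p-value can be formed from sampled values, congruency indicators, an observed value and importance weights. It is an illustration under assumptions, not the computation performed by `format_iboost_res`: the data are toy stand-ins rather than iboost output, and the helper `selective_pval`, the uniform proposal and the congruency rule are hypothetical.

```r
# Toy stand-ins for the components described under `dist`
# (NOT real iboost output; all values below are hypothetical).
set.seed(1)
B       <- 1000
obsval  <- 1.8                           # hypothetical observed value
rB      <- runif(B, min = -4, max = 4)   # values drawn from a uniform proposal
logvals <- abs(rB) > 0.5                 # hypothetical congruency indicator
weights <- dnorm(rB)                     # importance weights: standard normal
                                         # target density (proposal is constant)

# Hypothetical helper: weighted two-sided p-value that only uses
# samples leading to a congruent model.
selective_pval <- function(rB, logvals, obsval, weights) {
  w <- weights * logvals                 # non-congruent samples get weight 0
  sum(w * (abs(rB) >= abs(obsval))) / sum(w)
}

pval <- selective_pval(rB, logvals, obsval, weights)

# Reference value for this toy setting: a standard normal truncated
# to |z| > 0.5, evaluated at |z| >= 1.8.
exact <- (1 - pnorm(1.8)) / (1 - pnorm(0.5))

print(c(sampled = pval, exact = exact))
```

In this toy setting the sampled p-value should land close to the truncated-normal reference value; with real models the congruency check requires refitting the boosting model for each sample, which is why sampling can take a while.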

Ruegamer, D. and Greven, S. (2018), Valid Inference for L2-Boosting, arXiv e-prints arXiv:1805.01852.

Yang, F., Barber, R. F., Jain, P. and Lafferty, J. (2016), Selective inference for group-sparse linear models, Advances in Neural Information Processing Systems, pp. 2469-2477.

Tibshirani, R. J., Taylor, J., Lockhart, R. & Tibshirani, R. (2016), Exact post-selection inference for sequential regression procedures, Journal of the American Statistical Association 111(514), 600-620.

Loftus, J. R. & Taylor, J. E. (2015), Selective inference in regression models with groups of variables, arXiv e-prints arXiv:1511.01478.

if(require("mboost")){

set.seed(0)

n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n) + 0.25 * x1
x3 <- rnorm(n)
eta <- 3 * sin(x1) + x2^2
y <- scale(eta + rnorm(n), scale = FALSE)

spline1 <- bbs(x1, knots = 20, df = 4)
knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
spline2 <- bbs(x2, knots = knots.x2, df = 4)
spline3 <- bbs(x3, knots = 20, df = 4)

data <- data.frame(y=y, x1=x1, x2=x2, x3=x3)

mod1 <- mboost(y ~ spline1 + spline2 + spline3,
               control = boost_control(mstop = 73), offset = 0,
               data = data)

# calculate p-values and intervals for model with 
# fixed stopping iteration:
# this is done with only B = 100 samples for
# demonstrative purposes and should be increased 
# for actual research questions
res <- iboost(mod1, method = "impsamp", B = 100)

# do the same with the stopping iteration chosen by cross-validation
## Not run: 

fixFolds <- cv(weights = model.weights(mod1),
type = "kfold", B = 10)
cvr <- cvrisk(mod1, folds = fixFolds, papply = lapply)
modf <- mod1[mstop(cvr)]

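# define corresponding refit function
# NOTE: blList is not defined in the original example; it is assumed
# here to be the list of base-learners used to fit mod1
blList <- list(spline1 = spline1, spline2 = spline2, spline3 = spline3)

modFun <- function(y){

  # refit the model to a new response vector y
  mod <- mboost_fit(response = y,
                    blg = blList,
                    offset = 0,
                    control = boost_control(mstop = 73))
  # choose the stopping iteration on the same folds as before
  cvr <- cvrisk(mod, folds = fixFolds, papply = lapply)
  return(mod[mstop(cvr)])
}

# this will take a while
(res <- iboost(modf, refit.mboost = modFun, method = "impsamp", B = 1000))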


## End(Not run)
}
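
As a further sketch, the following example passes more than one variance via `vars` and inspects the resulting list of iboost objects, using only the arguments and list items documented above. It reuses the simulation setup of the first example. The choice of variances is an assumption made for illustration: 1 is the noise variance used in the simulation, and the empirical response variance is what is described as the default for `vars = NULL`. The code is guarded so that it only runs if both packages are installed, and the exact layout of the printed output is not guaranteed here.

```r
# Sketch only: runs if the mboost and iboost packages are available.
if (requireNamespace("mboost", quietly = TRUE) &&
    requireNamespace("iboost", quietly = TRUE)) {

  library(mboost)
  library(iboost)

  set.seed(0)

  # same simulated data as in the first example
  n <- 200
  x1 <- rnorm(n)
  x2 <- rnorm(n) + 0.25 * x1
  x3 <- rnorm(n)
  eta <- 3 * sin(x1) + x2^2
  y <- scale(eta + rnorm(n), scale = FALSE)

  spline1 <- bbs(x1, knots = 20, df = 4)
  knots.x2 <- quantile(x2, c(0.25, 0.5, 0.75))
  spline2 <- bbs(x2, knots = knots.x2, df = 4)
  spline3 <- bbs(x3, knots = 20, df = 4)

  data <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

  mod1 <- mboost(y ~ spline1 + spline2 + spline3,
                 control = boost_control(mstop = 73), offset = 0,
                 data = data)

  # two candidate variances (illustrative assumption):
  # the simulation noise variance and the empirical response variance
  vars <- c(1, as.numeric(var(y)))

  # with length(vars) > 1, a list of iboost objects is returned
  res_list <- iboost(mod1, method = "impsamp", vars = vars, B = 100)

  for (i in seq_along(res_list)) {
    cat("variance used:", res_list[[i]]$var, "\n")
    print(res_list[[i]]$resDF)   # interval limits, p-values, truncation limits
  }
}
```

Comparing the two result tables gives a rough impression of how conservative the inference becomes when the empirical response variance is used instead of a better variance estimate.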