knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
This vignette describes the generic use of the selfmade
software to produce valid post-selection inference, or more specifically selective inference, for linear mixed and additive models after any type of variable selection mechanism, which can be repeated in a bootstrap-like manner.
RĂ¼gamer, D., & Greven, S. (2018). Selective inference after likelihood-or test-based model selection in linear models. Statistics & Probability Letters, 140, 7-12.
RĂ¼gamer, D., & Greven, S. (2020). Inference for $L_2$-Boosting. Statistics and Computing, 30(2), 279-289.
gamm4
function of the eponymous R package. Support for models from mgcv
will be added in the future.selection_function
in the following) determining the selection result for which the practioner seeks valid inference statements. In other words, the user has to define a function similar to the function selection_function
defined below, which is deterministic in the sense that for the same input y
the output should also be exactly the same.selection_function
and the original selection given when performing model selection on the original data $y$. This is usually trivial and just a wrapper for the selection_function
.selection_function <- function(y) { # based on any input y of the same dimension as the original response # a model is selected and mapped to an integer value # .... best_model_index <- get_best_model(list_of_models) return(best_model_index) }
Note that the selection_function
should return the original result when called with the original data vector $y$.
original_response
final_model
(after refitting the model with gamm4
if a different package has been used for model selection)selection_function
)check_congruency
) function returning a logical value whether the result of any model call of selection_function
is equivalent to the original model selection resultmocasin
-function providing selective inference as follows:res <- mocasin(mod = final_model, checkFun = check_congruency, this_y = original_response # further options )
gamm4
modelslibrary(selfmade) library(gamm4) set.seed(0) dat <- gamSim(1,n=500,scale=2) ## simulate 4 term additive truth dat$y <- 3 + dat$x0^2 + rnorm(n=500) br <- gamm4(y~ s(x0) + s(x1), data = dat) summary(br$gam) ## summary of gam # do not use any selection # - hence it's not necessary to define selection_function # and the checl_congruency always returns TRUE checkFun <- function(yb) TRUE # calculate selective inference, which, in this case, # except for an approximation error, should be equivalent # to the unconditional inference res <- mocasin(br, this_y = dat$y, checkFun = checkFun, nrlocs = c(0.7,1), nrSamples = 1000, trace = FALSE) # we get very similar results using do.call("rbind", res$selinf)
library(selfmade) library(lme4) library(lmerTest) # use the ham data and use scaled information liking # as response ham$Informed.liking <- scale(ham$Informed.liking) # We first define a function to fit a model based on response # This function is usually not required but can be # specified in order to define the selection_function # as a function of the model instead of as a function # of the response vector modFun <- function(y) { ham$y <- y lmer(y ~ Gender + Information * Product + (1 | Consumer) + (1 | Product), data=ham) } # define the selection_function: # here this is done as function of a model # which, in combination with modFun, can then # be used as function selFun <- function(mod) step(mod, reduce.fixed = FALSE) # define a function which extracts the results # of the selection procedure extractSelFun <- function(this_mod){ this_mod <- attr(this_mod, "model") if(class(this_mod)=="lm") return(attr(this_mod$coefficients, "names")) else return(c(names(fixef(this_mod)), names(getME(this_mod, "theta")))) } # Now we run the initial model selection on the # orginal data, which is a # backward elimination of non-significant effects: (step_result <- selFun(modFun(ham$Informed.liking))) attr(step_result, "model") # Elimination tables for random- and fixed-effect terms: (sel <- extractSelFun(step_result)) # Now we can define the function checking the congruency # with the original selection checkFun <- function(yb){ this_mod <- modFun(yb) setequal( extractSelFun(selFun(this_mod)), sel ) } # and compute valid p-values conditional on the selection # (this takes some time and will produce a lot of warnings) res <- mocasin(attr(step_result, "model"), this_y = ham$Informed.liking, checkFun = checkFun, which = 1:4, nrSamples = 50, trace = FALSE) print(res)
# Run an AIC comparison between different additive models # fitted with mgcv::gam library(selfmade) library(mgcv) library(gamm4) # create data and models set.seed(2) # use enough noise to get a diverse model selection dat <- gamSim(1,n=400,dist="normal",scale=10) b0123 <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat) b123 <- gam(y~s(x1)+s(x2)+s(x3),data=dat) b013 <- gam(y~s(x0)+s(x1)+s(x3),data=dat) # seems that the second model seems to be the most # 'appropriate' one which.min(AIC(b0123, b123, b013)$AIC) # and refit the model with gamm4 b123_gamm4 <- gamm4(y~s(x0)+s(x1)+s(x3),data=dat) # define selection_function selection_function <- function(y) { dat$y <- y list_of_models <- list( gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat), gam(y~s(x1)+s(x2)+s(x3),data=dat), gam(y~s(x0)+s(x1)+s(x3),data=dat) ) # return an integer value which model is best return( which.min(sapply(list_of_models, AIC)) ) } # define the congruency function checkFun <- function(y) selection_function(y)==2 # compute inference res <- mocasin(mod = b123_gamm4, checkFun = checkFun, nrlocs = 3, # test one position of the spline nrSamples = 10) print(res)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.