model.hsstan: hsstan model for cross-validation

View source: R/cv.hsstan.R

model.hsstan    R Documentation

hsstan model for cross-validation


Description

This function applies a cross-validation (CV) procedure for training Bayesian models with hierarchical shrinkage priors using the hsstan package. The function allows optional embedded filtering of predictors for feature selection within the CV loop. Within each training fold, an optional filtering of predictors is performed, followed by fitting of an hsstan model. Predictions on the testing folds are pooled and error estimation/accuracy determined. The default is 10-fold CV. The function is implemented within the nestedcv package. Because hsstan models do not require tuning of meta-parameters, a single CV procedure suffices to evaluate performance; this is implemented using the outer CV procedure in the nestedcv package.


Usage

model.hsstan(y, x, unpenalized = NULL, ...)



Arguments

y	Response vector. For classification this should be a factor.

x	Matrix of predictors.

unpenalized	Vector of column names of x which are always retained in the model (i.e. not penalized). The default NULL means that the parameters for all predictors will be drawn from a hierarchical prior distribution, i.e. all will be penalized. Note: if filtering of predictors is specified, then the vector of unpenalized predictors should also be passed to the filter function using the filter_options$force_vars argument. Filters currently implementing this option are partial_ttest_filter for binary outcomes and lm_filter for continuous outcomes.

...	Optional arguments passed to hsstan.
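As a minimal sketch of the note above (variable names here are illustrative, not from the package), the same unpenalized variable is passed both to model.hsstan via unpenalized and to the filter via filter_options$force_vars, so that the filter cannot drop a covariate the model is required to keep:

```r
# Hypothetical sketch: "age" is always retained, both by the filter
# (force_vars) and by the model (unpenalized)
res <- outercv(y = y, x = x,
               model = model.hsstan,
               filterFUN = lm_filter,                   # continuous outcome filter
               filter_options = list(force_vars = "age"),
               unpenalized = "age")
```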


Value

An object of class hsstan.


Author(s)

Athina Spiliopoulou


Examples

# Cross-validation is used to apply univariate filtering of predictors.
# Only one CV split is needed (outercv) as the Bayesian model does not
# require learning of meta-parameters.

# load iris dataset and simulate a continuous outcome
dt <- iris[, 1:4]
colnames(dt) <- c("marker1", "marker2", "marker3", "marker4")
dt <-, 2, scale))
dt$outcome.cont <- -3 + 0.5 * dt$marker1 + 2 * dt$marker2 + rnorm(nrow(dt), 0, 2)

# unpenalised covariates: always retain in the prediction model
uvars <- "marker1"
# penalised covariates: coefficients are drawn from hierarchical shrinkage
# prior
pvars <- c("marker2", "marker3", "marker4") # penalised covariates
# run cross-validation with univariate filter and hsstan
# dummy sampling for fast execution of example
# recommend 4 chains, warmup 1000, iter 2000 in practice
oldopt <- options(mc.cores = 2) <- outercv(y = dt$outcome.cont, x = dt[, c(uvars, pvars)],
                         model = model.hsstan,
                         filterFUN = lm_filter,
                         filter_options = list(force_vars = uvars,
                                               nfilter = 2,
                                               p_cutoff = NULL,
                                               rsq_cutoff = 0.9),
                         n_outer_folds = 3, chains = 2,
                         unpenalized = uvars, warmup = 100, iter = 200)
options(oldopt)  # restore original options
# view prediction performance based on testing folds$summary
# view coefficients for the final model$final_fit
# view covariates selected by the univariate filter$final_vars

# load hsstan package to examine the Bayesian model
library(hsstan)
print(projsel($final_fit), digits = 4)  # adding marker2

# Here adding `marker2` improves the model fit: there is a substantial
# decrease in KL-divergence from the full model to the submodel. Adding
# `marker3` does not improve the fit: no decrease in KL-divergence from
# the full model to the submodel.

nestedcv documentation built on Oct. 23, 2022, 5:06 p.m.