Vignette illustrating the use of graper in linear regression


Make example data with four groups

Create an example data set with 4 groups, 400 train + 100 test samples and 800 features.

data <- makeExampleData(n = 500, p=800, g=4,
                        pis=c(0.05, 0.1, 0.05, 0.1),
                        gammas=c(0.1, 0.1, 10, 10))
# training data set
Xtrain <- data$X[1:400, ]
ytrain <- data$y[1:400]

# annotations of features to groups
annot <- data$annot

# test data set
Xtest <- data$X[401:500, ]
ytest <- data$y[401:500]

Fit the model

graper is the main function of this package, which allows to fit the proposed Bayesian models with different settings on the prior (by setting spikeslab to FALSE or TRUE) and the variational approximation (by setting factoriseQ to FALSE or TRUE). By default, the model is fit with a sparsity promoting spike-and-slab prior and a fully-factorised mean-field assumption. The parameter n_rep can be used to train multiple models with different random initializations. The best model is then chosen in terms of ELBO and returned by the function. th defines the threshold on the ELBO for convergence in the variational Bayes (VB) algorithm used for optimization.

fit <- graper(Xtrain, ytrain, annot,
            n_rep=3, verbose=FALSE, th=0.001)

Training diagnostics

The ELBO monitors the convergence during training.


Posterior distribtions

The variational Bayes (VB) approach directly yields posterior distributions for each parameter. Note, however, that using VB these are often too concentrated and cannot be directly used for construction of confidence intervals etc. However, they can provide good point estimates.

plotPosterior(fit, "gamma", gamma0=data$gammas, range=c(0, 20))
plotPosterior(fit, "pi", pi0=data$pis)

Model coefficients and intercept

The estimated coefficients and the intercept are contained in the result list.

# get coefficients (without the intercept)
beta <- coef(fit, include_intercept=FALSE)
# beta <- fit$EW_beta

# plot estimated versus true beta
qplot(beta, data$beta) +
    coord_fixed() + theme_bw()
# get intercept
intercept <- fit$intercept

Posterior inclusion probabilities per feature

The estimated posterior inclusion probabilities per feature are contained in the result list and can also be accessed using getPIPs

# get estimated posterior inclusion probabilities per feature
pips <- getPIPs(fit)

# plot pips for zero versus non-zero features
df <- data.frame(pips = pips,
                nonzero = data$beta != 0)
ggplot(df, aes(x=nonzero, y=pips, col=nonzero)) +
    geom_jitter(height=0, width=0.2) +
    theme_bw() + ylab("Posterior inclusion probability")

Group-wise penalites

The function plotGroupPenalties can be used to plot the penalty factors and sparsity levels inferred for each feature group.


Make predictions

The function predict can be used to make prediction on new data. Here, we illustrate its use by predicting the response on the test data defined above.

preds <- predict(fit, Xtest)
qplot(preds, ytest) +
    coord_fixed() + theme_bw()



Try the graper package in your browser

Any scripts or data that you put into this service are public.

graper documentation built on Nov. 8, 2020, 5:45 p.m.