knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The ordinalbayes
R package was developed for fitting ordinal Bayesian models when there is a high-dimensional covariate space, such as when high-throughput genomic data are used in modeling the ordinal outcome. This package depends on the runjags
R package and JAGS (version >=4.x.x) must be installed as well. See the JAGS and [runjags] (https://CRAN.R-project.org/package=runjags) for installation instructions. The package includes the function ordinalbayes
which can be used to fit LASSO (model = "lasso"
), normal spike-and-slab (model = "normalss"
), double exponential spike-and-slab (model = "dess"
), and regression-based variable inclusion indicator Bayesian models (model = "regressvi"
). Variable selection can be performed using Bayes factor or using the posterior distributions of the variable inclusion indicators directly. This vignette describes the syntax required for each of our Bayesian models.
library("ordinalbayes")
The package includes a two subsets of The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) dataset: finalSet
includes 2,009 transcripts while reducedSet
includes 41 transcripts. Both datasets include the same set of subjects who belong to FIGO stages I ($N=124$), II ($N=61$), and III-IV ($N=57$). Additionally, the cesc
data.frame is an object that combines the phenotypic and gene expression data into one object. To shorten run time, all illustrations will use cesc
.
head(cesc)
The primary function for model fitting in the ordinalbayes
package is ordinalbayes
. The function arguments are
args(ordinalbayes)
The ordinalbayes
function accepts a model formula
that specifies the ordinal outcome on the left-hand side of the equation and any unpenalized predictor variable(s) from the phenotypic dataset on the right-hand side of the $\sim$ equation; if no unpenalized predictor variables are included, the model formula includes 1 (the intercept) on the right-hand side of the equation. Unpenalized predictors are those that we want to coerce into the model (e.g., age) so that no penalty is applied. When unpenalized predictors are included (or coerced) into the model, the user can specify the variance associated with those model parameters (default coerce.var=10
).
For example, this call fits a regression-based variable inclusion indicator Bayesian model to predict the ordinal outcome Stage
where cigarettes_per_day+age_at_index
and age_at_index
are included as unpenalized predictors (coerced into the model) and the expression of 41 genes are included as penalized predictors. The user should pass to x
the genomic feature data (e.g., expression of genes from high-throughput assays) to be penalized in the fitted model, which are in columns 5-45 of the cesc
data.frame. Here a fixed constant prior for $\pi_j$ is set to 0.05. To shorten run time for demonstration purposes, we reduced the number of iterations for adaptation (adaptSteps
), the number of iterations of the Markov chain to run (burnInSteps
), and the number of saved steps per chain (numSavedSteps
) for all examples.
fit<-ordinalbayes(Stage~cigarettes_per_day+age_at_index, data=cesc,x=cesc[, 5:45], model="regressvi", gamma.ind="fixed", pi.fixed=0.05, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
By default the genomic features are centered (center=TRUE
) and scaled (scale=TRUE
) and three chains are run (nChains
). The user can subset
the data set prior to model fitting, for example, subset=(race=="white")
.
The LASSO Bayesian ordinal model can be fit by specifying model="lasso"
which assumes the penalized coefficients $\beta_j$ for $j=1,\ldots,P$ are from independent Laplace (or double exponential) distributions with parameter $\lambda$ which is from a Gamma distribution with parameters a
and b
. The default parameters are a=0.01
and b=0.01
.
fit.lasso<-ordinalbayes(Stage~cigarettes_per_day+age_at_index, data=cesc,x=cesc[, 5:45], model="lasso", adaptSteps=500, burnInSteps=500, numSavedSteps=999)
Like the LASSO model, the regression-based variable inclusion indicator model assumes the penalized coefficients $\beta_j$ for $j=1,\ldots,P$ are from independent Laplace (or double exponential) distributions with parameter $\lambda$ which is from a Gamma distribution with parameters a
and b
. Additionally, a variable inclusion indicator $\gamma_j$ is assumed to follow a Bernoulli distribution with parameter $\pi_j$. The user can use either a fixed (gamma.ind="fixed"
) or random (gamma.ind="random"
) prior for $\pi_j$. When gamma.ind="fixed"
, the user can specify pi.fixed
as the constant prior to be some value the (0, 1) interval (default is 0.05). Here there are no unpenalized covariates included in the model so the right-hand side of the model formula is 1.
fit.regressvi.fixed<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="regressvi", gamma.ind="fixed", pi.fixed=0.05, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
When gamma.ind="random"
, the user must specify parameter values for the Beta distribution c.gamma
(e.g., 0.01) and d.gamma
(e.g. 0.19).
fit.regressvi.random<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="regressvi", gamma.ind="random", c.gamma=0.01, d.gamma=0.19, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
The normal spike-and-slab Bayesian ordinal model can be fit by specifying model="normalss"
. When fitting this model the user is required to specify the variance for the spike by setting sigma2.0
to a small positive value (e.g., 0.01) and variance for the slab by setting sigma2.1
to a large positive value (e.g., 10). Additionally, a variable inclusion indicator $\gamma_j$ is assumed to follow a Bernoulli distribution with parameter $\pi_j$. The user can use either a fixed (gamma.ind="fixed"
) or random (gamma.ind="random"
) prior for $\pi_j$. When gamma.ind="fixed"
, the user can specify pi.fixed
as the constant prior to be some value the (0, 1) interval (default is 0.05). Here there are no unpenalized covariates included in the model so the right-hand side of the model formula is 1.
fit.normalss.fixed<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="normalss", gamma.ind="fixed", pi.fixed = 0.05, sigma2.0=0.01, sigma2.1=10, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
When gamma.ind="random"
, the user must specify parameter values for the Beta distribution c.gamma
(e.g., 0.01) and d.gamma
(e.g. 0.19).
fitted.normalss.random<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="normalss", gamma.ind="random", c.gamma = 0.01, d.gamma=0.19, sigma2.0=0.01, sigma2.1=10, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
The double exponential spike-and-slab ordinal model can be fit by specifying model="dess"
. Like LASSO and , the slab is taken to be a double exponential distribution with parameter $\lambda$ which follows a Gamma distribution with parameters a
and b
. When fitting this model the user is required to specify the parameter for the spike ($\lambda_0$) using lambda0
(e.g., 20). Additionally, a variable inclusion indicator $\gamma_j$ is assumed to follow a Bernoulli distribution with parameter $\pi_j$. The user can use either a fixed (gamma.ind="fixed"
) or random (gamma.ind="random"
) prior for $\pi_j$. When gamma.ind="fixed"
, the user can specify pi.fixed
as the constant prior to be some value the (0, 1) interval (default is 0.05). Here there are no unpenalized covariates included in the model so the right-hand side of the model formula is 1.
fit.dess.fixed<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="dess", gamma.ind="fixed", pi.fixed = 0.05, lambda0=20, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
When gamma.ind="random"
, the user must specify parameter values for the Beta distribution c.gamma
(e.g., 0.01) and d.gamma
(e.g. 0.19).
fit.dess.random<-ordinalbayes(Stage~1, data=cesc,x=cesc[, 5:45], model="dess", gamma.ind="random", c.gamma = 0.01, d.gamma=0.19, lambda0=20, adaptSteps=500, burnInSteps=500, numSavedSteps=999)
Generic functions for the resulting ordinalbayes
object are available for extracting meaningful results from the resulting MCMC chain. The print
function returns several summaries from the MCMC output for each parameter monitored including: the 95th lower confidence limit for the highest posterior density (HPD) credible interval (Lower95), the median value (Median), the 95th upper confidence limit for the HPD credible interval (Upper95), the mean value (Mean), the sample standard deviation (SD), the mode of the variable (Mode), the Monte Carlo standard error (MCerr,) percent of SD due to MCMC (MC\%ofSD), effective sample size (SSeff), autocorrelation at a lag of 30 (AC.30), and the potential scale reduction factor (psrf).
print(fit)
The summary
function provides the following output:
alphamatrix
, the MCMC output for the threshold parameters;
betamatrix
, the MCMC output for the penalized parameters; * zetamatrix
, The MCMC output for the unpenalized parameters (if included);
gammamatrix
, the MCMC output for the variable inclusion parameters (not available when model = "lasso"
);
gammamean
, the posterior mean of the variable inclusion indicators (not available when model = "lasso"
);
gamma.BayesFactor
, Bayes factor for the variable inclusion indicators (not available when model = "lasso"
);
Beta.BayesFactor
, Bayes factor for the penalized parameters; and
* lambdamatrix
, the MCMC output for the penalty parameter (not available when model="normalss"
).
summary.fit<-summary(fit) names(summary.fit) head(summary.fit$gammamatrix)
To identify which penalized features using Bayes factor at a given threshold (e.g., 5):
names(which(summary.fit$Beta.BayesFactor>5))
or
names(which(summary.fit$gamma.BayesFactor>5))
Alternatively, a threshold for $\bar{\gamma}_j$ could be used for variable selection.
names(which(summary.fit$gammamean>0.5))
coefficients<-coef(fit) coefficients$gamma[which(summary.fit$gamma.BayesFactor>5)] coefficients$gamma[which(summary.fit$Beta.BayesFactor>5)]
To obtain model predictions,
phat<-predict(fit) table(phat$class, cesc$Stage)
The plot
function provides a trace of the sampled output and optionally the density estimate for each variable in the chain. This function additionally adds the appropriate beta
and gamma
labels for each penalized variable name.
plot(fit)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.