This package performs Bayesian predictors and covariance selection for combinations of continuous and/or discrete responses using a recently proposed Bayesian Variable Selection (BVS) algorithm for Gaussian Copula Regression (GCR) models (Alexopoulos and Bottolo, 2021).
The algorithm has been updated to include Rcpp functions to sample decomposable graphs. The execution time is now comparable to C++ R-package BDgraph (Mohammadi and Wit, 2019) to sample non-decomposable graphs.
The current version of the BVS4GCR R-package allows the analysis of (continuous) Gaussian and (discrete) binary, ordinal categorical and count responses (modelled with a binomial or negative binomial distribution) as well as the specification of a decomposable or non-decomposable graphical model for the conditional dependence structure between the responses. The BVS4GCR model also allows for covariates and confounders to be included which are not subject to variable selection and always included in the model. The current version of the package can also analyse data with missing at random in both continuous and discrete responses.
The algorithm returns the posterior samples of the non-zero regression coefficients, the visited graphs as well as the residual covariance matrix between the responses and the response-specific parameters, i.e., the cut-points of ordinal categorical responses or the over-dispersion parameter for count data modelled with a negative binomial distribution. A new routine has been added to return the posterior mean of the quantities of interest.
To favour the comparison with other (linear and non-linear)Bayesian Variable Selection algorithms for single-response regression, BVS4GCR R-package also includes a function to simulate the examples presented in Alexopoulos and Bottolo, 2021
BVS4GCR has several dependencies BDgraph, MASS, MCMCpack, mvnfast, mvtnorm and truncnorm that are automatically imported and installed.
The installation of BVS4GCR requires the following steps:
Install the devtools package. This can be done from CRAN. Invoke R and then type
install.packages("devtools")
Load the devtools package
library("devtools")
Install BVS4GCR package by typing
devtools::install_github("lb664/BVS4GCR")
Finally, load the BVS4GCR package
library("BVS4GCR")
Installation can be checked by running the following toy example with Gaussian responses. Data is simulated by using the following command. For reproducibility, the seed for the simulation can be specified directly as an argument of the function
responseType <- rep("Gaussian", 4)
d <- Sim_GCRdata(m = 4, n = 500, p = 100, pFixed = 2, responseType =
responseType, extras = list(rhoY = 0.8, rhoX = 0.7,
fixedEffect = rep(1, 2), muEffect = rep(3, 4), varEffect =
rep(0.5, 4), s1Lev = 0.05, s2Lev = 0.95, sdResponse =
rep(1, 4)), seed = 28061971)
The function Sim_GCRdata
generates 500 observations for four Gaussian responses with 100 predictors and two covariates common to all responses (including the intercept). The underlying correlation between responses and collinearity amongst the predictors is controlled by the parameters rhoY
and rhoX
, respectively, and the level of sparsity by s1Lev
and s2Lev
. For details regarding these parameters, see Rothman et al. (2010).
BVS4GCR can be run on the simulated data as follows
output <- BVS4GCR(d$Y, d$X, pFixed = 2, responseType = responseType,
EVgamma = c(5, 3^2), niter = 250, burnin = 50, thin = 4,
monitor = 50, seed = 280610971)
where (250 - 50) posterior samples of the quantities of interest are recorded at every four MCMC iterations after burn-in, with the output of the algorithm monitored every 50 iterations. The a priori sparsity prior c(5, 3^2)
implies a range of associated predictors between 0 and (5 + 3 * 3) for each response. For reproducibility, the seed for the Markov chain Monte Carlo can be specified directly as an argument of the function.
This example is similar to the first simulation experiment presented in Alexopoulos and Bottolo, 2021 where 50 observations are simulated from a combination of one Gaussian, one binary and two ordinal categorical responses
responseType <- c("Gaussian", "binary", "ordinal", "ordinal")
d <- Sim_GCRdata(m = 4, n = 50, p = 30, pFixed = 1, responseType =
responseType, extras = list(rhoY = 0.8, rhoX = 0.7,
fixedEffect = 1, muEffect = c(1, rep(0.5, 3)), varEffect =
c(1, rep(0.2, 3)), s1Lev = 0.15, s2Lev = 0.95, sdResponse =
rep(1, 4), nCat = c(3, 4), cutPoint =
cbind(c(0.5, NA, NA), c(0.5, 1.2, 2))), seed = 28061971)
The output d
contains the (50 x 4)-response matrix d$Y
, 30 collinear predictors with an intercept stored in the matrix d$X
as well as d$R
, the simulated correlation between the responses.
BVS4GCR can be run using the command
output <- BVS4GCR(d$Y, d$X, pFixed = 1, responseType = responseType,
EVgamma = c(5, 3^2), niter = 250, burnin = 50, thin = 4,
monitor = 50, seed = 280610971)
Inference with a 'dense' covariance matrix can be specified in the command line as follows
output <- BVS4GCR(d$Y, d$X, pFixed = 1, responseType = responseType,
EVgamma = c(5, 3^2), niter = 250, burnin = 50, thin = 4,
monitor = 50, fullCov = TRUE, seed = 280610971)
This example coincides with the second simulated experiment presented in Alexopoulos and Bottolo, 2021 where 100 observations are simulated from a combination of one Gaussian, one binomial and two negative binomial distributions, with 100 predictors, an intercept and a common covariate for all responses
responseType <- c("Gaussian", "binomial", "negative binomial",
"negative binomial")
d <- Sim_GCRdata(m = 4, n = 100, p = 100, pFixed = 2, responseType =
responseType, extras = list(rhoY = 0.8, rhoX = 0.7,
fixedEffect = c(-0.5, 0.5), muEffect = c(1, rep(0.5, 3)),
varEffect = c(1, rep(0.2, 3)), s1Lev = 0.05, s2Lev = 0.95,
sdResponse = rep(1, 4), nTrial = 10, negBinomPar =
c(0.5, 0.75)), seed = 28061971)
BVS is performed with BVS4GCR by typing
output <- BVS4GCR(d$Y, d$X, pFixed = 2, responseType = responseType,
EVgamma = c(5, 3 ^2), niter = 250, burnin = 50, thin = 4,
monitor = 50, nTrial = 10, negBinomParInit = rep(1, 2),
seed = 280610971)
The argument nondecomp
allows to model the conditional dependence pattern between the responses with non-decomposable graphs
output <- BVS4GCR(d$Y, d$X, pFixed = 2, responseType = responseType,
EVgamma = c(5, 3 ^2), niter = 250, burnin = 50, thin = 4,
monitor = 50, nonDecomp = TRUE, nTrial = 10,
negBinomParInit = rep(1, 2), seed = 280610971)
This software uses the GPL v2 license, see LICENSE. Authors and copyright are provided in DESCRIPTION
To report an issue, please use the BVS4GCR issue tracker at BugReports
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.