Description Usage Arguments Value Author(s) See Also Examples
Calls BGLiMS - a Java package for fitting GLMs under Bayesian model selection. NOTE: The predictors to explore with model selection are specified via the model.space.priors argument - see examples. By Default a common, and unknown, prior SD is used for all predictors under model selection, which in turn is assigned an Inverse-Gamma hyper-prior. Fixed normal priors may be user specified for all predictor (and confounder) coefficients via the beta.priors argument.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | R2BGLiMS(
likelihood = NULL,
data = NULL,
outcome.var = NULL,
times.var = NULL,
confounders = NULL,
model.selection = TRUE,
model.space.priors = NULL,
beta.priors = NULL,
beta.prior.partitions = NULL,
standardise.covariates = TRUE,
empirical.intercept.prior.mean.and.initial.value = NULL,
g.prior = TRUE,
tau = NULL,
xtx.ridge.term = 0,
enumerate.up.to.dim = 0,
X.ref = NULL,
cor.ref = NULL,
mafs.ref = NULL,
ns.each.ethnicity = NULL,
marginal.betas = NULL,
n = NULL,
n.iter = 1e+06,
n.mil.iter = NULL,
thinning.interval = NULL,
seed = NULL,
extra.arguments = NULL,
initial.model = NULL,
max.model.dim = -1,
save.path = NULL,
results.label = NULL,
burnin.fraction = 0.5,
trait.variance = NULL,
logistic.likelihood.weights = NULL,
mrloss.w = 0,
mrloss.function = "variance",
mrloss.marginal.by = NULL,
mrloss.marginal.sy = NULL,
mafs.if.independent = NULL,
extra.java.arguments = NULL,
debug = FALSE
)
|
likelihood |
Type of model to fit. Current options are "Logistic" (for binary data), "CLogLog" complementary log-log link for binary data, "Weibull" (for survival data), "Linear" (for linear regression), "LinearConj" (linear regression exploiting conjugate results), "JAM" (for conjugate linear regression using summary statistics, integrating out parameters) and "JAM_MCMC" (for linear regression using summary statistics, with full MCMC of all parameters). |
data |
Matrix or dataframe containing the data to analyse. Rows are indiviuals, and columns contain the variables and outcome. If modelling summary statistics specify X.ref, marginal.betas, and n instead (see below). |
outcome.var |
Name of outcome variable in data. For survival data see times.var below. If modelling summary statistics with JAM this can be left null but you must specify X.ref, marginal.beats and n instead (see below). |
times.var |
SURVIVAL DATA ONLY Name of column in data which contains the event times. |
confounders |
Optional vector of confounders to fix in the model at all times, i.e. exclude from model selection. |
model.selection |
Whether to use model selection (default is TRUE). NB: Even if set to FALSE, please provide a dummy model.space.priors argument (see below). This will not be used mathematically, but the package requires taking the list of variable names from the "Variables" element therein. |
model.space.priors |
Must be specified if model.selection is set to TRUE. Two options are available. 1) A fixed prior is placed on the proportion of causal covariates, and all models with the same number of covariates is equally likely. This is effectively a Poisson prior over the different possible model sizes. A list must be supplied for 'model.space.priors' with an element "Rate", specifying the prior proportion of causal covariates, and an element "Variables" containing the list of covariates included in the model search. 2) The prior proportion of causal covariates is treated as unknown and given a Beta(a, b) hyper-prior, in which case elements "a" and "b" must be included in the 'model.space.priors' list rather than "Rate". Higher values of "b" relative to "a" will encourage sparsity. NOTE: It is easy to specify different model space priors for different collections of covariates by providing a list of lists, each element of which is itself a model.space.prior list asm described above for a particular subset of the covariates. |
beta.priors |
This allows specifying fixed (potentially informative) priors for the covariate effect priors. A matrix must be passed, with named rows corresponding to parameters, and columns corresponding to the prior mean and variance in that order. When using this option priors must be specified for either just the confounders, which are otherwise given fixed N(0,1e6) priors, or for all covariates. |
beta.prior.partitions |
Covariate effects under variable selection are ascribed, by default, a common Normal prior, the standard deviation of which is treated as unknown, with a Unif(0.05,2) hyper-prior. This option can be used to partition the covariate effects into different prior groups, each with a seperate hierarchical normal prior. beta.prior.partitions must be a list with as many elements as desired covariate groups. The element for a particular group must in turn be a list containing the following named elements: "Variables" - a list of covariates in the prior group, and "UniformA" and "UniformB" the Uniform hyper parameters for the standard deviation of the normal prior across their effects. |
standardise.covariates |
Standardise covariates prior to RJMCMC such that they have a common (unit) standard deviation and mean zero. This is particularly recommended when covariates have substantially different variances, to improve the likelihood of exchangeable effect estimates, an asssumption that is required under the (default) common effect prior with unknown variance used in the RJMCMC. The standardisation is done invisibly to the user; parameter estimates are re-scaled back to unit increases on the original scale before producing posterior summaries. This is currently available for Logistic, Linear and Weibull regression. Default is TRUE. Note that when this option is used, covariate effects are re-scaled in the summary results table contained in the R2BGLiMS results object, such that they are still interpretable as effects corresponding to unit changes on the original covariate scale. |
empirical.intercept.prior.mean.and.initial.value |
Empirically set the intercept inital value and prior mean according to the mean outcome value, i.e. the expected intercept value when covariates are mean-centred. In some settings this can markedly improve mixing but EXPERIMENTATION is encouraged since in other settings it can do more harm than good. For Weibull regression this option leads to both the intercept AND the scale parameter having their prior means and initial values set according to a simple NULL Weibull model fit using the survreg function. In our experience this always improved mixing under Weibull regression, so is enabled by default, but can be over-ridden by setting this option to FALSE. For logistic regression, setting this option to TRUE leads to the intercept prior mean and initial value being set to the logit(case fraction), and for linear regression to the mean outcome. For linear and logistic regression the usefulness of this option is less clear, so it is off by default. |
g.prior |
Whether to use a g-prior for the beta's, i.e. a multivariate normal with correlation structure proportional to sigma^2*X'X^-1, which is thought to aid variable selection in the presence of strong correlation. By default this is enabled. |
tau |
Value to use for sparsity parameter tau (under the tau*sigma^2 parameterisation). When using the g-prior, a recommended default is max(n, P^2) where n is the number of individuals, and P is the number of predictors. |
xtx.ridge.term |
Value to add to the constant of the diagonal of X'X before JAM takes the Cholesky decomposition. |
enumerate.up.to.dim |
Whether to make posterior inference by exhaustively calculating the posterior support for every possible model up to this dimension. Leaving at 0 to disable and use RJMCMC instead. The current maximum allowed value is 5. |
X.ref |
Reference genotype matrix used by JAM to impute the SNP-SNP correlations. If multiple regions are to be analysed this should be a list containing reference genotype matrices for each region. Individual's genotype must be coded as a numeric risk allele count 0/1/2. Non-integer values reflecting imputation may be given. NB: The risk allele coding MUST correspond to that used in marginal.betas. These matrices must each be positive definite and the column names must correspond to the names of the marginal.betas vector. |
cor.ref |
Alternatively to a reference genotype matrix, a reference correlation matrix AND mafs may be supplied to JAM. NB: The risk allele coding MUST correspond to that used in marginal.betas. These matrices must each be positive definite and the column and row names must correspond to the names of the marginal.betas vector. |
mafs.ref |
Alternatively to a reference genotype matrix, a reference correlation matrix AND mafs may be supplied to JAM. NB: The risk allele coding MUST correspond to that used in marginal.betas. This must be a named vector with names correspond to the names of the marginal.betas vector. |
ns.each.ethnicity |
For mJAM: A vector of the sizes of each ethnicity dataset in which the summary statistics were calculated. |
marginal.betas |
Vector of (named) marginal effect estimates to re-analyse with JAM under multivariate models. For multi-ethnic "mJAM" please provide a list of vectors, each element of which is a vector of marginal effects for a specific ethnicity over the same variants. |
n |
The size of the dataset in which the summary statistics (marginal.betas) were calculated |
n.iter |
Number of iterations to run (default is 1e6) |
n.mil.iter |
Number of million iterations to run. Can optionally be used instead of n.iter for convenience, which it will overide if specified. |
thinning.interval |
Every nth iteration to store (i.e. for the Java algorithm to write to a file and then read into R). By default this is the number of iterations divided by 1e4 (so for 1 million iterations every 100th is stored.) Higher values (so smaller posterior sample) can lead to faster runtimes for large numbers of covariates. |
seed |
Which random number seed to use in the RJMCMC sampler. If none is provided, a random seed is picked between 1 and 2^16. |
extra.arguments |
A named list of additional arguments for which there are not currently dedicated options for. This can be used to modify various "under the hood" settings, including all prior hyper-parameters, MCMC mixing parameters such as the probabilities of add/delete/swap moves as well as the adaption settings. A list of all settings available for modification can be seen by typing "data(DefaultArguments)" and then "default.arguments", which lists their names and default values. |
initial.model |
An initial model for the covariates under selection can be specified as a vector of 0s and 1s. If left un-specified the null (empty) model is used. |
max.model.dim |
The maximum model dimension can be specified, therefore truncating the model space. We do not recommend using this option but it might sometimes be useful for robust-ness checks. When left un-specified there is no restriction on the model size. |
save.path |
By default R2BGLiMS writes the posterior samples to a temporary file that is deleted after they have been read in to R. By specifying a file path this can be kept, along with the temporary files used for the data and arguments, which can be useful sometimes for de-bugging. |
results.label |
When using the save.path option, this allows the user to specify a handle with which to name the files. |
burnin.fraction |
Initial fraction of the iterations to throw away, e.g. setting to 0 would mean no burn-in. The default of 0.5 corresponds to the first half of iterations being discarded. |
logistic.likelihood.weights |
An optional vector of likelihood weights for logistic regression. These weights multiply the log-likeihood contribution of each individual. The order should match the order of rows in the data matrix. |
mrloss.w |
The relative weight of the MR log loss function for pleiotropy vs the log likelihood. Default 0. |
mrloss.function |
Choice of pleiotropic loss function from "steve", "variance" (default variance) |
mrloss.marginal.by |
Marginal associations between SNPs and outcome for the MR loss function model. |
mrloss.marginal.sy |
Standard errors of marginal associations between SNPs and outcome for the MR loss function model (not required for mrloss.function "variance") |
mafs.if.independent |
If the SNPs are independent then a reference genotype matrix is not required. However, it is still necessary to provide SNP MAFs here as a named vector. Doing so will lead to X.ref being ignored and the SNPs to be modelled as if they are independent. Note that this option does not work with enumeration. |
extra.java.arguments |
A character string to be passed through to the java command line. E.g. to specify a different temporary directory by passing "-Djava.io.tmpdir=/Temp". |
debug |
Whether to output extra information (such as final adaption proposal SDs) which might help with debugging (default is FALSE). |
An R2BGLiMS_Results class object is returned. See the slot 'posterior.summary.table' for a posterior summary of all parameters. See slot 'mcmc.output' for a matrix containing the raw MCMC output from the saved posterior samples (0 indicates a covariate is excluded from the model in a particular sample. Some functions for summarising results are listed under "see also".
Paul Newcombe
Summary results are stored in the slot posterior.summary.table. See ManhattanPlot
for a visual
summary of covariate selection probabilities. For posterior model space summaries see TopModels
. For
convergence checks see TracePlots
and AutocorrelationPlot
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 | library(R2BGLiMS)
# NB: See ?JAM for JAM examples.
#
# Examples below are:
# 1) Logistic regression
# 2) Weibull regression survival analysis
# 3) Case-cohort logistic regression (with hazard ratio estimation using Prenctice weighted Cox regression)
# 4) Linear regression
# 5) Linear regression using a conjugate posterior (with coefficients analytically intergreated out)
# 6) Fixed priors on effects, rather than the default common prior with unknown variance
# 7) Using two model space partitions
##########################################
# --- Example 1) Logistic regression --- #
##########################################
utils::data(biopsy, package = "MASS") # Example logistic dataset
covariate.names <- paste0("V",c(1:9))
# Recommend standardising predictors to justify default common hierarchical prior on effects
for (v in covariate.names) {biopsy[,v] <- scale(biopsy[,v])}
biopsy$class <- as.integer(biopsy$class) - 1
biopsyResults <- R2BGLiMS( # Takes a few minutes to run
likelihood="Logistic",
data=biopsy,
outcome.var="class",
model.space.priors=list("a"=1, "b"=length(covariate.names), "Variables"=covariate.names), # Beta-binomial(1,P) model space prior
extra.arguments=list("AlphaPriorMu"=log(mean(biopsy$class)/(1-mean(biopsy$class)))) # Recommend centering intercept prior on logit(event rate)
)
# Simple convergence diagnostic
plot(biopsyResults@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
# Summary plot of selection probabilities
ManhattanPlot(biopsyResults) # Evidence of several independent effects
# Summary table
biopsyResults@posterior.summary.table
TopModels(biopsyResults)
###########################################################
# --- Example 2) Weibull regression survival analysis --- #
###########################################################
utils::data(VA, package = "MASS")
predictors <- c("treat","age","Karn","diag.time","prior")
for (v in predictors) {VA[,v] <- scale(as.numeric(VA[,v]))} # Normalise predictors
VA$stime <- VA$stime/max(VA$stime)# Recommend scaling survival times to between 0 and 1
va.results.weibull <- R2BGLiMS(
likelihood="Weibull",
data=VA,
outcome.var="status",
times.var="stime",
model.space.priors=list(list("a"=1,"b"=length(predictors),"Variables"=predictors)) # Beta-binomial(1,P) model space prior
)
plot(va.results.weibull@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
ManhattanPlot(va.results.weibull) # Clear signal at Kern
va.results.weibull@posterior.summary.table
TopModels(va.results.weibull)
######################################################
# --- Example 3) Case-cohort logistic regression --- #
######################################################
# --- Step 1: Logistic RJMCMC
data("CaseCohortExample") # Only V1,V2,V3,V4,V5 have true effects.
for (v in covariate.names) { data.cc[,v] <- data.cc[,v] - mean(data.cc[,v]) }
logistic.cc.res <- R2BGLiMS(
likelihood="Logistic",
data=data.cc,
outcome.var="event",
model.space.priors=list(list("a"=1,"b"=length(covariate.names),"Variables"=covariate.names)),
n.mil=0.2,
extra.arguments=list("AlphaPriorMu"=log(mean(data.cc$event)/(1-mean(data.cc$event))))
)
plot(logistic.cc.res@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
ManhattanPlot(logistic.cc.res) # Identifies 4 out of 5 true effects
logistic.cc.res@posterior.summary.table
# --- Step 2: Effect estimation using the Prentice model
library(survival)
top.vars <- names(which(logistic.cc.res@posterior.summary.table[covariate.names,"BF"]>5))
prentice.model.formula <- as.formula(paste("Surv(times, event) ~ ",paste(top.vars, collapse="+")))
prentice.res <- cch(
prentice.model.formula,
data = data.cc,
subcoh = ~subcohort,
id=~ID,
cohort.size=n.complete.cohort,
method="Prentice")
summary(prentice.res)
#######################################################
# --- Example 4) Linear regression with full MCMC --- #
#######################################################
data("LinearModelExample") # True effect at V1 only
lm.results <- R2BGLiMS(
likelihood="Linear",
data=data.cts.outcome,
outcome.var="y",
confounders="confounder", # Example of a variable to include always
model.space.priors = list("a"=1, "b"=length(covariate.names), "Variables"=covariate.names) # Beta-binomial(1,P) prior on model space
)
plot(lm.results@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
ManhattanPlot(lm.results)
lm.results@posterior.summary.table
#############################################################################
# --- Example 5) Linear regression using a conjugate marginal posterior --- #
#############################################################################
# Coefficients are analytically integrated out
# Only models are sampled so mixing should be better
data("LinearModelExample") # True effect at V1 only
lm.conjugate.results <- R2BGLiMS(
likelihood="LinearConj",
data=data.cts.outcome,
outcome.var="y",
confounders="confounder", # Example of a variable to include always
tau=max(nrow(data.cts.outcome),ncol(data.cts.outcome)^2), # Tau recommended to be maximum of P^2 and N
model.space.priors = list("a"=1, "b"=length(covariate.names), "Variables"=covariate.names), # Beta-binomial(1,P)prior on model space
g.prior=T)
# Summary plot of selection probabilities
ManhattanPlot(lm.conjugate.results) # More decisive than above
TopModels(lm.conjugate.results)
# Posterior sample of parameter in top model
posterior.sample <- NormInvGamPosteriorSample(
data=data.cts.outcome, outcome.var ="y", confounders="confounder",
model=c("V1"),tau=100)
# Median and credible interval
round(quantile(posterior.sample[,"V1"],c(0.5, 0.025, 0.975)),2) # Clearly excludes 0
##############################################
# --- Example 6) Fixed priors on effects --- #
##############################################
# By default, a common prior with unkown variance is assumed for all effects
# Might want to explicitly specify these priors instead the priors (e.g. if there is prior information)
data("LinearModelExample") # True effect at V1 only
lm.results.fixed.priors <- R2BGLiMS(
likelihood="Linear",
data=data.cts.outcome,
outcome.var="y",
confounders="confounder", # Example of a variable to include always
beta.priors = data.frame(
cbind(
rep(0,11), # Normal pior means
rep(1,11)), # Normal prior SDs
row.names=c("confounder",covariate.names)),
model.space.priors = list("a"=1, "b"=length(covariate.names), "Variables"=covariate.names) # Beta-binomial(1,P) prior on model space
)
plot(lm.results.fixed.priors@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
ManhattanPlot(lm.results.fixed.priors)
lm.results.fixed.priors@posterior.summary.table
#################################################
# --- Example 7) Two model space partitions --- #
#################################################
# Can provide different levels of prior sparsity to different groups of covariates
# Demonstrated below with two groups
data("LinearModelExample") # True effect at V1 only
lm.results.two.model.space.partitions <- R2BGLiMS(
likelihood="Linear",
data=data.cts.outcome,
outcome.var="y",
confounders="confounder", # Example of a variable to include always
model.space.priors = list(
list("a"=1, "b"=1, "Variables"=covariate.names[1:5]), # Generous 50/50 prior proportion
list("a"=1, "b"=1000, "Variables"=covariate.names[6:10]) # Very sparse!
)
)
plot(lm.results.two.model.space.partitions@mcmc.output[,"LogLikelihood"], type="l") # Looks ok
ManhattanPlot(lm.results.two.model.space.partitions)
lm.results.two.model.space.partitions@posterior.summary.table
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.