Sampling Algorithm for Bayesian Variant Selection Methods

Share:

Description

This function performs a basic MH Sampling algorithm to sample models from the model space when enumeration is not possible. For informative marginal inclusion probabilities the algorithm also performs a basic MCMC algorithm to sample the effects of the predictor-level covariates (alpha).

Usage

1
2
3
sampleBVS(data,forced=NULL,inform=FALSE,cov=NULL,rare=FALSE,mult.regions=FALSE,
          regions=NULL,hap=FALSE,iter=10000,save.iter=0,outfile=NULL,
          status.file=NULL,old.results=NULL)

Arguments

data

an (n x (p+1)) dimensional data frame where the first column corresponds to the response variable that is presented as a factor variable corresponding to an individuals disease status (0|1),and the final p columns are the SNPs of interest each coded as a numeric variable that corresponds to the number of copies of minor alleles (0|1|2)

forced

an optional (n x c) dimensional matrix of c confounding variables that one wishes to adjust the analysis for and that will be forced into every model.

inform

if inform=TRUE corresponds to the iBMU algorithm of Quintana and Conti (Submitted) that incorporates user specified external predictor-level covariates into the variant selection algorithm.

cov

an optional (p x q) dimensional matrix of q predictor-level covariates (needed when inform=TRUE) that the user wishes to incorporate into the estimation of the marginal inclusion probabilities using the iBMU algorithm

rare

if rare=TRUE corresponds to the Bayesian Risk index (BRI) algorithm of Quintana and Conti (2011) that constructs a risk index based on the multiple rare variants within each model. The marginal likelihood of each model is then calculated based on the corresponding risk index.

mult.regions

when rare=TRUE if mult.regions=TRUE then we include multiple region specific risk indices in each model. If mult.regions=FALSE a single risk index is computed for all variants in the model.

regions

if mult.regions=TRUE regions is a p dimensional character or factor vector identifying the user defined region of each variant.

hap

if hap=TRUE we estimate a set of haplotypes from the multiple variants within each model and the marginal likelihood of each model is calculated based on the set of estimated haplotypes.

iter

the number of iterations to run the algorithm.

save.iter

the number of iterations between each checkpoint. A checkpoint file is written every save.iter iterations.

outfile

character string giving the pathname of the checkpoint file to save the output of the algorithm to.

status.file

character string giving the pathname of the file to write the status of the algorithm.

old.results

old output from sampleBVS that has been run for a subset of the total number of iterations that the user wanted to run. if specified the sampling algorithm will start from the last sampled model in old.results. To be used if sampleBVS has been interrupted for some reason.

Details

The algorithm is run for a chosen number of iterations where we randomly add and remove variants from the current model based on a basic MH algorithm. If inform = TRUE we also incorporate a set of predictor-level covariates that are provided by the user and use a MCMC algorithm to sample the effects of the covariates on the marginal inclusion probabilities. Convergence of the algorithm can be determined by running two independent runs of the algorithm with different starting values and examining the marginal Bayes factors for each variant under each independent run.

Value

This function outputs a list of the following values to the file write.out if this file is specified for every save.iter number of iterations:

fitness

A vector of the fitness values (log(Model likelihood) - log(Model Prior)) of each model sampled at each iteration of the algorithm.

logPrM

A vector of the Model Priors of each model sampled at each iteration of the algorithm.

which

A vector identifying the character representation of each model sampled.

coef

If rare=FALSE we report a matrix where each row corresponds to the estimated coefficients for all variables within each model sampled at each iteration of the algorithm. If rare=TRUE we report a vector where each entry corresponds to the estimated coefficient of the risk index (or multiple risk indices if mult.regions=TRUE) corresponding to each enumerated model.

alpha

If inform=FALSE that is simply a vector of 0's. If inform=TRUE we report a matrix where each row corresponds to the estimated effects (alpha's) of each predictor-level covariate for each model sampled at each iteration of the algorithm.

Author(s)

Melanie Quintana <maw27.wilson@gmail.com>

References

Quintana M, Conti D (2011). Incorporating Model Uncertainty in Detecting Rare Variants: The Bayesian Risk Index. Genetic Epidemiology 35:638-649.

Quintana M, Conti D (Submitted). Informing Variable Selection via Bayesian Model Uncertainty.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
## Rare Variant BRI example  
  ## Load the data for Rare variant example
  data(RareData)

  ## Run algorithm for 100 iterations for rare variant example.
  ## NOTE: Results from a more realistic run with 100K
  ## iterations can be found in data(RareBVS.out).
  RareBVS.out <- sampleBVS(data=RareData,iter=100,rare=TRUE)

  ## Run algorithm for 100 iterations for multiple region rare 
  ## variant example.
  p = dim(RareData)[2]-1
  regions = c(rep("Region1",(p/2)),rep("Region2",(p/2)))
  RareBVS.out <- sampleBVS(data=RareData,iter=100,rare=TRUE,mult.regions=TRUE,regions=regions)

## Informative iBMU Example
  ##Load the data for the informative example
  data(InformData)

  ## Run algorithm for 100 iterations for informative data example.
  ## This run is the basic Bayes model uncertainty algorithm with inform=FALSE 
  ## NOTE: Results from a more realistic run with 100K
  ## iterations can be found in data(InformBVS.NI.out).
  InformBVS.NI.out = sampleBVS(InformData$data,inform=FALSE,iter=100)

  ## Run algorithm for 100 iterations for informative data example.
  ## This run corresponds to the iBMU algorithm with inform=TRUE 
  ## and dichotomous predictor-level covariates indicating the gene of each variant.  
  ## NOTE: Results from a more realistic run with 100K
  ## iterations can be found in data(InformBVS.I.out).
  InformBVS.I.out = sampleBVS(InformData$data,inform=TRUE,
                              cov=as.matrix(InformData$cov),iter=100)