bmass Introduction -- Simulated Data

This is a very simple, introductory vignette on how to run bmass. In this introductory example, we will use a small, simulated dataset to run bmass and explore some basic output. The dataset is provided with the package.

The purpose of this vignette is to quickly allow users to get up and running with bmass. For a more advanced introductory vignette, where we download and use the GlobalLipids 2013 dataset as an example, please see here.

Running bmass

For this introductory vignette, we will be using bmass_SimulatedData1, bmass_SimulatedData2, and bmass_SimulatedSigSNPs. These are simulated datasets created for the purposes of unit testing and this vignette. To load up 'bmass' and the datasets, run the following:

library("bmass");
data(bmass_SimulatedData1, bmass_SimulatedData2, bmass_SimulatedSigSNPs)
Phenotypes <- c("bmass_SimulatedData1", "bmass_SimulatedData2")

First, we'll take a look at the format of one of our simulated datasets:

head(bmass_SimulatedData1)

Here you should see nine columns: Chr, BP, Marker, MAF, A1, A2, Direction, pValue, N. bmass expects each input dataset (representing a single phenotype GWAS) to contain these 9 columns and with these specific headers. Their descriptions are as follows --

Now to run bmass, all we use is the following code:

bmassResults <- bmass(Phenotypes, bmass_SimulatedSigSNPs);

Note that at its most basic level, we only need to provide two inputs for bmass() to properly run: a vector of strings containing the variable names of each of our univariate summary GWAS datasets and a file containing the list of previously associated univariate genome-wide significant SNPs. The former input, the vector of strings, dictates to bmass() where to find each dataset and what to call each phenotype being analyzed (ie the name of each variable containg the GWAS summary data is expected to correspond to the phenotype represented by that data). The latter input is just a list of each previous GWAS SNP with the following two columns of information, Chr and BP:

bmass_SimulatedSigSNPs

Exploring basic results

Now to begin to get a sense of what bmass() outputs, we can run the following:

summary(bmassResults)

As you can see, there are multiple output variables corresponding to different components of the results. For the purposes of this vignette, we will only touch on the first three variables shown from this summary() output: PreviousSNPs, MarginalSNPS, and NewSNPs. PreviousSNPs refers to information on the input GWAS SNPs we used for the analysis, MarginalSNPs refers to information on the the marginally significant SNPs that were used in the final steps of the analysis, and NewSNPs refers to information on any SNPs that reach multivariate genome-wide levels of significance.

The main result most users will be immediately interested in is whether there are any new variants identified as genome-wide multivariate significant, and this can be determined by accessing the NewSNPs sublist:

summary(bmassResults$NewSNPs)
head(bmassResults$NewSNPs$SNPs)

As you can see in this example, we have 1 new SNP that reaches multivariate genome-wide significance. This is determined by using the smallest log BF weighted average value among the previous univariate significanct GWAS SNPs (PreviousSNPS):

bmassResults$GWASlogBFMinThreshold

Any MarginalSNPs with logBFWeightedAvg greater than this value are designated as new multivariate associations (see eq. 1 in Turchin and Stephens bioRxiv 2019).

For further exploration and details of bmass output, please see the second introductory vignette.



Try the bmass package in your browser

Any scripts or data that you put into this service are public.

bmass documentation built on May 17, 2019, 9:02 a.m.