knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

Flashfm (flexible and shared information fine-mapping) is a package to simultaneously fine-map genetic associations for multiple quantitative traits that are measured in the same study(s) by sharing information between the traits. It is flexible to the inclusion of related individuals in the sample and to missing trait measurements.

Flashfm makes use of summary-level data and requires as input:

This vignette introduces flashfm and gives an illustration of its use - with both forms of input for the reference genotype information - on simulated data of two traits with a shared causal variant. A simulated data set is provided in flashfm.

Flashfm overview

Flashfm shares information between traits by up-weighting joint models that have a shared causal variant between traits, in a similar Bayesian framework to Multinomial Fine-Mapping (MFM) for joint fine-mapping of multiple diseases with shared controls.

To generate posterior support for fine-mapping models, flashfm needs to calculate the Bayes' factor (BF) for all possible model combinations across SNPs and traits. To make this computationally feasible, we derive an expression for the the joint BF that depends on the individual (marginal) trait BFs and terms that depend on the GWAS summary statistics, covariance matrix of the traits, and sample sizes; either a genotype matrix or both a SNP covariance matrix and relative allele frequencies (RAFs) are needed from a reference panel or in-study sample.

As in MFM, rather than reporting posterior probabilities (PPs) for each SNP model, we focus on SNP groups that we construct such that SNPs in the same group are in LD, have a similar effect on traits, and rarely appear together in a trait model. We constuct SNP groups based on the model PPs from single-trait fine-mapping and a separate set of SNP groups based on the model PPs from flashfm. In general, the groups constructed by flashfm tend to be a subset of those from single-trait fine-mapping, indicating that flashfm gives finer resolution that single-trait fine-mapping.

Details of how to run flashfm on your own data are explained through detailed examples of simulated data. We provide four ways to run flashfm, depending on whether or not single-trait fine-mapping results are already available or not.

  1. Single-trait fine-mapping has not yet been run - need GWAS summary statistics for each trait, SNP correlation matrix, reference allele frequency vector, and trait covariance matrix. In one command, run single-trait fine-mapping (using JAM) and flashfm, as well as construct SNP groups for each approach and summarise results by SNPs and by SNP groups.

  2. Single-trait fine-mapping has not yet been run - need GWAS summary statistics for each trait, SNP correlation matrix, reference allele frequency vector, and trait covariance matrix. In one command, run single-trait fine-mapping (using FINEMAP) and flashfm, as well as construct SNP groups for each approach and summarise results by SNPs and by SNP groups.

  3. Single-trait fine-mapping has not yet been run - need GWAS summary statistics for each trait, SNP matrix, and trait covariance matrix. See second example.

  4. Single-trait fine-mapping has been run using FINEMAP - need *config files from FINEMAP for each trait, GWAS summary statistics for each trait, SNP correlation matrix, reference allele frequency vector, and trait covariance matrix. See third (last) example.

Simulation Example

In this simulated data example, we simulate two traits that each have two causal variants, of which one (rs61839660) is shared between traits; trait 1 has second causal variant rs62626317 and trait 2 has second causal variant rs11594656. The trait correlatin is 0.4.

For a region of 345 SNPs in chromosome 10p-6030000-6220000 (GRCh37/hg19), containing IL2RA, we generated a population of 100,000 individuals based on the CEU 1000 Genomes Phase 3 data using HapGen2. We selected a random sample of 2000 from this population and only retained the 334 variants with MAF > 0.005 in this sample. This genotype matrix is available in flashfm as the object X.

Measurements of two traits were generated using the mvrnorm function in R and genotype matrix X. In this example there are no missing measurements, so N=2000 for both traits.

library(flashfm)
head(snpinfo)
covY
ybar
lapply(beta,head)
X[1:5,1:5]
N
head(raf)

SNP correlation matrix and RAF vector provided - run single-trait (JAM) and multi-trait fine-mapping

This example takes seconds to run since the output is saved as an object, fm, in the flashfm package - details follow.

In this example, we make use of a SNP correlation matrix and reference allele frequency (RAF) vector as input to flashfm and use the internal flashfm function for single-trait fine-mappping, JAMexpandedCor.multi. This function is an expanded version of JAM. The function FLASHFMwithJAM provides a wrapper to run single-trait fine-mapping on each trait and then uses these results as input to flashfm. The SNP groups are also constructed within this wrapper function, as well as output from flashfm summarised at both the SNP-level and SNP group level.

NOTE: SNP names must not contain ":" in FLASHFMwithJAM, JAMexpanded.multi, and JAMexpandedCor.multi. So, if SNP names are in the form "19:45301088_T_C" change to "chr19_45301088_T_C". This could be done using the command

paste0("chr", gsub(":", "_", snpnames)

We first need the correlation matrix, which we compute from our genotype matrix X, retaining only the SNPs that are in the raf vector; the raf vector was previously filtered at MAF 0.005.

msnps <- names(raf) # filtered on maf 0.005
corX = cor(X[,msnps])

The command to run is as below, and for convenience the output from this function is saved in the object fm. In order to use JAM for single-trait fine-mapping, the R2BGLiMS library must be loaded, in addition to flashfm.

library(R2BGLiMS)
fm <- FLASHFMwithJAM(beta, corX, raf, ybar, N, save.path="DIRtmp",TOdds=1,covY,cpp=0.99,NCORES=2) 

where arguments are described in detail here, though each object gives an example of format:

Neff(raf, seB, Vy = 1),

where raf is the reference allele frequency vector, seB is the vector of standard errors of the effect estimates in the same order of SNPs in raf, and Vy is the trait variance, which is 1 by default, assuming traits are transformed to normal(0,1).

Our top group models are:

fm$mpp.pp$PPg

Let’s check if our causal variants (in cvs vector) are included in the SNP groups from single and multi-trait fine-mapping and the sizes of the SNP groups

flashfm:::groupIDs.fn(fm$snpGroups[[1]],cvs) # based on FINEMAP SNP groups
flashfm:::groupIDs.fn(fm$snpGroups[[2]],cvs) # based on flashfm SNP groups
fm$snpGroups$group.sizes # first set of SNP groups is from FINEMAP, second set of SNP groups is from flashfm

Notice that the groups for the causal variants that were not shared (cvs[c(2,3)]) are the same size for both methods and that for the shared variant (cvs[1]), flashfm constructs a smaller group than FINEMAP. This illustrates how sharing information between the traits can refine fine-mapping resolution.

SNP correlation matrix and RAF vector provided - run single-trait (FINEMAP) and multi-trait fine-mapping

This example takes seconds to run since the output is saved as an object, fm2, in the flashfm package - details follow.

In this example, we make use of a SNP correlation matrix and reference allele frequency (RAF) vector as input to flashfm and use the FINEMAP software (Benner et al. 2016). The function FLASHFMwithFINEMAP provides a wrapper to run single-trait fine-mapping on each trait and then uses these results as input to flashfm. The SNP groups are also constructed within this wrapper function, as well as output from flashfm summarised at both the SNP-level and SNP group level.

We first need the correlation matrix, which we compute from our genotype matrix X, retaining only the SNPs that are in the raf vector; the raf vector was previously filtered at MAF 0.005.

msnps <- names(raf) # filtered on maf 0.005
corX = cor(X[,msnps])

The command to run is as below, and for convenience the output from this function is saved in the object fm2.

fm2 <- FLASHFMwithFINEMAP(gwas.list,corX,raf,ybar,N,fstub="DIRresults/eg",TOdds=1,covY,cpp=.99,NCORES=2,FMpath="/software/finemap_v1.4_x86_64/finemap_v1.4_x86_64") 

where arguments are described in detail here, though each object gives an example of format:

Neff(raf, seB, Vy = 1),

where raf is the reference allele frequency vector, seB is the vector of standard errors of the effect estimates in the same order of SNPs in raf, and Vy is the trait variance, which is 1 by default, assuming traits are transformed to normal(0,1).

Our top group models are:

lapply(fm2$mpp.pp$PPg,head)

Let’s check if our causal variants (in cvs vector) are included in the SNP groups from single and multi-trait fine-mapping and the sizes of the SNP groups

flashfm:::groupIDs.fn(fm2$snpGroups[[1]],cvs)
flashfm:::groupIDs.fn(fm2$snpGroups[[2]],cvs)
fm2$snpGroups$group.sizes  # first set of SNP groups is from FINEMAP, second set of SNP groups is from flashfm

Notice that the groups for the causal variants that were not shared (cvs[c(2,3)]) are the same size for both methods and that for the shared variant (cvs[1]), flashfm (fm2$snpGroups[[2]]) constructs a smaller group than FINEMAP (fm2$snpGroups[[1]]). This illustrates how sharing information between the traits can refine fine-mapping resolution.

Reference genotype matrix provided

This example takes seconds to run.

In this example, we make use of a reference genotype matrix as input to flashfm and use the internal flashfm function for single-trait fine-mappping, JAMexpanded.multi. This function is an expanded version of JAM, and requires a pathway to the software PLINK.

NOTE: SNP names must not contain ":" in FLASHFMwithJAM, JAMexpanded.multi, and JAMexpandedCor.multi. So, if SNP names are in the form "19:45301088_T_C" change to "chr19_45301088_T_C". This could be done using the command

paste0("chr", gsub(":", "_", snpnames)

For efficiency, we provide the output from JAMexpanded.multi in the object JAMmain.input. The command used to generate it is:

JAMmain.input <- JAMexpanded.multi(beta,X,snpinfo,ybar,diag(covY),N,chr=10,fstub=fstub,mafthr=0.005,path2plink="/software/plink",r2=0.99,save.path=fstubJ,related=FALSE,y=NULL)

In the above command, ftsub and fstubJ are file prefixes, including file path, to save intermediate results. The related logical flag indicates whether or not there are related individuals in the sample. In the simulated data there is no relatedness, so relate=FALSE. If relatedness, then effective sample sizes need to be input for N; the built-in function Neff estimates effective sample size for each trait from the GWAS summary statistics.

Flashfm is then run with the following lines:

ss.stats <- summaryStats(Xmat=TRUE,ybar.all=ybar,main.input=JAMmain.input)
fm.multi <- flashfm(JAMmain.input, TOdds=1,covY,ss.stats,cpp=0.99,maxmod=NULL,fastapprox=FALSE)

and SNP groups for both single-trait fine-mapping and flashfm are constructed by

snpGroups <- makeSNPgroups2(JAMmain.input,fm.multi,is.snpmat=TRUE,r2.minmerge=0.7) 

where the is.snpmat flag=TRUE indicates that the reference genotype matrix was provided and r2.minmerge is the minimum LD between SNPs in different groups for them to merge; r2.minmerge=0.7 means that if a SNP in 1 group has r2>0.7 with a SNP in another group, these 2 groups are merged. The first list of SNP groups were based on single-trait fine-mapping and the second list is based on flashfm.

The next function summarises the fine-mapping results using the SNP groups:

mpp.pp <- PPsummarise(fm.multi, snpGroups, minPP=0.05)

This returns a list of 4 objects: MPP lists trait-specific PP of SNP inclusion in a model; MPPg lists trait-specific PP of SNP group inclusion in a model; PP lists trait-specific model PP; PPg lists trait-specific model PP in terms of SNP group Setting minPP=0.05 restricts output for group models to only those with PP>0.05 for at least one trait.

Our top group models are:

mpp.pp$PPg

Let's check if our causal variants (in cvs vector) are included in the SNP groups from single and multi-trait fine-mapping and the sizes of the SNP groups

flashfm:::groupIDs.fn(snpGroups[[1]],cvs)
flashfm:::groupIDs.fn(snpGroups[[2]],cvs)
snpGroups$group.sizes

Notice that the groups for the causal variants that were not shared (cvs[c(2,3)]) are the same size for both methods and that for the shared variant (cvs[1]), flashfm constructs a smaller group than JAMexpanded.multi. This illustrates how sharing information between the traits can refine fine-mapping resolution.

Reference genotype covariance matrix, RAF vector, and FINEMAP results for each trait provided

This example takes around one minute to run.

Here, we use single-trait fine-mapping results from FINEMAP and show how to prepare it for use in flashfm.

The *.config files from FINEMAP are processed by the lines

FMconfig <- vector("list",2)
for(i in 1:2) FMconfig[[i]] <- read.table(paste0("finemap",i,".config"), header = TRUE, as.is = TRUE, sep = " ")

and saved in the flashfm object FMconfig, which is prepared for flashfm by creating a list of the FINEMAP results for each trait:

modPP.list <- vector("list",2)
for(i in 1:2) {modPP.list[[i]] <- data.frame(FMconfig[[i]][,c("config","prob")]); colnames(modPP.list[[i]]) <- c("snps","PP")}
names(modPP.list) <- names(beta)
lapply(modPP.list,head)

We provide the covariance matrix of the genotypes from X:

msnps <- names(raf) # filtered on maf 0.005
covX = cov(X[,msnps])

However, if the SNP correlation matrix is available, the covariance matrix may be found using the correlation matrix and reference allele frequencies in the function cor2cov. This function is from the MBESS package and has been imported to flashfm; cor2cov is authored by Ken Kelley (University of Notre Dame; (\email{KKelley@@ND.Edu}) and Keke Lai (the \code{MBESS} package), with modifications by Dustin Fife \email{fife.dustin@@gmail.com}.

covX <- cor2cov(corX,sd=sqrt(2*raf*(1-raf)))

N is a vector of sample sizes for traits, in same order as trait effect vectors given in beta. This can be based on data, if unrelated samples, as here. Especially for samples with related individuals, we recommend finding the effective sample size for each trait using the built-in function:

Neff(raf, seB, Vy = 1),

where raf is the reference allele frequency vector, seB is the vector of standard errors of the effect estimates in the same order of SNPs in raf, and Vy is the trait variance, which is 1 by default, assuming traits are transformed to normal(0,1).

Similar lines to the previous example are run, but flashfm.input is needed to prepare input to flashfm, and its output has the same form as JAMexpanded.multi.

FMmain.input <- flashfm.input(modPP.list,beta1.list=beta,Gmat=covX,Nall=N,ybar.all=ybar,is.snpmat=FALSE,raf=raf)
ss.stats <- summaryStats(Xmat=FALSE,ybar.all=ybar,main.input=FMmain.input)
fm.multi <- flashfm(FMmain.input, TOdds=1,covY,ss.stats,cpp=0.99,maxmod=NULL,fastapprox=FALSE)
snpGroups <- makeSNPgroups2(FMmain.input,fm.multi,is.snpmat=FALSE,r2.minmerge=0.6)
mpp.pp <- PPsummarise(fm.multi, snpGroups, minPP=0.05)

Our top group models are:

mpp.pp$PPg

Let's check if our causal variants (in cvs vector) are included in the SNP groups from single and multi-trait fine-mapping and the sizes of the SNP groups

flashfm:::groupIDs.fn(snpGroups[[1]],cvs)
flashfm:::groupIDs.fn(snpGroups[[2]],cvs)
snpGroups$group.sizes

Notice that the groups for the causal variants that were not shared (cvs[c(2,3)]) are the same size for both methods and that for the shared variant (cvs[1]), flashfm constructs a smaller group than FINEMAP. This illustrates how sharing information between the traits can refine fine-mapping resolution.



jennasimit/flashfm documentation built on July 31, 2022, 7:32 p.m.