cgwas: combine GWAS (C-GWAS)

View source: R/cgwasProtocal.R

cgwasR Documentation

combine GWAS (C-GWAS)

Description

Combining GWAS summary statistics of multiple potentially related traits.

Usage

cgwas(
  gwasFilePath,
  snpFilePath,
  outputPath,
  traitName = basename(gwasFilePath),
  exNa = TRUE,
  mrafFilePath = NULL,
  keepIEb = FALSE,
  threadN = ceiling(parallel::detectCores()/2),
  indSNPN = 1e+06,
  simulDep = 100,
  pStudy = 0.05/indSNPN,
  pSugst = 1/indSNPN,
  pMain = 3/indSNPN,
  ebicoPwrInc = 1.1,
  minCorDiff = 0.05,
  hiCorRestc = 0.5,
  sampleNInc = 0.5,
  twtStCo = 10^(seq(0, 1 - ceiling(log10(indSNPN)), -1/3))[-1],
  LoInrP = c(0.05, 0.001),
  LoSpanV = c(0.03, 0.1, 0.75),
  lociInr = 250000
)

Arguments

gwasFilePath

a vector of strings containing the paths to input GWASs summary statistics files. These files should be in the space or tab delimited format. Each file contains two and only two columns. The first column is the regression betas of all SNPs and the second column is the P-values of all SNPs. These files require a header line with two items: BETA and P. Note that for each SNP, all betas must be based on the same reference allele.

snpFilePath

string indicating the path to input SNP information file, a space delimited file consisting of three columns representing chromosome, base pair and SNP identifier. The file requires a header line with three items CHR, BP and SNP in the specified order.

outputPath

a string indicating the path to output directory. In this directory, two new folders will be created, the Results/ contains main results and the Details/ contains all intermediate results.

traitName

a vector of strings of trait names. The order must be the same as input GWASs. The basename of GWAS files will be set to trait names by default.

exNa

logical. If TRUE, SNPs with missing values in at least one input GWAS will be removed and will not appear in the output files. All removed SNPs will be listed in a file named ExcludedSNP.txt; if FALSE, the NA is replaced with BETA=0 and P=1. Default is TRUE.

mrafFilePath

a string indicating the path to the mean reference allele frequency (MRAF) file, which requires a header MRAF, and contains one column of mean frequency of the reference allele of each SNP. Note that the reference allele must correspond to the beta in the input GWAS. Reference allele frequencies are used to estimate the weights of input GWASs and all intermediate GWASs. This MRAF file is recommended but not obligatory with default NULL. If NULL, weights will still be estimated by approximations.

keepIEb

logical. If TRUE, all intermediate results from i-EbICoW will be saved in Details/i-EbICoW. Default is FALSE.

threadN

number of threads/cores to be used for parallel computing. The default is to automatically detect the number of CPU cores on the current host and uses half of the available cores for parallel computing.

indSNPN

the total number of unlinked SNPs of the input GWASs, the minimal indSNPN is 1e4, e.g. for exon-wide association studies. The default value is 1e6, which is fine for typical GWASs. indSNPN will be automatically ceiling to nearest 500 in preprocess. This parameter is to specify the number of SNPs for simulation analysis to adjust for extra multiple testing induced by C-GWAS procedures.

simulDep

the number of simulations, the minimal simulDep is 50. simulDep will be automatically ceiling to nearest 50 in preprocess. The default value is 100.

pStudy

study-wide significant threshold after adjusting for extra multiple testing due to C-GWAS procedures, which should take values in (0,5e-6]. The default value is 0.05/indSNPN. When the indSNPN is default 1e6, the study-wide significance threshold of C-GWAS equals to 5e-8 and corresponds to the genome-wide significance threshold of traditional GWASs, meaning that C-GWAS results are directly comparable with any standard GWAS.

pSugst

study-wide suggestive significant threshold, which should take values in (0,1e-4]. The default value is 1/indSNPN. When replication samples are available, this relaxed threshold is recommended for the initial screening.

pMain

an internal cut-off threshold of i-EbICoW to filter SNPs with significant effect during iterations, range (0,3/1e4]. The default value is 3/indSNPN.

ebicoPwrInc

an internal threshold of i-EbICoW, requesting that the number of significant signals from EbICoW is at least ebicoPwrInc times of that from the Wald test and/or that from the adjusted minimal P from individual GWAS, range (0, inf). This value is typically equal to or slightly greater than 1. The default value is 1.1.

minCorDiff

an internal threshold of i-EbICoW, specifying the minimal difference between the background correlation (Psi) and the effect correlation (Pi) for combining two GWASs in i-EbICoW, range [0, 2). The default value is 0.05.

hiCorRestc

an internal threshold of i-EbICoW to avoid abnormal values when solving large Psi in the presence of strong collinearity, range (0,1). The default value is 0.5. GWASs with psi > hiCorRestc in i-EbICoW iterations are forced combined.

sampleNInc

an internal threshold of i-EbICoW, requesting that the effective sample size (Ess) of EbICoW after combination should be larger than the weighted sum of Ess of the two GWASs before combination. Typically, 0 <= sampleNInc <= 1, default is set to 0.5.

twtStCo

a vector of internal thresholds of TWT, a vector of TWT stratification cutoffs, range (0,1). The default thresholds are 10^(seq(0,1-ceiling(log10(indSNPN)),-1/3))[-1].

LoInrP

a vector of internal interval cut-offs for fitting the LOESS models, which should be in the descending order in the range [10/indSNPN,0.1]. The default cut-offs are c(0.05,0.001).

LoSpanV

a vector of internal parameters for smoothing the LOESS curves in the intervals specified by LoInrP, range (0,1). The default values are c(0.03,0.1,0.75). Note that the length of LoSpanV = length(LoInrP)+1.

lociInr

an integer specifying the physical distance in base pair between two SNPs that should be considered as in distinct loci, value range [1e5,1e6]. The default value is 2.5e5, i.e., 250 kbp.

Details

C-GWAS is a powerful method for combining GWAS summary statistics of multiple potentially related traits and detect SNPs with multi-trait effects. C-GWAS begins with GWASs summary as inputs and outputs a single vector of combined p-values testing if the null is deviated. For each SNP, the null is the absence of any effect on all traits, and the alternative is that its effect deviates from 0 for at least one trait. C-GWAS integrates two different statistical methods with complementary statistical features to ensure the optimal power under various and complex scenarios while keeping a stable study-wide type-I error rate. The first method is called iterative effect based inverse covariance weighting (i-EbICoW) and the second method is called truncated Wald test (TWT). C-GWAS controls the study-wide type-I error rate in an empirical manner via simulations and adjust the resultant p-values in such a way that they are directly comparable with those from traditional GWAS of a single trait.

The final and intermediate results of C-GWAS analysis are saved in the output directory outputPath. Two new folders will be created in this directory: Results/ and Details/.

Main Results

The Results/ folder contains main results including:

CGWAS-GWAS.jpg

Combined Manhattan plots displays adjusted C-GWAS p-value (upper parts) and adjusted MinGWAS p-value (lower parts) of all SNPs at the –log10 scale. The study-wide significance is indicated using dashed lines and the suggestive significance is indicated using solid lines. Loci with the lead SNP passing the study-wide significant threshold on both sides are highlighted using different colors and shapes.

CGWASminpQQ.jpg

Adjusted p-values at the –log10 scale (y-axis) of MinGWAS (blue) and C-GWAS (orange) for all SNPs are plotted against their expected uniformly distributed quantiles at the –log10 scale (x-axis) under the null.

SummarySugSigSNP.csv

This table contains all SNP passing the suggestive significance threshold in either C-GWAS or MinGWAS. For each SNP, its position (CHR and BP), rsID (SNP), adjusted C-GWAS p-value (C-GWAS-P) and MinGWAS p-value (MinGWAS-P) are provided. Mean reference allele frequency (MRAF) will be provided if user define available frequency data path mrafFilePath. Distinct loci are distinguished by loci index. NA indicates that the SNP did not pass the suggestive significance threshold.

C-GWAS.p

This table contains position (CHR and BP), rsID (SNP), adjusted C-GWAS p-value (C-GWASadjustedP), unadjusted C-GWAS p-value (C-GWASrawP), adjusted MinGWAS p-value (MinGWASadjustedP), unadjusted MinGWAS p-value (MinGWASrawP) of all SNPs. Mean reference allele frequency (MRAF) will be also included if user provided the reference allele frequencies in mrafFilePath.

Intermediate Results

The Details/ folder contains all intermediate results including

SummaryGetI.txt

This table contains mean chi-square (square of test statistics) of all SNPs before adjustment (RawMeanX2), Genomic control value (GClambda), getI result (EstInf) and mean chi-square of all SNPs after getI based adjustment (AdjMeanX2) for each GWAS (traitName).

SummaryGetPsi.txt

This table contains Pearson correlation coefficient of test statistics from all SNPs (StatCor), getPsi result (Psi) and getPi result (considering all SNPs, allPi; considering only significant SNPs, sigPi) for all GWAS pairs (GWAS1 and GWAS2).

EbICoW.txt

This table contains effect of EbICoW GWAS (define as mean chi-square of all SNPs minus one, AllEff), Equivalent sample size (Ess), EbICoW GWAS combination members (GWAS*), corresponding weight of effect size in combination (bw*), corresponding weight of test statistics in combination (tw*) and rounds of all iterations involved in combine (r*) for each EbICoW GWAS (traitName).

SummaryEbICoWGetPsi.txt

This table contains Pearson correlation coefficient of test statistics from all SNPs (StatCor), getPsi result (Psi) and getPi result (considering all SNPs, allPi; considering only significant SNPs, sigPi) for all EbICoW GWAS pairs (GWAS1 and GWAS2).

Summaryi-EbICoW.txt

This table contains the details of each i-EbICoW iteration round. 1) Iteration round count (Round); 2) Two GWASs used in the iteration (GWAS1 and GWAS2); 3) Name of combined GWAS (NewGWAS); 4) Type of Pi and h chosen by optimize function (Piall and hall, AEC; Pisig and hsig, SEC; Pistb and hstb, STB); 5) Whether pass evaluation of evaluate function (Evaluate); 6) Effect of GWAS1, GWAS2 and NewGWAS (define as mean chi-square of all SNPs minus one, Eff1X2, Eff2X2 and NewEffX2); 7) getPsi and getPi result (Psi; considering all SNPs, allPi; considering only significant SNPs, sigPi); 8) Significant SNP number using parameter threshold pMain, including in GWAS1 (sigSNP1N), GWAS2 (sigSNP2N), MinGWAS of GWAS1 and GWAS2 (sigSNPminN), combined GWAS using AEC (sigSNPAECN), combined GWAS using SEC (sigSNPSECN), combined GWAS using STB (sigSNPSTBN), combined GWAS using Wald test (sigSNPWDN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPAECuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPSECuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPSTBuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPAECinterN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPSECinterN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPSTBinterN); 9) Equivalent sample size of GWAS1, GWAS2 and NewGWAS (Ess1, Ess2 and EssNew); 10) Weight of effect size used in combining GWAS1 and GWAS2 chosen by optimize function (bw1 and bw2); 11) Weight of test statistics used in combining GWAS1 and GWAS2 chosen by optimize function (tw1 and tw2).

NullCGWAScorrection.txt

This dataset contains LOESS model samples (ExpectedQuantile and ObservedP) simulated in getCoef function using in adjusting C-GWAS result.

NullCGWASdistribution.jpg

Simulated null distributions of the unadjusted p-values (gray) and p-values corrected using the getCoef function (orange) are compared with the uniform distribution (black) for C-GWAS simulation.

NullMinpcorrection.txt

This dataset contains LOESS model samples (ExpectedQuantile and ObservedP) simulated in getCoef function using in adjusting MinGWAS result.

NullMinpdistribution.jpg

Simulated null distributions of the unadjusted p-values (gray) and p-values corrected using the getCoef function (orange) are compared with the uniform distribution (black) for MinGWAS simulation.

Details/i-EbICoW

If user choose to keep i-EbICoW output (keepIEb = TRUE), all summary statistics (effect sizes, *.beta and test statistics, *.stat) of EbICoW GWASs are saved in Details/i-EbICoW.

Future Development Plan

The current version is 0.9.3, which implementing the whole C-GWAS procedure into a single function cgwas. Standing along functions facilitating the C-GWAS analysis will be implemented in a later version, e.g., getI, getPsi, getPi.

See Also

For more instruction of running CGWAS, please see https://github.com/FanLiuLab/CGWAS

Examples

# example of the input GWAS file
f1 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'Y1.assoc'), header=TRUE)
head(f1)

# example of the input mean reference allele frequency file
f2 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'MRAF'), header=TRUE)
head(f2)

# example of the input SNP information file
f3 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'SnpInfo'), header=TRUE)
head(f3)

# demo of whole C-GWAS procedure implementing
# the output files are in the current working directory
ExDataDir <- system.file("extdata", package = "CGWAS")
gwasFileName <- c("Y1.assoc", "Y2.assoc", "Y3.assoc",
                  "Y4.assoc", "Y5.assoc", "Y6.assoc",
                  "Y7.assoc", "Y8.assoc", "Y9.assoc",
                  "Y10.assoc", "Y11.assoc", "Y12.assoc")
gwasFilePath <- file.path(ExDataDir, gwasFileName)
snpFilePath <- file.path(ExDataDir, 'SnpInfo')
outputPath <- getwd()
traitName <- c("Y1", "Y2", "Y3",
               "Y4", "Y5", "Y6",
               "Y7", "Y8", "Y9",
               "Y10", "Y11L", "Y12")
mrafFilePath <- file.path(ExDataDir, 'MRAF')
indSNPN = 1e5

cgwas(gwasFilePath, snpFilePath, outputPath,
      traitName = traitName, mrafFilePath = mrafFilePath, indSNPN = indSNPN)


TianTTL/CGWAS documentation built on April 25, 2022, 1:09 a.m.