cgwas: combine GWAS (C-GWAS)
In TianTTL/CGWAS: A toolkits for combine GWAS (C-GWAS)

cgwas

R Documentation

combine GWAS (C-GWAS)

Description

Combining GWAS summary statistics of multiple potentially related traits.

Usage

cgwas(
  gwasFilePath,
  snpFilePath,
  outputPath,
  traitName = basename(gwasFilePath),
  exNa = TRUE,
  mrafFilePath = NULL,
  keepIEb = FALSE,
  threadN = ceiling(parallel::detectCores()/2),
  indSNPN = 1e+06,
  simulDep = 100,
  pStudy = 0.05/indSNPN,
  pSugst = 1/indSNPN,
  pMain = 3/indSNPN,
  ebicoPwrInc = 1.1,
  minCorDiff = 0.05,
  hiCorRestc = 0.5,
  sampleNInc = 0.5,
  twtStCo = 10^(seq(0, 1 - ceiling(log10(indSNPN)), -1/3))[-1],
  LoInrP = c(0.05, 0.001),
  LoSpanV = c(0.03, 0.1, 0.75),
  lociInr = 250000
)

Arguments

`gwasFilePath`	a vector of strings containing the paths to input GWASs summary statistics files. These files should be in the space or tab delimited format. Each file contains two and only two columns. The first column is the regression betas of all SNPs and the second column is the P-values of all SNPs. These files require a header line with two items: BETA and P. Note that for each SNP, all betas must be based on the same reference allele.
`snpFilePath`	string indicating the path to input SNP information file, a space delimited file consisting of three columns representing chromosome, base pair and SNP identifier. The file requires a header line with three items CHR, BP and SNP in the specified order.
`outputPath`	a string indicating the path to output directory. In this directory, two new folders will be created, the `Results/` contains main results and the `Details/` contains all intermediate results.
`traitName`	a vector of strings of trait names. The order must be the same as input GWASs. The basename of GWAS files will be set to trait names by default.
`exNa`	logical. If `TRUE`, SNPs with missing values in at least one input GWAS will be removed and will not appear in the output files. All removed SNPs will be listed in a file named `ExcludedSNP.txt`; if `FALSE`, the NA is replaced with BETA=0 and P=1. Default is `TRUE`.
`mrafFilePath`	a string indicating the path to the mean reference allele frequency (MRAF) file, which requires a header MRAF, and contains one column of mean frequency of the reference allele of each SNP. Note that the reference allele must correspond to the beta in the input GWAS. Reference allele frequencies are used to estimate the weights of input GWASs and all intermediate GWASs. This MRAF file is recommended but not obligatory with default `NULL`. If `NULL`, weights will still be estimated by approximations.
`keepIEb`	logical. If `TRUE`, all intermediate results from `i-EbICoW` will be saved in `Details/i-EbICoW`. Default is `FALSE`.
`threadN`	number of threads/cores to be used for parallel computing. The default is to automatically detect the number of CPU cores on the current host and uses half of the available cores for parallel computing.
`indSNPN`	the total number of unlinked SNPs of the input GWASs, the minimal `indSNPN` is 1e4, e.g. for exon-wide association studies. The default value is 1e6, which is fine for typical GWASs. `indSNPN` will be automatically ceiling to nearest 500 in preprocess. This parameter is to specify the number of SNPs for simulation analysis to adjust for extra multiple testing induced by C-GWAS procedures.
`simulDep`	the number of simulations, the minimal `simulDep` is 50. `simulDep` will be automatically ceiling to nearest 50 in preprocess. The default value is 100.
`pStudy`	study-wide significant threshold after adjusting for extra multiple testing due to C-GWAS procedures, which should take values in (0,5e-6]. The default value is `0.05/indSNPN`. When the `indSNPN` is default 1e6, the study-wide significance threshold of C-GWAS equals to 5e-8 and corresponds to the genome-wide significance threshold of traditional GWASs, meaning that C-GWAS results are directly comparable with any standard GWAS.
`pSugst`	study-wide suggestive significant threshold, which should take values in (0,1e-4]. The default value is `1/indSNPN`. When replication samples are available, this relaxed threshold is recommended for the initial screening.
`pMain`	an internal cut-off threshold of `i-EbICoW` to filter SNPs with significant effect during iterations, range (0,3/1e4]. The default value is `3/indSNPN`.
`ebicoPwrInc`	an internal threshold of `i-EbICoW`, requesting that the number of significant signals from EbICoW is at least `ebicoPwrInc` times of that from the Wald test and/or that from the adjusted minimal P from individual GWAS, range (0, inf). This value is typically equal to or slightly greater than 1. The default value is 1.1.
`minCorDiff`	an internal threshold of `i-EbICoW`, specifying the minimal difference between the background correlation (`Psi`) and the effect correlation (`Pi`) for combining two GWASs in `i-EbICoW`, range [0, 2). The default value is 0.05.
`hiCorRestc`	an internal threshold of `i-EbICoW` to avoid abnormal values when solving large `Psi` in the presence of strong collinearity, range (0,1). The default value is 0.5. GWASs with `psi > hiCorRestc` in `i-EbICoW` iterations are forced combined.
`sampleNInc`	an internal threshold of `i-EbICoW`, requesting that the effective sample size (Ess) of EbICoW after combination should be larger than the weighted sum of Ess of the two GWASs before combination. Typically, `0 <= sampleNInc <= 1`, default is set to 0.5.
`twtStCo`	a vector of internal thresholds of `TWT`, a vector of `TWT` stratification cutoffs, range (0,1). The default thresholds are `10^(seq(0,1-ceiling(log10(indSNPN)),-1/3))[-1]`.
`LoInrP`	a vector of internal interval cut-offs for fitting the LOESS models, which should be in the descending order in the range `[10/indSNPN,0.1]`. The default cut-offs are `c(0.05,0.001)`.
`LoSpanV`	a vector of internal parameters for smoothing the LOESS curves in the intervals specified by `LoInrP`, range (0,1). The default values are `c(0.03,0.1,0.75)`. Note that the length of `LoSpanV = length(LoInrP)+1`.
`lociInr`	an integer specifying the physical distance in base pair between two SNPs that should be considered as in distinct loci, value range [1e5,1e6]. The default value is 2.5e5, i.e., `250 kbp`.

Details

C-GWAS is a powerful method for combining GWAS summary statistics of multiple potentially related traits and detect SNPs with multi-trait effects. C-GWAS begins with GWASs summary as inputs and outputs a single vector of combined p-values testing if the null is deviated. For each SNP, the null is the absence of any effect on all traits, and the alternative is that its effect deviates from 0 for at least one trait. C-GWAS integrates two different statistical methods with complementary statistical features to ensure the optimal power under various and complex scenarios while keeping a stable study-wide type-I error rate. The first method is called iterative effect based inverse covariance weighting (i-EbICoW) and the second method is called truncated Wald test (TWT). C-GWAS controls the study-wide type-I error rate in an empirical manner via simulations and adjust the resultant p-values in such a way that they are directly comparable with those from traditional GWAS of a single trait.

The final and intermediate results of C-GWAS analysis are saved in the output directory outputPath. Two new folders will be created in this directory: Results/ and Details/.

Main Results

The Results/ folder contains main results including:

CGWAS-GWAS.jpg: Combined Manhattan plots displays adjusted C-GWAS p-value (upper parts) and adjusted MinGWAS p-value (lower parts) of all SNPs at the –log10 scale. The study-wide significance is indicated using dashed lines and the suggestive significance is indicated using solid lines. Loci with the lead SNP passing the study-wide significant threshold on both sides are highlighted using different colors and shapes.
CGWASminpQQ.jpg: Adjusted p-values at the –log10 scale (y-axis) of MinGWAS (blue) and C-GWAS (orange) for all SNPs are plotted against their expected uniformly distributed quantiles at the –log10 scale (x-axis) under the null.
SummarySugSigSNP.csv: This table contains all SNP passing the suggestive significance threshold in either C-GWAS or MinGWAS. For each SNP, its position (CHR and BP), rsID (SNP), adjusted C-GWAS p-value (C-GWAS-P) and MinGWAS p-value (MinGWAS-P) are provided. Mean reference allele frequency (MRAF) will be provided if user define available frequency data path mrafFilePath. Distinct loci are distinguished by loci index. NA indicates that the SNP did not pass the suggestive significance threshold.
C-GWAS.p: This table contains position (CHR and BP), rsID (SNP), adjusted C-GWAS p-value (C-GWASadjustedP), unadjusted C-GWAS p-value (C-GWASrawP), adjusted MinGWAS p-value (MinGWASadjustedP), unadjusted MinGWAS p-value (MinGWASrawP) of all SNPs. Mean reference allele frequency (MRAF) will be also included if user provided the reference allele frequencies in mrafFilePath.

Intermediate Results

The Details/ folder contains all intermediate results including

SummaryGetI.txt: This table contains mean chi-square (square of test statistics) of all SNPs before adjustment (RawMeanX2), Genomic control value (GClambda), getI result (EstInf) and mean chi-square of all SNPs after getI based adjustment (AdjMeanX2) for each GWAS (traitName).
SummaryGetPsi.txt: This table contains Pearson correlation coefficient of test statistics from all SNPs (StatCor), getPsi result (Psi) and getPi result (considering all SNPs, allPi; considering only significant SNPs, sigPi) for all GWAS pairs (GWAS1 and GWAS2).
EbICoW.txt: This table contains effect of EbICoW GWAS (define as mean chi-square of all SNPs minus one, AllEff), Equivalent sample size (Ess), EbICoW GWAS combination members (GWAS*), corresponding weight of effect size in combination (bw*), corresponding weight of test statistics in combination (tw*) and rounds of all iterations involved in combine (r*) for each EbICoW GWAS (traitName).
SummaryEbICoWGetPsi.txt: This table contains Pearson correlation coefficient of test statistics from all SNPs (StatCor), getPsi result (Psi) and getPi result (considering all SNPs, allPi; considering only significant SNPs, sigPi) for all EbICoW GWAS pairs (GWAS1 and GWAS2).
Summaryi-EbICoW.txt: This table contains the details of each i-EbICoW iteration round. 1) Iteration round count (Round); 2) Two GWASs used in the iteration (GWAS1 and GWAS2); 3) Name of combined GWAS (NewGWAS); 4) Type of Pi and h chosen by optimize function (Piall and hall, AEC; Pisig and hsig, SEC; Pistb and hstb, STB); 5) Whether pass evaluation of evaluate function (Evaluate); 6) Effect of GWAS1, GWAS2 and NewGWAS (define as mean chi-square of all SNPs minus one, Eff1X2, Eff2X2 and NewEffX2); 7) getPsi and getPi result (Psi; considering all SNPs, allPi; considering only significant SNPs, sigPi); 8) Significant SNP number using parameter threshold pMain, including in GWAS1 (sigSNP1N), GWAS2 (sigSNP2N), MinGWAS of GWAS1 and GWAS2 (sigSNPminN), combined GWAS using AEC (sigSNPAECN), combined GWAS using SEC (sigSNPSECN), combined GWAS using STB (sigSNPSTBN), combined GWAS using Wald test (sigSNPWDN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPAECuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPSECuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 union (sigSNPSTBuniN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPAECinterN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPSECinterN), intersection of AEC combined GWAS and GWAS1-GWAS2 intersection (sigSNPSTBinterN); 9) Equivalent sample size of GWAS1, GWAS2 and NewGWAS (Ess1, Ess2 and EssNew); 10) Weight of effect size used in combining GWAS1 and GWAS2 chosen by optimize function (bw1 and bw2); 11) Weight of test statistics used in combining GWAS1 and GWAS2 chosen by optimize function (tw1 and tw2).
NullCGWAScorrection.txt: This dataset contains LOESS model samples (ExpectedQuantile and ObservedP) simulated in getCoef function using in adjusting C-GWAS result.
NullCGWASdistribution.jpg: Simulated null distributions of the unadjusted p-values (gray) and p-values corrected using the getCoef function (orange) are compared with the uniform distribution (black) for C-GWAS simulation.
NullMinpcorrection.txt: This dataset contains LOESS model samples (ExpectedQuantile and ObservedP) simulated in getCoef function using in adjusting MinGWAS result.
NullMinpdistribution.jpg: Simulated null distributions of the unadjusted p-values (gray) and p-values corrected using the getCoef function (orange) are compared with the uniform distribution (black) for MinGWAS simulation.
Details/i-EbICoW: If user choose to keep i-EbICoW output (keepIEb = TRUE), all summary statistics (effect sizes, *.beta and test statistics, *.stat) of EbICoW GWASs are saved in Details/i-EbICoW.

Future Development Plan

The current version is 0.9.3, which implementing the whole C-GWAS procedure into a single function cgwas. Standing along functions facilitating the C-GWAS analysis will be implemented in a later version, e.g., getI, getPsi, getPi.

Examples

# example of the input GWAS file
f1 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'Y1.assoc'), header=TRUE)
head(f1)

# example of the input mean reference allele frequency file
f2 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'MRAF'), header=TRUE)
head(f2)

# example of the input SNP information file
f3 <- read.table(file.path(system.file("extdata", package = "CGWAS"),
'SnpInfo'), header=TRUE)
head(f3)

# demo of whole C-GWAS procedure implementing
# the output files are in the current working directory
ExDataDir <- system.file("extdata", package = "CGWAS")
gwasFileName <- c("Y1.assoc", "Y2.assoc", "Y3.assoc",
                  "Y4.assoc", "Y5.assoc", "Y6.assoc",
                  "Y7.assoc", "Y8.assoc", "Y9.assoc",
                  "Y10.assoc", "Y11.assoc", "Y12.assoc")
gwasFilePath <- file.path(ExDataDir, gwasFileName)
snpFilePath <- file.path(ExDataDir, 'SnpInfo')
outputPath <- getwd()
traitName <- c("Y1", "Y2", "Y3",
               "Y4", "Y5", "Y6",
               "Y7", "Y8", "Y9",
               "Y10", "Y11L", "Y12")
mrafFilePath <- file.path(ExDataDir, 'MRAF')
indSNPN = 1e5

cgwas(gwasFilePath, snpFilePath, outputPath,
      traitName = traitName, mrafFilePath = mrafFilePath, indSNPN = indSNPN)

TianTTL/CGWAS documentation built on April 25, 2022, 1:09 a.m.