View source: R/cgwasProtocal.R
cgwas | R Documentation |
Combining GWAS summary statistics of multiple potentially related traits.
cgwas(
  gwasFilePath,
  snpFilePath,
  outputPath,
  traitName = basename(gwasFilePath),
  exNa = TRUE,
  mrafFilePath = NULL,
  keepIEb = FALSE,
  threadN = ceiling(parallel::detectCores()/2),
  indSNPN = 1e+06,
  simulDep = 100,
  pStudy = 0.05/indSNPN,
  pSugst = 1/indSNPN,
  pMain = 3/indSNPN,
  ebicoPwrInc = 1.1,
  minCorDiff = 0.05,
  hiCorRestc = 0.5,
  sampleNInc = 0.5,
  twtStCo = 10^(seq(0, 1 - ceiling(log10(indSNPN)), -1/3))[-1],
  LoInrP = c(0.05, 0.001),
  LoSpanV = c(0.03, 0.1, 0.75),
  lociInr = 250000
)
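Several defaults in the usage are expressions of indSNPN, so they change together whenever indSNPN is changed. As a minimal sketch, the same expressions can be evaluated in plain R, outside of cgwas, to see the values they take for the default indSNPN = 1e6. This only re-evaluates the default expressions and assumes nothing about how cgwas uses them internally.

```r
# Re-evaluate the indSNPN-dependent defaults from the usage signature.
indSNPN <- 1e6

pStudy <- 0.05 / indSNPN   # study-wide significance threshold
pSugst <- 1 / indSNPN      # suggestive significance threshold
pMain  <- 3 / indSNPN      # internal cut-off threshold

# Same expression as the twtStCo default: a log-spaced grid of thresholds
# in steps of 1/3 on the log10 scale, with the first element (10^0) dropped.
twtStCo <- 10^(seq(0, 1 - ceiling(log10(indSNPN)), -1/3))[-1]

print(c(pStudy = pStudy, pSugst = pSugst, pMain = pMain))
print(signif(twtStCo, 3))
```

Passing a different indSNPN to cgwas without overriding these arguments rescales all of them accordingly, which is why the example at the end of this page sets only indSNPN.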
gwasFilePath |
a vector of strings containing the paths to the input GWAS summary statistics files. These files should be space- or tab-delimited. Each file contains exactly two columns: the first holds the regression betas of all SNPs and the second holds their P-values. Each file requires a header line with two items: BETA and P. Note that for each SNP, all betas must be based on the same reference allele. |
snpFilePath |
a string indicating the path to the input SNP information file, a space-delimited file with three columns giving the chromosome, base-pair position and SNP identifier. The file requires a header line with the three items CHR, BP and SNP, in that order. |
outputPath |
a string indicating the path to output directory. In this
directory, two new folders will be created, the |
traitName |
a vector of strings of trait names. The order must match that of the input GWAS files. By default, the basenames of the GWAS files are used as trait names. |
exNa |
logical. If |
mrafFilePath |
a string indicating the path to the mean reference allele
frequency (MRAF) file, which requires a header MRAF, and contains one column
of mean frequency of the reference allele of each SNP. Note that the
reference allele must correspond to the beta in the input GWAS. Reference
allele frequencies are used to estimate the weights of input GWASs and all
intermediate GWASs. This MRAF file is recommended but not obligatory with
default |
keepIEb |
logical. If |
threadN |
number of threads/cores to be used for parallel computing. The default is to detect the number of CPU cores on the current host and use half of them for parallel computing. |
indSNPN |
the total number of unlinked SNPs of the input GWASs, the
minimal |
simulDep |
the number of simulations, the minimal |
pStudy |
study-wide significance threshold after adjusting for extra
multiple testing due to C-GWAS procedures, which should take values in
(0,5e-6]. The default value is |
pSugst |
study-wide suggestive significance threshold, which should take
values in (0,1e-4]. The default value is |
pMain |
an internal cut-off threshold of |
ebicoPwrInc |
an internal threshold of |
minCorDiff |
an internal threshold of |
hiCorRestc |
an internal threshold of |
sampleNInc |
an internal threshold of |
twtStCo |
a vector of internal thresholds of |
LoInrP |
a vector of internal interval cut-offs for fitting the LOESS
models, which should be in the descending order in the range
|
LoSpanV |
a vector of internal parameters for smoothing the LOESS curves
in the intervals specified by |
lociInr |
an integer specifying the physical distance in base pairs
between two SNPs that should be considered as in distinct loci, value range
[1e5,1e6]. The default value is 2.5e5, i.e., |
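To make the three input formats described above concrete, here is a minimal sketch that writes toy versions of a GWAS file, the SNP information file and the MRAF file to a temporary directory and reads them back. All values and file names are made up for illustration. Because the GWAS and MRAF files carry no SNP identifier, the sketch assumes that row i of every file refers to the same SNP as row i of the SNP information file.

```r
# Toy inputs written to a temporary directory; every value here is invented.
tmpDir <- file.path(tempdir(), "cgwas_input_sketch")
dir.create(tmpDir, showWarnings = FALSE)

# SNP information file: header CHR, BP, SNP (in this order), space-delimited.
snpInfo <- data.frame(CHR = c(1, 1, 2),
                      BP  = c(10583, 752566, 45895),
                      SNP = c("rsA", "rsB", "rsC"),
                      stringsAsFactors = FALSE)

# One GWAS file: exactly two columns, BETA and P.
gwas <- data.frame(BETA = c(0.012, -0.034, 0.005),
                   P    = c(0.21, 3.2e-4, 0.77))

# MRAF file: a single column with header MRAF.
mraf <- data.frame(MRAF = c(0.31, 0.08, 0.45))

writeInput <- function(x, name) {
  write.table(x, file.path(tmpDir, name), sep = " ",
              quote = FALSE, row.names = FALSE)
}
writeInput(snpInfo, "SnpInfo")
writeInput(gwas, "Y1.assoc")
writeInput(mraf, "MRAF")

# Read the files back and check headers and row alignment (assumed row order).
snpBack  <- read.table(file.path(tmpDir, "SnpInfo"), header = TRUE)
gwasBack <- read.table(file.path(tmpDir, "Y1.assoc"), header = TRUE)
mrafBack <- read.table(file.path(tmpDir, "MRAF"), header = TRUE)
stopifnot(nrow(gwasBack) == nrow(snpBack), nrow(mrafBack) == nrow(snpBack))
```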
C-GWAS is a powerful method for combining GWAS summary statistics of multiple
potentially related traits and detecting SNPs with multi-trait effects. C-GWAS
takes GWAS summary statistics as input and outputs a single vector of combined
p-values testing for deviation from the null. For each SNP, the null is the
absence of any effect on all traits, and the alternative is that its effect
deviates from 0 for at least one trait. C-GWAS integrates two different
statistical methods with complementary statistical features to ensure
optimal power under various complex scenarios while keeping a stable
study-wide type-I error rate. The first method is called iterative effect
based inverse covariance weighting (i-EbICoW) and the second method
is called the truncated Wald test (TWT). C-GWAS controls the study-wide
type-I error rate empirically via simulations and adjusts the
resulting p-values so that they are directly comparable with those
from a traditional GWAS of a single trait.
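In symbols, writing \beta_{jk} for the effect of SNP j on trait k among K traits (notation introduced here for illustration only, not taken from the package), the per-SNP hypotheses described above can be written as:

```latex
H_0:\ \beta_{j1} = \beta_{j2} = \dots = \beta_{jK} = 0
\qquad \text{versus} \qquad
H_1:\ \exists\, k \in \{1, \dots, K\} \ \text{such that}\ \beta_{jk} \neq 0 .
```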
The final and intermediate results of a C-GWAS analysis are saved in the output
directory outputPath. Two new folders will be created in this
directory: Results/ and Details/.
The Results/ folder contains the main results, including:
CGWAS-GWAS.jpg
Combined Manhattan plots display the adjusted C-GWAS p-values (upper parts) and adjusted MinGWAS p-values (lower parts) of all SNPs at the –log10 scale. Study-wide significance is indicated by dashed lines and suggestive significance by solid lines. Loci whose lead SNP passes the study-wide significance threshold on both sides are highlighted using different colors and shapes.
CGWASminpQQ.jpg
Adjusted p-values at the –log10 scale (y-axis) of MinGWAS (blue) and C-GWAS (orange) for all SNPs are plotted against their expected uniformly distributed quantiles at the –log10 scale (x-axis) under the null.
SummarySugSigSNP.csv
This table contains all SNPs passing the
suggestive significance threshold in either C-GWAS or MinGWAS. For each SNP,
its position (CHR and BP), rsID (SNP), adjusted C-GWAS p-value (C-GWAS-P) and
MinGWAS p-value (MinGWAS-P) are provided. The mean reference allele frequency
(MRAF) is provided if the user supplies a frequency file via
mrafFilePath. Distinct loci are distinguished by a locus index. NA
indicates that the SNP did not pass the suggestive significance threshold.
C-GWAS.p
This table contains the position (CHR and BP), rsID (SNP),
adjusted C-GWAS p-value (C-GWASadjustedP), unadjusted C-GWAS p-value
(C-GWASrawP), adjusted MinGWAS p-value (MinGWASadjustedP) and unadjusted MinGWAS
p-value (MinGWASrawP) of all SNPs. The mean reference allele frequency (MRAF)
is also included if the user provided the reference allele frequencies in
mrafFilePath.
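As a hedged illustration of how the adjusted p-values in such a table can be compared against the study-wide threshold, the sketch below builds a small in-memory table with the column names listed above and classifies each SNP. The values are invented, and the table is constructed directly rather than read from C-GWAS.p because the file's delimiter is not stated here.

```r
# Toy table using the column names described for C-GWAS.p; values are invented.
indSNPN <- 1e6
pStudy  <- 0.05 / indSNPN   # default study-wide threshold for indSNPN = 1e6

res <- data.frame(
  CHR = c(1, 1, 2, 5),
  BP  = c(10583, 752566, 45895, 998877),
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  "C-GWASadjustedP"  = c(2e-9, 0.4, 1e-8, 1e-3),
  "MinGWASadjustedP" = c(3e-7, 0.5, 1e-8, 2e-3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

sigC <- res[["C-GWASadjustedP"]]  < pStudy   # significant in C-GWAS
sigM <- res[["MinGWASadjustedP"]] < pStudy   # significant in MinGWAS

# SNPs that reach study-wide significance only after combining traits.
cgwasOnly <- res[sigC & !sigM, ]
print(cgwasOnly)

# Cross-tabulate the two significance calls.
print(table(CGWAS = sigC, MinGWAS = sigM))
```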
The Details/ folder contains all intermediate results, including:
SummaryGetI.txt
This table contains, for each GWAS (traitName), the mean chi-square (square of
the test statistics) of all SNPs before adjustment (RawMeanX2), the genomic control
value (GClambda), the getI result (EstInf) and the mean chi-square of all SNPs
after getI-based adjustment (AdjMeanX2).
SummaryGetPsi.txt
This table contains the Pearson correlation
coefficient of the test statistics from all SNPs (StatCor), the getPsi result
(Psi) and the getPi result (considering all SNPs, allPi; considering only
significant SNPs, sigPi) for all GWAS pairs (GWAS1 and GWAS2).
EbICoW.txt
This table contains, for each EbICoW GWAS (traitName), the effect of the EbICoW
GWAS (defined as the mean chi-square of all SNPs minus one, AllEff), the equivalent
sample size (Ess), the EbICoW GWAS combination members (GWAS*), the corresponding
weights of effect size in the combination (bw*), the corresponding weights of test
statistics in the combination (tw*) and the rounds of all iterations involved in
the combination (r*).
SummaryEbICoWGetPsi.txt
This table contains the Pearson correlation
coefficient of the test statistics from all SNPs (StatCor), the getPsi result
(Psi) and the getPi result (considering all SNPs, allPi; considering only
significant SNPs, sigPi) for all EbICoW GWAS pairs (GWAS1 and GWAS2).
Summaryi-EbICoW.txt
This table contains the details of each i-EbICoW iteration round.
1) Iteration round count (Round);
2) Two GWASs used in the iteration (GWAS1 and GWAS2);
3) Name of combined GWAS (NewGWAS);
4) Type of Pi and h chosen by the optimize function (Piall and hall, AEC; Pisig
and hsig, SEC; Pistb and hstb, STB);
5) Whether the combination passes the evaluation of the evaluate function (Evaluate);
6) Effect of GWAS1, GWAS2 and NewGWAS (defined as the mean chi-square of all SNPs
minus one, Eff1X2, Eff2X2 and NewEffX2);
7) getPsi and getPi results (Psi; considering all SNPs, allPi;
considering only significant SNPs, sigPi);
8) Number of significant SNPs at the threshold pMain, including
in GWAS1 (sigSNP1N), GWAS2 (sigSNP2N), MinGWAS of GWAS1 and GWAS2
(sigSNPminN), combined GWAS using AEC (sigSNPAECN), combined GWAS using SEC
(sigSNPSECN), combined GWAS using STB (sigSNPSTBN), combined GWAS using the Wald
test (sigSNPWDN), intersection of AEC combined GWAS and GWAS1-GWAS2 union
(sigSNPAECuniN), intersection of SEC combined GWAS and GWAS1-GWAS2 union
(sigSNPSECuniN), intersection of STB combined GWAS and GWAS1-GWAS2 union
(sigSNPSTBuniN), intersection of AEC combined GWAS and GWAS1-GWAS2
intersection (sigSNPAECinterN), intersection of SEC combined GWAS and
GWAS1-GWAS2 intersection (sigSNPSECinterN), intersection of STB combined GWAS
and GWAS1-GWAS2 intersection (sigSNPSTBinterN);
9) Equivalent sample size of GWAS1, GWAS2 and NewGWAS (Ess1, Ess2 and EssNew);
10) Weights of effect size used in combining GWAS1 and GWAS2, as chosen by
the optimize function (bw1 and bw2);
11) Weights of test statistics used in combining GWAS1 and GWAS2, as chosen by
the optimize function (tw1 and tw2).
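The counts in item 8 are set operations on per-SNP significance calls. The following minimal sketch reproduces that bookkeeping on invented p-values for two GWASs and one combined GWAS. It assumes that the MinGWAS p-value of a SNP is the minimum of its p-values in the two GWASs; the names pNew, sigSNPNewN, sigSNPNewUniN and sigSNPNewInterN are hypothetical stand-ins for any one of the AEC, SEC or STB combinations.

```r
# Invented p-values for five SNPs; pNew stands for a combined GWAS (hypothetical).
pMain <- 3 / 1e6
p1   <- c(1e-7, 0.2,  2e-6, 0.5, 1e-8)   # GWAS1
p2   <- c(0.3,  1e-9, 4e-6, 0.6, 5e-7)   # GWAS2
pNew <- c(5e-8, 1e-8, 1e-5, 0.4, 1e-9)   # combined GWAS

sig1   <- p1 < pMain
sig2   <- p2 < pMain
sigNew <- pNew < pMain

counts <- c(
  sigSNP1N        = sum(sig1),
  sigSNP2N        = sum(sig2),
  # Assumption: MinGWAS takes the per-SNP minimum p-value of the two GWASs.
  sigSNPminN      = sum(pmin(p1, p2) < pMain),
  sigSNPNewN      = sum(sigNew),
  # Combined-GWAS hits that are also significant in GWAS1 or GWAS2 (union).
  sigSNPNewUniN   = sum(sigNew & (sig1 | sig2)),
  # Combined-GWAS hits that are significant in both GWAS1 and GWAS2 (intersection).
  sigSNPNewInterN = sum(sigNew & sig1 & sig2)
)
print(counts)
```

Comparing the combined count with the union and intersection counts shows how many hits the combination retains and how many are new relative to the two input GWASs.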
NullCGWAScorrection.txt
This dataset contains the LOESS model
samples (ExpectedQuantile and ObservedP) simulated in the getCoef function
and used to adjust the C-GWAS results.
NullCGWASdistribution.jpg
Simulated null distributions of the
unadjusted p-values (gray) and of the p-values corrected using the getCoef
function (orange) are compared with the uniform distribution (black) for
the C-GWAS simulation.
NullMinpcorrection.txt
This dataset contains the LOESS model
samples (ExpectedQuantile and ObservedP) simulated in the getCoef function
and used to adjust the MinGWAS results.
NullMinpdistribution.jpg
Simulated null distributions of the
unadjusted p-values (gray) and of the p-values corrected using the getCoef
function (orange) are compared with the uniform distribution (black) for
the MinGWAS simulation.
Details/i-EbICoW
If the user chooses to keep the i-EbICoW output
(keepIEb = TRUE), all summary statistics (effect sizes, *.beta, and
test statistics, *.stat) of the EbICoW GWASs are saved in Details/i-EbICoW.
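Several columns in the Details/ tables are simple summaries of per-SNP test statistics. As a rough sketch on simulated data, the code below computes a mean chi-square, a genomic control value and a Pearson correlation of signed test statistics from BETA and P columns. It uses the standard definitions (chi-square recovered from a two-sided P-value with one degree of freedom, genomic control as the median chi-square divided by the median of the chi-square distribution), which are assumed here and not confirmed to match the package's internal computation of RawMeanX2, GClambda and StatCor.

```r
# Simulate two GWASs under the null with correlated test statistics (toy data).
set.seed(1)
n  <- 1e5
z1 <- rnorm(n)
z2 <- 0.3 * z1 + sqrt(1 - 0.3^2) * rnorm(n)

# Turn z-scores into the BETA / P layout of the input files (se is arbitrary).
toGwas <- function(z, se = 0.01) {
  data.frame(BETA = z * se, P = 2 * pnorm(-abs(z)))
}
g1 <- toGwas(z1)
g2 <- toGwas(z2)

# Chi-square (1 df) recovered from a two-sided P-value.
chisqFromP <- function(p) qchisq(p, df = 1, lower.tail = FALSE)

x2        <- chisqFromP(g1$P)
rawMeanX2 <- mean(x2)                          # mean chi-square of all SNPs
gcLambda  <- median(x2) / qchisq(0.5, df = 1)  # standard genomic control value
allEff    <- rawMeanX2 - 1                     # "mean chi-square minus one"

# Signed test statistic: sign of BETA times the square root of the chi-square.
signedStat <- function(g) sign(g$BETA) * sqrt(chisqFromP(g$P))
statCor    <- cor(signedStat(g1), signedStat(g2))

print(round(c(rawMeanX2 = rawMeanX2, gcLambda = gcLambda,
              allEff = allEff, statCor = statCor), 3))
```

Under the null, the mean chi-square and the genomic control value are both expected to be close to 1, and the correlation of the signed statistics recovers the correlation used in the simulation.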
The current version is 0.9.3, which implements the whole C-GWAS procedure
in a single function, cgwas. Stand-alone functions facilitating
the C-GWAS analysis, e.g., getI, getPsi and getPi, will be implemented
in a later version.
For more instructions on running C-GWAS, please see https://github.com/FanLiuLab/CGWAS
# example of the input GWAS file
ExDataDir <- system.file("extdata", package = "CGWAS")
f1 <- read.table(file.path(ExDataDir, 'Y1.assoc'), header = TRUE)
head(f1)

# example of the input mean reference allele frequency file
f2 <- read.table(file.path(ExDataDir, 'MRAF'), header = TRUE)
head(f2)

# example of the input SNP information file
f3 <- read.table(file.path(ExDataDir, 'SnpInfo'), header = TRUE)
head(f3)

# demo of the whole C-GWAS procedure
# the output files are written to the current working directory
gwasFileName <- paste0("Y", 1:12, ".assoc")
gwasFilePath <- file.path(ExDataDir, gwasFileName)
snpFilePath <- file.path(ExDataDir, 'SnpInfo')
outputPath <- getwd()
traitName <- c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6",
               "Y7", "Y8", "Y9", "Y10", "Y11L", "Y12")
mrafFilePath <- file.path(ExDataDir, 'MRAF')
indSNPN <- 1e5

cgwas(gwasFilePath, snpFilePath, outputPath,
      traitName = traitName, mrafFilePath = mrafFilePath, indSNPN = indSNPN)