Description Usage Arguments Details Value References See Also Examples
Calculate gene and pathway p-values using the ARTP test and summary data.
1 2 |
summary.files |
a character vector of file names containing the summary results of SNPs included in one or multiple studies.
Each file must be able to be read by |
pathway |
a character of the name of file containing definition of a pathway.
It must be able to be read by |
family |
a character taking values of |
reference |
a data.frame containing the paths of binary PLINK files of reference dataset.
It must have columns called |
lambda |
a numeric vector of inflation factors. Each file in |
ncases |
a list of numeric vectors specifying sample sizes of cases of participating studies. This argument should be specified only if |
ncontrols |
a list of numeric vectors specifying sample sizes of controls of participating studies. This argument should be specified only if |
nsamples |
a list of numeric vectors specifying total sample sizes of participating studies. This argument should be specified only if |
options |
a list of options to control the test procedure. If |
This function computes gene and pathway p-values when only summary data is available. Only the ARTP test modified from Yu et al. (2009) is well tested and is released with this package. ARTP is the Adaptive Rank Truncated Product test.
Each file in summary.files
must contain
SNP
SNP name
RefAllele
reference allele. Can be different in studies
EffectAllele
effect allele. Can be different in studies
Beta
estimated effect in linear regression model or log odds ratio in logistic regression model
and must contain one of the optional columns
SE
estimated standard error of Beta
P
p-value of Wald's, LRT or score test for testing H_0: Beta = 0
. Can be generated by lm
, glm
, anova
in R
or other standard statistical softwares.
An optional column Direction
is encouraged to be provided by the user
Direction
a character vector indicating which studies include a SNP. Any symbol except for '?' means a SNP is included in that study. Please note that the real direction of a SNP in studies ('+' or '-') does not matter, e.g., '++-?+' and '**+?-' provide exact the same information. See Examples
.
Another two optional columns Chr
and Pos
are needed if excluded.regions
is specified in options
. ARTP2
will convert the column names to be upper case, so for example, either Beta
or BETA
or beta
are accepted. See Examples
.
Chr
chromosome.
Pos
base-pair position (bp units).
If the option ambig.by.AF
is set to 1, then the summary files must contain at least one of:
RAF
reference allele frequency.
EAF
effect allele frequency.
The order of columns in files summary.files
, pathway
or in data frame reference
are arbitrary,
and all unnecessary columns (if any) are discarded in the analysis.
sARTP
returns an object of class ARTP2
. It is a list containing the following components:
pathway.pvalue |
final pathway p-value accounting for multiple comparisons. |
gene.pvalue |
a data frame containing gene name, number of SNPs in the gene that were included in the analysis, chromosome name, and the p-value for the gene accounting for multiple comparisons. |
pathway |
a data frame defining the pathway that was actually tested after various filters applied. |
model |
a list containing detailed information of selected SNPs in each gene. |
most.sig.genes |
a character vector of genes selected by |
deleted.snps |
a data frame containing SNPs excluded from the analysis and their reasons. |
deleted.genes |
a data frame containing genes excluded from the analysis because they are subsets of other remaining genes. Set |
options |
a list of options used in the analysis. See |
accurate |
|
setup |
a list containing necessary input for |
Zhang H, Wheeler W, Hyland LP, Yang Y, Shi J, Chatterjee N, Yu K. (2016) A powerful procedure for pathway-based meta-analysis using summary statistics identifies 43 pathways associated with type II diabetes in European populations. PLoS Genetics 12(6): e1006122
Yu K, Li Q, Bergen AW, Pfeiffer RM, Rosenberg PS, Caporaso N, Kraft P, Chatterjee N. (2009) Pathway analysis by adaptive combination of P-values. Genet Epidemiol 33(8): 700 - 709
Zhang H, Shi J, Liang F, Wheeler W, Stolzenberg-Solomon R, Yu K. (2014) A fast multilocus test with adaptive SNP selection for large-scale genetic association studies. European Journal of Human Genetics 22: 696 - 702
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 | library(ARTP2)
## Path of files containing summary statistics
## Only required columns will be loaded
study1 <- system.file("extdata", package = "ARTP2", "study1.txt.gz")
study2 <- system.file("extdata", package = "ARTP2", "study2.txt.gz")
## Path of a build-in file containing pathway definition
pathway <- system.file("extdata", package = "ARTP2", "pathway.txt.gz")
## Create data frame containing paths of build-in PLINK files that are going to used as reference
## As an example, use all chromosomes
chr <- 1:22
nchr <- length(chr)
fam <- vector("character", nchr)
bim <- vector("character", nchr)
bed <- vector("character", nchr)
for(i in 1:nchr){
fam[i] <- system.file("extdata", package = "ARTP2", paste("chr", chr[i], ".fam", sep = ""))
bim[i] <- system.file("extdata", package = "ARTP2", paste("chr", chr[i], ".bim", sep = ""))
bed[i] <- system.file("extdata", package = "ARTP2", paste("chr", chr[i], ".bed", sep = ""))
}
reference <- data.frame(fam, bim, bed, stringsAsFactors = FALSE)
## Set the options.
## Accumulate signal from the top 2 SNPs in each gene
## 1e5 replicates of resampling to estimate the p-value
options <- list(inspect.snp.n = 2, nperm = 1e4,
maf = .01, HWE.p = 1e-6,
gene.R2 = .9,
id.str = "unique-pathway-id",
out.dir = getwd(), save.setup = FALSE)
## different inflation factors are adjusted in two studies
lambda <- c(1.10, 1.08)
## two summary files, so there are two elements in each of two lists ncases and ncontrols
## the first summary file includes data calculated from meta-analysis of two sub-studies,
## each with sample size 63390 (9580 cases and 53810 controls) and 5643 (2591 cases and
## 3052 controls)
## see a few rows in study1
# s <- read.table(study1, header = TRUE, as.is = TRUE, nrows = 10)
# s$Direction
## [1] "+?" "+?" "++" "+?" "+?" "+?" "+?" "+?" "+?" "++"
## sub-study1 has 9580 cases, and sub-study2 has 2591 cases
## sub-study1 has 53810 cases, and sub-study2 has 3052 cases
## '?' means a SNP is not included in that sub-study
## any other symbols means a SNP is included in that sub-study
ncases <- list()
ncontrols <- list()
ncases[[1]] <- c(9580, 2591)
ncontrols[[1]] <- c(53810, 3052)
## the second summary file includes data calculated from one sub-studies with sample size
## 61957 (7638 cases and 54319 controls)
ncases[[2]] <- 7638
ncontrols[[2]] <- 54319
# logistic regression is used in base model, thus ncases and ncontrols should be specified.
family <- 'binomial'
## pathway test with two study files
# ret <- sARTP(summary.files = c(study1, study2), pathway, family, reference, lambda,
# ncases, ncontrols, options = options)
# ret$pathway.pvalue
## [1] 0.04594541 # Mac OS
## [1] 0.05149485 # Linux with 1 thread
## [1] 0.03969603 # Linux with 32 threads
## Mac OS
# head(ret$gene.pvalue)
## Gene Chr N.SNP Pvalue
## 1 BDH2 4 10 0.000749925
## 2 UBE2D3 4 6 0.001849815
## 3 PBX2 6 22 0.003849615
## 4 PPP1R14D 15 9 0.003849615
## 5 MRPL10 17 18 0.011448855
## 6 SCYL1 11 3 0.019848015
## Linux with 1 thread
# head(ret$gene.pvalue)
## Gene Chr N.SNP Pvalue
## 1 BDH2 4 10 0.000949905
## 2 UBE2D3 4 6 0.001699830
## 3 PPP1R14D 15 9 0.003949605
## 4 PBX2 6 22 0.004299570
## 5 MRPL10 17 18 0.012448755
## 6 SCYL1 11 3 0.017148285
## Linux with 32 threads
# head(ret$gene.pvalue)
## Gene Chr N.SNP Pvalue
## 1 UBE2D3 4 6 0.000849915
## 2 BDH2 4 10 0.001049895
## 3 PPP1R14D 15 9 0.003949605
## 4 PBX2 6 22 0.004899510
## 5 MRPL10 17 18 0.012798720
## 6 SCYL1 11 3 0.015048495
## pathway test with each of two studies
# ret1 <- sARTP(summary.files = study1, pathway, family, reference, lambda[1],
# ncases[1], ncontrols[1], options = options)
# ret2 <- sARTP(summary.files = study2, pathway, family, reference, lambda[2],
# ncases[2], ncontrols[2], options = options)
# ret1$pathway.pvalue
## [1] 0.04279572 # Mac OS
## [1] 0.03519648 # Linux with 1 thread
## [1] 0.04644536 # Linux with 32 threads
# ret2$pathway.pvalue
## [1] 0.3092691 # Mac OS
## [1] 0.2870213 # Linux with 1 thread
## [1] 0.3010699 # Linux with 32 threads
##################################################
## The reference is passed as an individual-level genotype data frame
data(ref.geno)
# ret.ref <- sARTP(summary.files = c(study1, study2), pathway, family, ref.geno, lambda,
# ncases, ncontrols, options = options)
# ret.ref$pathway.pvalue == ret$pathway.pvalue
##################################################
## The reference genotype data can also be merged into a single set of PLINK files
bed <- system.file("extdata", package = "ARTP2", "ref.bed")
bim <- system.file("extdata", package = "ARTP2", "ref.bim")
fam <- system.file("extdata", package = "ARTP2", "ref.fam")
reference <- data.frame(fam, bim, bed)
# ret.comb <- sARTP(summary.files = c(study1, study2), pathway, family, reference, lambda,
# ncases, ncontrols, options = options)
# ret.comb$pathway.pvalue == ret$pathway.pvalue
################
## exclude some regions
exc.reg1 <- data.frame(Chr = c(1, 1, 22),
Pos = c(1706160, 11979231, 51052379),
Radius = c(5000, 0, 2000))
options$excluded.regions <- exc.reg1
# ret.exc1 <- sARTP(summary.files = c(study1, study2), pathway, family, reference, lambda,
# ncases, ncontrols, options = options)
# ret.exc1$pathway.pvalue
## [1] 0.04619538 # Mac OS
## [1] 0.0510449 # Linux with 1 thread
## [1] 0.04054595 # Linux with 32 threads
# sum(ret.exc1$deleted.snps$reason == 'RM_BY_REGIONS')
## or equivalently
exc.reg2 <- data.frame(Chr = c(1, 1, 22),
Start = c(1701160, 11979231, 51050379),
End = c(1711160, 11979231, 51054379))
options$excluded.regions <- exc.reg2
# ret.exc2 <- sARTP(summary.files = c(study1, study2), pathway, family, reference, lambda,
# ncases, ncontrols, options = options)
# ret.exc1$pathway.pvalue == ret.exc2$pathway.pvalue
#################
## select a subset of subjects in plink files as the reference
## options$selected.subs should be in the same format as the first column of fam file
## load character vector subj.id of 400 subjects from build-in dataset
data(subj.id, package = "ARTP2")
head(subj.id)
options$selected.subs <- subj.id
options$excluded.regions <- NULL
# ret.sel <- sARTP(summary.files = c(study1, study2), pathway, family, reference, lambda,
# ncases, ncontrols, options = options)
# ret.sel$pathway.pvalue
## [1] 0.03469653 # Mac OS
## [1] 0.05284472 # Linux with 1 thread
## [1] 0.04164584 # Linux with 32 threads
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.