Scalable and accurate implementation of generalized mixed models, with support for Genomic Data Structure (GDS) files and a highly optimized C++ implementation. It is designed for single-variant tests in large-scale phenome-wide association studies (PheWAS) with millions of variants and hundreds of thousands of samples (e.g., UK Biobank genotype data), and it controls for case-control imbalance and sample structure.
The implementation of SAIGEgds is based on the original SAIGE R package (v0.29.4.4) [Zhou et al. 2018] https://github.com/weizhouUMICH/SAIGE/releases/tag/v0.29.4.4. All single-precision floating-point calculations in SAIGE are replaced by double-precision calculations in SAIGEgds. SAIGEgds also implements some of the SPAtest functions in C to speed up the saddlepoint approximation (SPA).
| Package: | SAIGEgds |
|---|---|
| Type: | Package |
| License: | GPL version 3 |
Xiuwen Zheng <xiuwen.zheng@abbvie.com>, Wei Zhou (the original author of the SAIGE R package, https://github.com/weizhouUMICH/SAIGE)
Zheng X, Davis JW. SAIGEgds – an efficient statistical tool for large-scale PheWAS with mixed models. *Bioinformatics* (2020). DOI: 10.1093/bioinformatics/btaa731.
Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. *Nat Genet* (2018). 50(9):1335-1341.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D. SeqArray – A storage-efficient high-performance data format for WGS variant calls. *Bioinformatics* (2017). DOI: 10.1093/bioinformatics/btx145.
```r
library(SAIGEgds)  # also attaches gdsfmt, SeqArray and Rcpp

# open the GDS file
fn <- system.file("extdata", "grm1k_10k_snp.gds", package="SAIGEgds")
gdsfile <- seqOpen(fn)

# load phenotype
phenofn <- system.file("extdata", "pheno.txt.gz", package="SAIGEgds")
pheno <- read.table(phenofn, header=TRUE, as.is=TRUE)
head(pheno)

# fit the null model for a binary trait; the genetic relationship
# matrix (GRM) is built from the variants in 'gdsfile'
glmm <- seqFitNullGLMM_SPA(y ~ x1 + x2, pheno, gdsfile, trait.type="binary")

# p-value calculation, skipping variants with minor allele count < 10
assoc <- seqAssocGLMM_SPA(gdsfile, glmm, mac=10)
head(assoc)

# close the GDS file
seqClose(gdsfile)
```
```
Loading required package: gdsfmt
Loading required package: SeqArray
Loading required package: Rcpp
  sample.id y     yy      x1 x2
1        s1 0 4.5542  1.5118  1
2        s2 0 3.7941  0.3898  1
3        s3 0 5.0411 -0.6212  1
4        s4 0 5.6394 -2.2147  1
5        s5 0 4.2134  1.1249  1
6        s6 0 4.6145 -0.0449  1
```
```
SAIGE association analysis:
Thu May 27 14:50:25 2021
Filtering variants:
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 0s
# of selected variants: 9,976
Fit the null model: y ~ x1 + x2 + var(GRM)
    # of samples: 1,000
    # of variants: 9,976
    using 1 thread
Transform on the design matrix with QR decomposition:
    new formula: y ~ x0 + x1 + x2 - 1
Start loading SNP genotypes:
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 0s
    using 6.6M (sparse matrix)
Binary outcome: y
      y Number Proportion
      0    902      0.902
      1     98      0.098
Initial fixed-effect coefficients:
       x0         x1         x2
 2.520514 -0.7666948 -0.4557928
Initial variance component estimates, tau:
    Sigma_E: 1, Sigma_G: 0.499412
Iteration 1:
    tau: (1, 0.4994116)
    fixed coeff: (2.520514, -0.7666948, -0.4557928)
Iteration 2:
    tau: (1, 0.3287896)
    fixed coeff: (2.521231, -0.776603, -0.4592503)
Iteration 3:
    tau: (1, 0.2817812)
    fixed coeff: (2.525954, -0.7738757, -0.4579659)
Iteration 4:
    tau: (1, 0.3211452)
    fixed coeff: (2.525719, -0.7730823, -0.4577413)
Iteration 5:
    tau: (1, 0.3361534)
    fixed coeff: (2.527166, -0.7739766, -0.4579633)
Final tau: (1, 0.3322063)
    fixed coeff: (2.527666, -0.774237, -0.4580237)
Calculate the average ratio of variances:
Thu May 27 14:50:31 2021
1, maf: 0.0370, mac: 74, ratio: 0.9325 (var1: 0.0792, var2: 0.0849)
2, maf: 0.0645, mac: 129, ratio: 0.9459 (var1: 0.0723, var2: 0.0764)
3, maf: 0.4390, mac: 878, ratio: 0.9398 (var1: 0.0422, var2: 0.0449)
4, maf: 0.0115, mac: 23, ratio: 0.9202 (var1: 0.1, var2: 0.109)
5, maf: 0.0135, mac: 27, ratio: 0.9439 (var1: 0.0823, var2: 0.0872)
6, maf: 0.0505, mac: 101, ratio: 0.9391 (var1: 0.0716, var2: 0.0763)
7, maf: 0.0425, mac: 85, ratio: 0.9417 (var1: 0.073, var2: 0.0775)
8, maf: 0.2290, mac: 458, ratio: 0.9421 (var1: 0.0563, var2: 0.0598)
9, maf: 0.0270, mac: 54, ratio: 0.9381 (var1: 0.0761, var2: 0.0812)
10, maf: 0.0205, mac: 41, ratio: 0.9390 (var1: 0.0824, var2: 0.0878)
11, maf: 0.1560, mac: 312, ratio: 0.9384 (var1: 0.0666, var2: 0.071)
12, maf: 0.0285, mac: 57, ratio: 0.9343 (var1: 0.0803, var2: 0.0859)
13, maf: 0.4110, mac: 822, ratio: 0.9376 (var1: 0.0458, var2: 0.0488)
14, maf: 0.4530, mac: 906, ratio: 0.9421 (var1: 0.0398, var2: 0.0422)
15, maf: 0.0930, mac: 186, ratio: 0.9396 (var1: 0.0715, var2: 0.0761)
16, maf: 0.0220, mac: 44, ratio: 0.9387 (var1: 0.0668, var2: 0.0712)
17, maf: 0.1655, mac: 331, ratio: 0.9410 (var1: 0.0641, var2: 0.0681)
18, maf: 0.4520, mac: 904, ratio: 0.9375 (var1: 0.0438, var2: 0.0467)
19, maf: 0.0105, mac: 21, ratio: 0.9406 (var1: 0.0821, var2: 0.0873)
20, maf: 0.0350, mac: 70, ratio: 0.9332 (var1: 0.0833, var2: 0.0893)
21, maf: 0.0235, mac: 47, ratio: 0.9377 (var1: 0.0744, var2: 0.0793)
22, maf: 0.3730, mac: 746, ratio: 0.9406 (var1: 0.0474, var2: 0.0504)
23, maf: 0.0130, mac: 26, ratio: 0.9530 (var1: 0.0629, var2: 0.066)
24, maf: 0.0345, mac: 69, ratio: 0.9459 (var1: 0.0684, var2: 0.0724)
25, maf: 0.1905, mac: 381, ratio: 0.9370 (var1: 0.0628, var2: 0.0671)
26, maf: 0.4080, mac: 816, ratio: 0.9392 (var1: 0.0422, var2: 0.0449)
27, maf: 0.0105, mac: 21, ratio: 0.9338 (var1: 0.0902, var2: 0.0966)
28, maf: 0.1350, mac: 270, ratio: 0.9422 (var1: 0.0624, var2: 0.0663)
29, maf: 0.1600, mac: 320, ratio: 0.9393 (var1: 0.0648, var2: 0.069)
30, maf: 0.0335, mac: 67, ratio: 0.9396 (var1: 0.0738, var2: 0.0785)
ratio avg. is 0.9391186, sd: 0.005418568
Thu May 27 14:50:32 2021
Done.
```
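The null-model log lists a variance ratio for each of 30 variants and then summarizes them as "ratio avg." and "sd"; the association step reuses that average as its "variance ratio for approximation" (0.9391186). Each ratio appears to be var1/var2, although the rounded var1 and var2 values reproduce it only approximately. As a rough arithmetic check, the base R sketch below recomputes the summary from the 30 printed ratios (themselves rounded to four decimals). It is not SAIGEgds's internal code, and `ratio`, `ratio_avg` and `ratio_sd` are illustrative names.

```r
# Variance ratios as printed (rounded to 4 decimals) for the 30 variants
# in the "Calculate the average ratio of variances" step of the log
ratio <- c(0.9325, 0.9459, 0.9398, 0.9202, 0.9439, 0.9391, 0.9417, 0.9421,
           0.9381, 0.9390, 0.9384, 0.9343, 0.9376, 0.9421, 0.9396, 0.9387,
           0.9410, 0.9375, 0.9406, 0.9332, 0.9377, 0.9406, 0.9530, 0.9459,
           0.9370, 0.9392, 0.9338, 0.9422, 0.9393, 0.9396)

# Summary statistics; sd() uses the n - 1 denominator
ratio_avg <- mean(ratio)
ratio_sd  <- sd(ratio)
cat(sprintf("ratio avg. is %.7f, sd: %.7f\n", ratio_avg, ratio_sd))
```

With `mean()` and `sd()`, the result agrees with the logged 0.9391186 and 0.005418568 up to the rounding of the printed ratios.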
```
SAIGE association analysis:
    # of samples: 1,000
    # of variants: 10,000
    MAF threshold: NaN
    MAC threshold: 10
    missing threshold for variants: 0.1
    p-value threshold for SPA adjustment: 0.05
    variance ratio for approximation: 0.9391186
    # of processes: 1
[..................................................] 0%, ETC: ---
[==================================================] 100%, completed, 0s
# of variants after filtering by MAF, MAC and missing thresholds: 9,976
Done.
  id chr pos rs.id ref alt AF.alt mac  num        beta        SE      pval
1  1   1   1   rs1   1   2 0.0305  61 1000  0.60625030 0.4725683 0.1995327
2  2   1   2   rs2   1   2 0.0380  76 1000 -0.09626136 0.4105846 0.8146360
3  3   1   3   rs3   1   2 0.0215  43 1000 -0.55244963 0.5705584 0.3329139
4  4   1   4   rs4   1   2 0.3895 779 1000  0.14369591 0.1651303 0.3841926
5  5   1   5   rs5   1   2 0.0390  78 1000  0.40635898 0.4038370 0.3142977
6  6   1   6   rs6   1   2 0.0525 105 1000  0.29798129 0.3451521 0.3879543
     p.norm converged
1 0.1995327      TRUE
2 0.8146360      TRUE
3 0.3329139      TRUE
4 0.3841926      TRUE
5 0.3142977      TRUE
6 0.3879543      TRUE
```
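The association log reports a p-value threshold of 0.05 for SPA adjustment, and in the first six rows of `assoc` the `pval` column equals `p.norm`, whose values all lie above that threshold. The base R sketch below rebuilds those six rows (a subset of the printed columns) to show one way such a result table might be inspected. It assumes, as the log suggests but this page does not spell out, that SPA is applied only when `p.norm` falls below 0.05. `assoc6`, `spa_cutoff`, `alpha_bonf` and the `p.wald` column are hypothetical names, and the Bonferroni threshold over the 9,976 variants that passed filtering is an illustrative choice, not part of SAIGEgds.

```r
# First six rows of 'assoc' as printed above (subset of columns)
assoc6 <- data.frame(
    rs.id  = paste0("rs", 1:6),
    mac    = c(61, 76, 43, 779, 78, 105),
    beta   = c(0.60625030, -0.09626136, -0.55244963,
               0.14369591, 0.40635898, 0.29798129),
    SE     = c(0.4725683, 0.4105846, 0.5705584,
               0.1651303, 0.4038370, 0.3451521),
    pval   = c(0.1995327, 0.8146360, 0.3329139,
               0.3841926, 0.3142977, 0.3879543),
    p.norm = c(0.1995327, 0.8146360, 0.3329139,
               0.3841926, 0.3142977, 0.3879543)
)

# Assumption: SPA is only triggered when the normal-approximation p-value
# falls below the logged "p-value threshold for SPA adjustment" (0.05)
spa_cutoff <- 0.05
assoc6$spa_candidate <- assoc6$p.norm < spa_cutoff

# Rows that never reach the SPA cutoff should report pval == p.norm
same_as_norm <- with(assoc6, all(pval[!spa_candidate] == p.norm[!spa_candidate]))

# Two-sided normal p-value implied by beta / SE, for comparison with pval
assoc6$p.wald <- 2 * pnorm(-abs(assoc6$beta / assoc6$SE))

# Illustrative Bonferroni cutoff over the 9,976 variants that passed filtering
alpha_bonf <- 0.05 / 9976
hits <- subset(assoc6, pval < alpha_bonf)
nrow(hits)
```

For these rows the printed `pval` also agrees with the two-sided normal p-value computed from `beta / SE`, and none of them reaches the Bonferroni threshold.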