Efficient Score Statistics for Genome-Wide SNP Set Analysis
An implementation of three standard efficient score statistics (Cox, binomial, and Gaussian) for use in genome-wide SNP set analysis with complex traits.
This package is designed for the analysis of sets of related SNPs, using genes as the loci of interest, but the methodology can naturally be applied to other genomic loci, including bands and pathways. The core function, rsnpset(), provides options for three efficient score statistics (binomial, Gaussian, and Cox) for the analysis of binary, quantitative, and time-to-event outcomes, respectively, and is easily extensible to include others. Code implementing the inferential procedure is primarily written in C++ and utilizes parallelization to reduce runtime. A supporting function, rsnpset.pvalue(), offers easy computation of observed, resampling, FWER-adjusted, and FDR-adjusted p-values, and summary functions provide diagnostic measures and results metadata.
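To make the four kinds of p-values more concrete, the following base-R sketch shows one common way resampling, FWER-adjusted, and FDR-adjusted p-values can be derived from observed and replicate statistics. It is only an illustration under stated assumptions: the statistics are simulated, the variable names (obs, reps, p_resamp, p_fwer, p_fdr) are hypothetical, the FWER step uses a max-statistic approach, and Benjamini-Hochberg via p.adjust() stands in for the qvalue package. The procedure used inside rsnpset.pvalue() may differ.

```r
# Illustrative only: hypothetical statistics, not output from rsnpset().
set.seed(1)
K <- 5                                   # Number of SNP sets
B <- 200                                 # Number of resampling replicates
obs <- c(12.0, 3.1, 7.4, 1.2, 20.5)      # Hypothetical observed statistics

# Hypothetical replicate statistics: one row per replicate, one column per set
reps <- matrix(rchisq(B * K, df = 3), nrow = B, ncol = K)

# Resampling p-value: share of replicates at least as extreme as the observed value
p_resamp <- vapply(seq_len(K), function(k) mean(reps[, k] >= obs[k]), numeric(1))

# FWER adjustment: compare each observed value with the per-replicate maximum across sets
max_rep <- apply(reps, 1, max)
p_fwer <- vapply(obs, function(w) mean(max_rep >= w), numeric(1))

# FDR adjustment: Benjamini-Hochberg as a base-R stand-in for qvalue
p_fdr <- p.adjust(p_resamp, method = "BH")

data.frame(set = paste0("XYZ", seq_len(K)), obs, p_resamp, p_fwer, p_fdr)
```

Because the per-replicate maximum is never smaller than any single set's replicate statistic, the FWER-adjusted value in this sketch is always at least as large as the corresponding resampling p-value.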
The inferential procedures are written primarily in C++ and utilize linear algebra routines from the Eigen library. This implementation is facilitated using the templates provided by the Rcpp and RcppEigen packages. Parallelization of the analysis, with reproducible randomization, is enabled by using the doRNG package to add parallel backends to looping constructs provided by the foreach package. The FDR-adjusted p-values are obtained using the qvalue package. Use of the fastmatch package allows efficient cross-referencing of SNP rsIDs in the data with the SNP sets.
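The cross-referencing of rsIDs mentioned above can be sketched with base R's match(). This is a minimal illustration, not the package's internal code: the rsIDs and set names are hypothetical, and base match() is used so the snippet runs without extra packages. fmatch() from the fastmatch package has the same interface as match() and could be substituted for it.

```r
# Hypothetical genotype column names and SNP sets (names are illustrative)
rsids <- paste0("rs", 1:10)
snp_sets <- list(XYZ1 = c("rs2", "rs5", "rs99"), XYZ2 = c("rs10", "rs1"))

# For each set, find the column index of every rsID;
# NA marks an rsID that is absent from the data
idx <- lapply(snp_sets, function(s) match(s, rsids))

# Drop unmatched rsIDs so each set indexes only available columns
idx <- lapply(idx, function(i) i[!is.na(i)])
idx
```

Here rs99 does not appear among the column names, so it is dropped from XYZ1, leaving only the indices of the SNPs actually present in the data.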
Chanhee Yi, Alexander Sibley, and Kouros Owzar
Maintainer: Alexander Sibley <email@example.com>
Functions available in this package:
rsnpset, rsnpset.pvalue, and supporting summary functions
For more information on supporting packages, see the documentation for Rcpp, RcppEigen, doRNG, foreach, qvalue, and fastmatch.
The snplist package can be used to generate sets of SNPs for analysis with this package.
n <- 200    # Number of patients
m <- 1000   # Number of SNPs

set.seed(123)
G <- matrix(rnorm(n*m), n, m)   # Normalized SNP expression levels
rsids <- paste0("rs", 1:m)      # SNP rsIDs
colnames(G) <- rsids

K <- 10                         # Number of SNP sets
genes <- paste0("XYZ", 1:K)     # Gene names
gsets <- lapply(sample(3:50, size=K, replace=TRUE), sample, x=rsids)
names(gsets) <- genes

# Survival outcome
time <- rexp(n, 1/10)           # Survival time
event <- rbinom(n, 1, 0.9)      # Event indicator

res <- rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox")
head(res)
summary(res)
rsnpset.pvalue(res)

## Not run:
# Optional parallel backend
library(doParallel)
registerDoParallel(cores=8)

res <- rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox", B=1000)
rsnpset.pvalue(res)

## End(Not run)

# Binary outcome
set.seed(123)
Y <- rbinom(n, 1, 0.5)

head(rsnpset(Y=Y, G=G, snp.sets=gsets, score="binomial", v.method="empirical"))
head(rsnpset(Y=Y, G=G, snp.sets=gsets, score="binomial", v.method="asymptotic"))