Efficient Score Statistics for Genome-Wide SNP Set Analysis

Description

An implementation of three standard efficient score statistics (Cox, binomial, and Gaussian) for use in genome-wide SNP set analysis with complex traits.

Details

Package: RSNPset
Type: Package
Version: 0.4
Date: 2015-11-19
License: GPL-3

This package is designed for the analysis of sets of related SNPs, using genes as the loci of interest, but the methodology can naturally be applied to other genomic loci, including bands and pathways. The core function, rsnpset(), provides options for three efficient score statistics, binomial, Gaussian, and Cox, for the analysis of binary, quantitative, and time-to-event outcomes, but is easily extensible to include others. Code implementing the inferential procedure is primarily written in C++ and utilizes parallelization to reduce runtime. A supporting function, rsnpset.pvalue(), offers easy computation of observed, resampling, FWER-adjusted, and FDR-adjusted p-values, and summary functions provide diagnostic measures and results metadata.

Note

The inferential procedures are written primarily in C++ and utilize linear algebra routines from the Eigen library. This implementation is facilitated using the templates provided by the Rcpp and RcppEigen packages. Parallelization of the analysis, with reproducible randomization, is enabled by using the doRNG package to add parallel backends to looping constructs provided by the foreach package. The FDR-adjusted p-values are obtained using the qvalue package. Use of the fastmatch package allows efficient cross-referencing of SNP rsIDs in the data with the SNP sets.

Author(s)

Chanhee Yi, Alexander Sibley, and Kouros Owzar

Maintainer: Alexander Sibley <alexander.sibley@dm.duke.edu>

See Also

Functions available in this package: rsnpset, rsnpset.pvalue, and supporting summary functions summary.RSNPset, summary.RSNPset.pvalue

For more information on supporting packages, see: Eigen, Rcpp, RcppEigen, doRNG, foreach, qvalue, fastmatch

The snplist package can be used to generate sets of SNPs for analysis with this package.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
n <- 200    # Number of patients
m <- 1000   # Number of SNPs

set.seed(123)

G <- matrix(rnorm(n*m), n, m)   # Normalized SNP expression levels
rsids <- paste0("rs", 1:m)      # SNP rsIDs 
colnames(G) <- rsids
 
K <- 10                         # Number of SNP sets
genes <- paste0("XYZ", 1:K)     # Gene names 
gsets <- lapply(sample(3:50, size=K, replace=TRUE), sample, x=rsids)
names(gsets) <- genes

# Survival outcome
time <- rexp(n, 1/10)           # Survival time
event <- rbinom(n, 1, 0.9)      # Event indicator

res <- rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox")
head(res)
summary(res)
rsnpset.pvalue(res)

## Not run: 
# Optional parallel backend
library(doParallel)
registerDoParallel(cores=8)

res <- rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox", B=1000)
rsnpset.pvalue(res) 
## End(Not run)

# Binary outcome
set.seed(123)
Y <- rbinom(n, 1, 0.5) 

head(rsnpset(Y=Y, G=G, snp.sets=gsets, score="binomial", v.method="empirical"))
head(rsnpset(Y=Y, G=G, snp.sets=gsets, score="binomial", v.method="asymptotic"))