rsnpset: RSNPset Analysis Function
In RSNPset: Efficient Score Statistics for Genome-Wide SNP Set Analysis

Description Usage Arguments Details Value Note See Also Examples

View source: R/rsnpset.R

Compute the (binomial, Gaussian, or Cox) efficient score for genome-wide SNP set analysis with binary, uncensored quantitative, or right-censored time-to-event traits.

    rsnpset(Y, delta=NULL, G, X=NULL, snp.sets, 
            score=c("cox", "binomial", "gaussian"), B=0, 
            r.method="monte carlo", v.method="empirical",
            v.permute=FALSE, ret.rank=FALSE, pinv.check=FALSE, 
            pinv.method="specdecomp", pinv.tol=7.8e-8)

`Y`	Vector of outcomes. Can be binary, continuous, or non-negative (time-to-event), dictating the type of score to request in argument `score`. Required.
`delta`	Vector of event indicators for survival outcomes. Must be binary: 1 indicates event, 0 indicates censored. Required when `score="cox"`, unused otherwise.
`G`	Matrix of SNP allele counts or expression levels. One row per patient and one column per SNP. Must be type `double`. Required.
`X`	Optional matrix of covariates. One row per patient and one column per cofactor. Must be numeric with non-colinear columns. If used, an intercept column will be appended automatically. Currently implemented only for `score="gaussian"`.
`snp.sets`	List of sets of SNPs to be analyzed together. Each element should be a vector of values (usually rsIDs) corresponding to names of columns of `G`. Required.
`score`	Score statistic to be calculated. (Must be appropriate for the type of outcome given in `Y`). Required to be one of `c("cox", "binomial", "gaussian")`.
`B`	Number of replication sets used to calculate the empirical distribution of the score statistics and empirical p-values. Default is 0 (no replications will be generated).
`r.method`	Resampling method. Default is `"monte carlo"`. `r.method="permutation"` is allowed when the model does not include covariates (`X=NULL`).
`v.method`	Method for calculating the variance of the score statistic. Default is `"empirical"` for all values of `score`, and is the only method currently implemented for `score="gaussian"` or `score="cox"`. `score="binomial"` also allows `"asymptotic"`.
`v.permute`	Boolean indicating whether or not to recalculate the variance of the score statistics for each permutation replicate. If `FALSE`, variance matrices are computed only for the observed data and then reused for the permutation replicates. `FALSE` when `r.method="monte carlo"` (does not apply). `r.method="permutation"` also allows `TRUE`.
`ret.rank`	Boolean indicating whether or not to return the rank of the variance matrices for the permutation replicates. `FALSE` when `r.method="monte carlo"` (does not apply). `r.method="permutation"` also allows `TRUE`.
`pinv.check`	Boolean. If `TRUE`, the function returns several diagnostic measures (see below) of the Penrose-Moore inverse of the variance matrices. Default is `FALSE`.
`pinv.method`	Method for calculating the inverse of the variance matrices. Currently, only the default of `"specdecomp"` is implemented.
`pinv.tol`	Number indicating the tolerance for determining the rank of the variance matrix (see below). Default is `7.8e-8`.

For each SNP set, the function computes the statistic W as

W = t(Uvec)%*%Σ%*%Uvec

where Σ is the Penrose-Moore inverse of the variance matrix V = t(U)%*%U, and Uvec = t(colSums(U)). Here, U is an n by J matrix where entry i,j corresponds to the ith patient's contribution to the score statistic for SNP j. Statistical performance is improved by centering the values of G for each SNP prior to calculating U.

Under suitable regularity conditions, the distribution of W can be approximated by a chi-squared distribution with degrees of freedom equal to the rank of V. The rank is determined as the number of eigenvalues greater than pinv.tol. For more information on SNP sets and the efficient score, see the package vignette.

If B > 0 and r.method="monte carlo", B resampling replicates of W are obtained by replacing Uvec = t(colSums(U)) with Uvec = t(colSums(U*Z)), where Z is a vector of n normal random values. Replications are executed in parallel, if a backend is available.

If B > 0 and r.method="permutation", B permutation replicates of W are obtained by permuting the values of Y, or, in the case of score="cox", by permuting (Y,delta) pairs. Permutation replicates are executed in parallel, if a backend is available. Note that r.method="permutation" is not appropriate when the model includes covariates.

If pinv.check=TRUE, the following diagnostic measures of the Penrose-Moore inverse are calculated.

Column	Absolute largest element of:
`d0`	Σ - QDQ
`d1`	V%%Σ%%V-V
`d2`	Σ%%V%%Σ-Σ
`d2`	t(V%%Σ)-V%%Σ
`d4`	t(Σ%%V)-Σ%%V

where QDQ is the spectral decomposition of V. Departure of these values from zero indicates poor performance of the Penrose-Moore inverse.

An S3 class RSNPset object consisting of a list of objects of class data.frame. The first data frame in the list describes the observed statistics, with B additional data frames corresponding to the requested resampling replicates. The rows of each data frame correspond to the elements of the snp.sets argument. The first column is W, the calculated efficient score for that set, and the second is rank, the rank of the variance matrix of the computed statistic. If ret.rank=FALSE, the ranks are not returned for the replicates. (They are assumed to be identical). The first data frame in the list also includes a third column, m, giving the number of SNPs analyzed in that SNP set.

If pinv.check=TRUE, a list of B+1 data frames, each with one row per SNP set and five columns of diagnostic measures of the calculated Penrose-Moore inverse (see above), is returned as an attribute. This and other attributes of the function's execution can be accessed using the class's summary function.

I. This function does not require that all entries of an element of snp.sets be present in the matrix G. If an element contains column names that are not present in G, the function will execute without objection and return a value based on the subset of columns that are present.

II. No statistics are returned for SNP sets which include SNPs with missing values (i.e. NAs in the selected columns of the matrix G).

III. As the Cox score statistic (method="cox") is not a sum of independent patient level scores, some level of pruning of SNPs is recommended. The permutation resampling facilities of the package can be utilized to assess the performance of the asymptotic inference. The development of robust methods for calculating the score statistics and approximating the covariance matrix is under way.

The function rsnpset.pvalue provides options for computing p-values for the returned statistics.

The function summary.RSNPset provides diagnostics and information about the function's execution.

n <- 200    # Number of patients
m <- 1000   # Number of SNPs

set.seed(123)

G <- matrix(rnorm(n*m), n, m)   # Normalized SNP expression levels
rsids <- paste0("rs", 1:m)      # SNP rsIDs 
colnames(G) <- rsids
 
K <- 10                         # Number of SNP sets
genes <- paste0("XYZ", 1:K)     # Gene names 
gsets <- lapply(sample(3:50, size=K, replace=TRUE), sample, x=rsids)
names(gsets) <- genes

# Binary outcome
Yb <- rbinom(n, 1, 0.5) 
head(rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", v.method="empirical"))
head(rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", v.method="asymptotic"))

# Quantitative outcome
Yq <- rbinom(n, 1, 0.5)
head(rsnpset(Y=Yq, G=G, snp.sets=gsets, score="gaussian"))

# Survival outcome
time <- rexp(n, 1/10)           # Survival time
event <- rbinom(n, 1, 0.9)      # Event indicator
head(rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox"))

## Not run: 
# Optional parallel backend
library(doParallel)
registerDoParallel(cores=8)

res <- rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", B=1000)
length(res) # = 1001 
## End(Not run)