Compute the (binomial, Gaussian, or Cox) efficient score for genome-wide SNP set analysis with binary, uncensored quantitative, or right-censored time-to-event traits.

1 2 3 4 5 |

`Y` |
Vector of outcomes. Can be binary, continuous, or non-negative (time-to-event), dictating the type of score to request in argument |

`delta` |
Vector of event indicators for survival outcomes. Must be binary: 1 indicates event, 0 indicates censored. Required when |

`G` |
Matrix of SNP allele counts or expression levels. One row per patient and one column per SNP. Must be type |

`X` |
Optional matrix of covariates. One row per patient and one column per cofactor. Must be numeric with non-colinear columns. If used, an intercept column will be appended automatically. Currently implemented only for |

`snp.sets` |
List of sets of SNPs to be analyzed together. Each element should be a vector of values (usually rsIDs) corresponding to names of columns of |

`score` |
Score statistic to be calculated. (Must be appropriate for the type of outcome given in |

`B` |
Number of replication sets used to calculate the empirical distribution of the score statistics and empirical p-values. Default is 0 (no replications will be generated). |

`r.method` |
Resampling method. Default is |

`v.method` |
Method for calculating the variance of the score statistic. Default is |

`v.permute` |
Boolean indicating whether or not to recalculate the variance of the score statistics for each permutation replicate. If |

`ret.rank` |
Boolean indicating whether or not to return the rank of the variance matrices for the permutation replicates. |

`pinv.check` |
Boolean. If |

`pinv.method` |
Method for calculating the inverse of the variance matrices. Currently, only the default of |

`pinv.tol` |
Number indicating the tolerance for determining the rank of the variance matrix (see below). Default is |

For each SNP set, the function computes the statistic *W* as

*W = t(Uvec)%*%Σ%*%Uvec*

where *Σ* is the Penrose-Moore inverse of the variance matrix *V = t(U)%*%U*, and *Uvec = t(colSums(U))*. Here, *U* is an *n* by *J* matrix where entry *i,j* corresponds to the *i*th patient's contribution to the score statistic for SNP *j*. Statistical performance is improved by centering the values of `G`

for each SNP prior to calculating *U*.

Under suitable regularity conditions, the distribution of *W* can be approximated by a chi-squared distribution with degrees of freedom equal to the rank of *V*. The rank is determined as the number of eigenvalues greater than `pinv.tol`

. For more information on SNP sets and the efficient score, see the package vignette.

If `B > 0`

and `r.method="monte carlo"`

, `B`

resampling replicates of *W* are obtained by replacing *Uvec = t(colSums(U))* with *Uvec = t(colSums(U*Z))*, where *Z* is a vector of *n* normal random values. Replications are executed in parallel, if a backend is available.

If `B > 0`

and `r.method="permutation"`

, `B`

permutation replicates of *W* are obtained by permuting the values of `Y`

, or, in the case of `score="cox"`

, by permuting `(Y,delta)`

pairs. Permutation replicates are executed in parallel, if a backend is available. Note that `r.method="permutation"`

is not appropriate when the model includes covariates.

If `pinv.check=TRUE`

, the following diagnostic measures of the Penrose-Moore inverse are calculated.

Column | Absolute largest element of: |

`d0` | Σ - QDQ |

`d1` | V%*%Σ%*%V-V |

`d2` | Σ%*%V%*%Σ-Σ |

`d2` | t(V%*%Σ)-V%*%Σ |

`d4` | t(Σ%*%V)-Σ%*%V |

where *QDQ* is the spectral decomposition of *V*. Departure of these values from zero indicates poor performance of the Penrose-Moore inverse.

An S3 class `RSNPset`

object consisting of a list of objects of class `data.frame`

. The first data frame in the list describes the observed statistics, with `B`

additional data frames corresponding to the requested resampling replicates. The rows of each data frame correspond to the elements of the `snp.sets`

argument. The first column is `W`

, the calculated efficient score for that set, and the second is `rank`

, the rank of the variance matrix of the computed statistic. If `ret.rank=FALSE`

, the ranks are not returned for the replicates. (They are assumed to be identical). The first data frame in the list also includes a third column, `m`

, giving the number of SNPs analyzed in that SNP set.

If `pinv.check=TRUE`

, a list of `B+1`

data frames, each with one row per SNP set and five columns of diagnostic measures of the calculated Penrose-Moore inverse (see above), is returned as an attribute. This and other attributes of the function's execution can be accessed using the class's `summary`

function.

I. This function does not require that all entries of an element of `snp.sets`

be present in the matrix `G`

. If an element contains column names that are not present in `G`

, the function will execute without objection and return a value based on the subset of columns that *are* present.

II. No statistics are returned for SNP sets which include SNPs with missing values (i.e. `NA`

s in the selected columns of the matrix `G`

).

III. As the Cox score statistic (`method="cox"`

) is not a sum of independent patient level scores, some level of pruning of SNPs is recommended. The permutation resampling facilities of the package can be utilized to assess the performance of the asymptotic inference. The development of robust methods for calculating the score statistics and approximating the covariance matrix is under way.

The function `rsnpset.pvalue`

provides options for computing p-values for the returned statistics.

The function `summary.RSNPset`

provides diagnostics and information about the function's execution.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | ```
n <- 200 # Number of patients
m <- 1000 # Number of SNPs
set.seed(123)
G <- matrix(rnorm(n*m), n, m) # Normalized SNP expression levels
rsids <- paste0("rs", 1:m) # SNP rsIDs
colnames(G) <- rsids
K <- 10 # Number of SNP sets
genes <- paste0("XYZ", 1:K) # Gene names
gsets <- lapply(sample(3:50, size=K, replace=TRUE), sample, x=rsids)
names(gsets) <- genes
# Binary outcome
Yb <- rbinom(n, 1, 0.5)
head(rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", v.method="empirical"))
head(rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", v.method="asymptotic"))
# Quantitative outcome
Yq <- rbinom(n, 1, 0.5)
head(rsnpset(Y=Yq, G=G, snp.sets=gsets, score="gaussian"))
# Survival outcome
time <- rexp(n, 1/10) # Survival time
event <- rbinom(n, 1, 0.9) # Event indicator
head(rsnpset(Y=time, delta=event, G=G, snp.sets=gsets, score="cox"))
## Not run:
# Optional parallel backend
library(doParallel)
registerDoParallel(cores=8)
res <- rsnpset(Y=Yb, G=G, snp.sets=gsets, score="binomial", B=1000)
length(res) # = 1001
## End(Not run)
``` |

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.

All documentation is copyright its authors; we didn't write any of that.