Description

Applies the Transposable Sphering Algorithm to adjust for correlations among the rows and columns when conducting large-scale inference on the rows of a data matrix.
Usage

TransSphere(dat, y, fdr, minlam, maxlam = NULL)
Arguments

dat      Data matrix. Inference is conducted on the rows, so the matrix
         should be oriented accordingly; for gene expression data, for
         example, orient the matrix as genes by samples.

y        A vector of group labels, coded as numeric 1 or 2.

fdr      Desired False Discovery Rate to control. Default is 0.1.

minlam   Minimum regularization parameter to test via cross-validation
         for sparse inverse covariance estimation. Default is 0.15.
         Small values of this parameter may cause numerical
         instabilities, so keeping the default is recommended.

maxlam   Maximum regularization parameter to test via cross-validation
         for sparse inverse covariance estimation. Default is 0.25.
Details

The Transposable Sphering Algorithm adjusts for correlations among the rows and columns of a data matrix before conducting large-scale inference. Currently, the method is implemented only for two-sample problems.

The data matrix is row- and column-centered, and two-sample T-statistics are computed for each row. The Transposable Sphering method is then applied to the 500 rows with the largest absolute T-statistics. This sub-matrix is decomposed into a signal matrix, corresponding to the two classes of interest, and a noise matrix. The noise matrix is sphered so that both its rows and columns are approximately independent: sparse inverse covariances of the rows and columns are estimated via Transposable Regularized Covariance Models and used to whiten the noise matrix, with cross-validation used to choose the regularization parameters controlling the amount of sparsity. The estimated signal matrix and the sphered noise matrix are then added to form the sphered data matrix on which large-scale inference is conducted. Test statistics are adjusted using central matching, and the Benjamini-Hochberg step-up procedure is used to control the False Discovery Rate.
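The centering, per-row T-statistic, and Benjamini-Hochberg steps described above can be sketched in base R. This is an illustration only: the sphering itself (which requires the Transposable Regularized Covariance Model estimates) is omitted, and the matrix sizes and FDR level are made up for the example.

```r
set.seed(1)
dat <- matrix(rnorm(100 * 20), 100, 20)  # toy matrix: 100 rows, 20 samples
y   <- c(rep(1, 10), rep(2, 10))         # two-group labels, coded 1 and 2

# Row- and column-center the data matrix
dat <- sweep(dat, 1, rowMeans(dat))
dat <- sweep(dat, 2, colMeans(dat))

# Two-sample T-statistic for each row
g1 <- dat[, y == 1]; g2 <- dat[, y == 2]
n1 <- sum(y == 1);   n2 <- sum(y == 2)
se <- sqrt(apply(g1, 1, var) / n1 + apply(g2, 1, var) / n2)
t.stats <- (rowMeans(g1) - rowMeans(g2)) / se

# Unadjusted p-values, then Benjamini-Hochberg control at FDR = 0.1
p.vals   <- 2 * pt(-abs(t.stats), df = n1 + n2 - 2)
sig.rows <- which(p.adjust(p.vals, method = "BH") <= 0.1)
```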
Value

sig.rows    Indices of the statistically significant rows after
            controlling the False Discovery Rate at the value fdr.

t.stats     Sphered two-sample T-statistics.

p.vals      Sphered (unadjusted) p-values.

x.sphered   The sphered data matrix. Only the top 500 rows are used in
            the algorithm, so this matrix has at most 500 rows.
Author(s)

Genevera I. Allen
References

G. I. Allen and R. Tibshirani, "Inference with Transposable Data: Modeling the Effects of Row and Column Correlations", To Appear in Journal of the Royal Statistical Society, Series B (Theory & Methods), 2011.

G. I. Allen and R. Tibshirani, "Transposable regularized covariance models with an application to missing data imputation", Annals of Applied Statistics, 4:2, 764-790, 2010.
Examples

# batch-effect simulation
n = 250
p = 50
y = c(rep(1,25),rep(2,25))
mu1true = c(rep(.5,25),rep(-.5,25),rep(0,n-50))
mu2true = c(rep(-.5,25),rep(.5,25),rep(0,n-50))
Smat = cbind(matrix(mu1true,n,p/2),matrix(mu2true,n,p/2))
mus = c(-.5,-.25,0,.25,.5)
Bmatsig = matrix(1,n,1) %*% t(rep(mus,each=10))
Bmat = Bmatsig + matrix(rnorm(n*p)*.75,n,p)
xxt = matrix(rnorm(2*n^2),n,2*n)
Sig = xxt %*% t(xxt)/(2*n); eSig = eigen(Sig);
xx = matrix(rnorm(n*p),n,p)
x.b = Smat + eSig$vectors %*% diag(sqrt(eSig$values)) %*%
eSig$vectors %*% xx + Bmat
#Transposable Sphering Algorithm
ans = TransSphere(x.b,y,fdr=.1,.15,.25)
#significant rows
ans$sig.rows
#true positive rate
sum(ans$sig.rows<=50)/50
#false positive rate
sum(ans$sig.rows>50)/200
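
# The noise-sphering step at the heart of the algorithm can also be
# illustrated in isolation. This sketch assumes the row and column
# covariances are known; TransSphere instead plugs in sparse inverse
# covariance estimates chosen by cross-validation.

```r
set.seed(2)
n <- 20; p <- 10

# Symmetric square root of a positive-definite matrix via eigendecomposition
msqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
}

# Well-conditioned row and column covariances (illustrative only)
A <- matrix(rnorm(n * n), n); Sig.row <- crossprod(A) / n + diag(n)
B <- matrix(rnorm(p * p), p); Sig.col <- crossprod(B) / p + diag(p)

# Noise with row covariance Sig.row and column covariance Sig.col
Z <- matrix(rnorm(n * p), n, p)                 # i.i.d. noise
E <- msqrt(Sig.row) %*% Z %*% msqrt(Sig.col)    # correlated noise

# Sphering: whitening by the inverse square roots recovers the i.i.d. noise
E.sphered <- solve(msqrt(Sig.row)) %*% E %*% solve(msqrt(Sig.col))
max(abs(E.sphered - Z))   # numerically zero
```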