thin_2group | R Documentation |
Given a matrix of real RNA-seq counts, this function will
randomly assign samples to one of two groups, draw
the log2-fold change in expression between two groups for each gene,
and add this signal to the RNA-seq counts matrix. The user may specify
the proportion of samples in each group, the proportion of null genes
(where the log2-fold change is 0),
and the signal function. This is a specific application of the
general binomial thinning approach implemented in thin_diff
.
thin_2group(
mat,
prop_null = 1,
signal_fun = stats::rnorm,
signal_params = list(mean = 0, sd = 1),
group_prop = 0.5,
corvec = NULL,
alpha = 0,
permute_method = c("hungarian", "marriage"),
type = c("thin", "mult")
)
mat |
A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples. |
prop_null |
The proportion of genes that are null. |
signal_fun |
A function that returns the signal. This should take as
input |
signal_params |
A list of additional arguments to pass to
|
group_prop |
The proportion of individuals that are in group 1. |
corvec |
A vector of target correlations. |
alpha |
The scaling factor for the signal distribution from
Stephens (2016). If |
permute_method |
Should we use the Gale-Shapley algorithm
for stable marriages ( |
type |
Should we apply binomial thinning ( |
The specific application of binomial thinning to the two-group model was used in Gerard and Stephens (2018) and Gerard and Stephens (2021). This is a specific case of the general method described in Gerard (2020).
A list-like S3 object of class ThinData
.
Components include some or all of the following:
mat
The modified matrix of counts.
designmat
The design matrix of variables used to simulate
signal. This is made by column-binding design_fixed
and the
permuted version of design_perm
.
coefmat
A matrix of coefficients corresponding to
designmat
.
design_obs
Additional variables that should be included in
your design matrix in downstream fittings. This is made by
column-binding the vector of 1's with design_obs
.
sv
A matrix of estimated surrogate variables. In simulation studies you would probably leave this out and estimate your own surrogate variables.
cormat
A matrix of target correlations between the
surrogate variables and the permuted variables in the design matrix.
This might be different from the target_cor
you input because
we pass it through fix_cor
to ensure
positive semi-definiteness of the resulting covariance matrix.
matching_var
A matrix of simulated variables used to
permute design_perm
if the target_cor
is not
NULL
.
David Gerard
Gale, David, and Lloyd S. Shapley. "College admissions and the stability of marriage." The American Mathematical Monthly 69, no. 1 (1962): 9-15. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/00029890.1962.11989827")}.
Gerard, D., and Stephens, M. (2021). "Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls." Statistica Sinica, 31(3), 1145-1166 \Sexpr[results=rd]{tools:::Rd_expr_doi("10.5705/ss.202018.0345")}.
David Gerard and Matthew Stephens (2018). "Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation." Biostatistics, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biostatistics/kxy029")}.
Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics. 21(1), 206. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1186/s12859-020-3450-9")}.
Hornik K (2005). "A CLUE for CLUster Ensembles." Journal of Statistical Software, 14(12). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v014.i12")}. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v014.i12")}.
C. Papadimitriou and K. Steiglitz (1982), Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs: Prentice Hall.
Stephens, Matthew. "False discovery rates: a new deal." Biostatistics 18, no. 2 (2016): 275-294. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1093/biostatistics/kxw041")}.
Wakefield, Jon. "Bayes factors for genome-wide association studies: comparison with P-values." Genetic epidemiology 33, no. 1 (2009): 79-86. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1002/gepi.20359")}.
select_counts
For subsampling the rows and columns of your real RNA-seq count matrix prior to applying binomial thinning.
thin_diff
For the more general thinning approach.
ThinDataToSummarizedExperiment
For converting a ThinData object to a SummarizedExperiment object.
ThinDataToDESeqDataSet
For converting a ThinData object to a DESeqDataSet object.
## Simulate data from given matrix of counts
## In practice, you would obtain Y from a real dataset, not simulate it.
set.seed(1)
nsamp <- 10
ngene <- 1000
Y <- matrix(stats::rpois(nsamp * ngene, lambda = 50), nrow = ngene)
thinout <- thin_2group(mat = Y,
prop_null = 0.9,
signal_fun = stats::rexp,
signal_params = list(rate = 0.5))
## 90 percent of genes are null
mean(abs(thinout$coef) < 10^-6)
## Check the estimates of the log2-fold change
Ynew <- log2(t(thinout$mat + 0.5))
X <- thinout$designmat
Bhat <- coef(lm(Ynew ~ X))["X", ]
plot(thinout$coefmat,
Bhat,
xlab = "log2-fold change",
ylab = "Estimated log2-fold change")
abline(0, 1, col = 2, lwd = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.