thin_all | R Documentation |
Given a matrix of real RNA-seq counts, this function will apply a
thinning factor uniformly to every count in this matrix. This uniformly
lowers the read-depth for the entire dataset. The thinning factor should
be provided on the log2-scale. This is a specific application of the
binomial thinning approach in thin_diff. This particular
form of thinning was used by Robinson and Storey (2014) in the context
of deriving read-depth suggestions. It is also
described in detail in Gerard (2020).
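The idea of uniform binomial thinning can be sketched in base R. This is a minimal illustration and not the package's implementation: it assumes each count y is replaced by a single draw from a Binomial(y, 2^-thinlog2) distribution. The names prob and thinned are made up for this sketch.

```r
## Base-R sketch of uniform binomial thinning (illustrative only,
## not the code used inside thin_all).
set.seed(1)
mat <- matrix(1000L, nrow = 1000, ncol = 10)  # toy counts; use real data in practice
thinlog2 <- 1                                 # thinning factor on the log2-scale
prob <- 2^(-thinlog2)                         # assumed probability of keeping a read

## Each count y is replaced by one Binomial(y, prob) draw.
thinned <- matrix(
  rbinom(n = length(mat), size = mat, prob = prob),
  nrow = nrow(mat), ncol = ncol(mat)
)

## The thinned counts never exceed the originals, and the overall
## proportion of reads kept should be close to prob.
all(thinned <= mat)
mean(thinned) / mean(mat)
```

Because every entry shares the same probability, the read-depth of every sample is lowered by roughly the same proportion.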
thin_all(mat, thinlog2, type = c("thin", "mult"))
mat |
A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples. |
thinlog2 |
A numeric scalar. This is the amount to shrink each count
in mat, on the log2-scale. |
type |
Should we apply binomial thinning (type = "thin") or just naive multiplication of the counts (type = "mult")? |
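As a quick illustration of the log2-scale used by thinlog2, the sketch below converts a few values into the expected proportion of reads kept, using the 2 ^ -thinlog2 relationship that appears in the examples. The names prop_kept and target are introduced here only for illustration.

```r
## Expected proportion of reads kept for several thinlog2 values.
thinlog2 <- c(0, 1, 2, 3)
prop_kept <- 2^(-thinlog2)
data.frame(thinlog2 = thinlog2, prop_kept = prop_kept)

## Going the other way: the thinlog2 needed for a target proportion.
target <- 0.25        # hypothetical target: keep a quarter of the reads
-log2(target)
```

Under this relationship, thinlog2 = 0 leaves the expected depth unchanged, and each additional unit halves it.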
A list-like S3 object of class ThinData.
Components include some or all of the following:
mat
The modified matrix of counts.
designmat
The design matrix of variables used to simulate
signal. This is made by column-binding design_fixed and the
permuted version of design_perm.
coefmat
A matrix of coefficients corresponding to designmat.
design_obs
Additional variables that should be included in
your design matrix in downstream fittings. This is made by
column-binding the vector of 1's with design_obs.
sv
A matrix of estimated surrogate variables. In simulation studies you would probably leave this out and estimate your own surrogate variables.
cormat
A matrix of target correlations between the
surrogate variables and the permuted variables in the design matrix.
This might be different from the target_cor you input because
we pass it through fix_cor to ensure
positive semi-definiteness of the resulting covariance matrix.
matching_var
A matrix of simulated variables used to
permute design_perm if the target_cor is not NULL.
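The returned object can be inspected like any list. The sketch below assumes that thin_all is provided by the seqgendiff package (this page does not name the package) and only runs the call if that package is installed. Which components are present may vary, so it prints names() and does not rely on any component other than mat.

```r
## Sketch: inspect the object returned by thin_all.
## Assumption: thin_all comes from the seqgendiff package.
if (requireNamespace("seqgendiff", quietly = TRUE)) {
  set.seed(1)
  mat <- matrix(1000, nrow = 50, ncol = 6)   # toy counts; use real data in practice
  thout <- seqgendiff::thin_all(mat = mat, thinlog2 = 1)

  print(inherits(thout, "ThinData"))  # check the documented S3 class
  print(names(thout))                 # components actually returned
  print(dim(thout$mat))               # thinned counts keep the input dimensions
}
```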
David Gerard
Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics. 21(1), 206. doi:10.1186/s12859-020-3450-9.
Robinson, DG, and Storey, JD (2014). "subSeq: determining appropriate sequencing depth through efficient read subsampling." Bioinformatics. 30(23), 3424-3426. doi:10.1093/bioinformatics/btu552.
select_counts
For subsampling the rows and columns of your real RNA-seq count matrix prior to applying binomial thinning.
thin_diff
For the more general thinning approach.
thin_lib
For thinning sample-wise.
thin_gene
For thinning gene-wise.
ThinDataToSummarizedExperiment
For converting a ThinData object to a SummarizedExperiment object.
ThinDataToDESeqDataSet
For converting a ThinData object to a DESeqDataSet object.
## Generate count data and set thinning factor
## In practice, you would obtain mat from a real dataset, not simulate it.
set.seed(1)
n <- 10
p <- 1000
lambda <- 1000
mat <- matrix(lambda, ncol = n, nrow = p)
thinlog2 <- 1
## Thin read-depths
thout <- thin_all(mat = mat, thinlog2 = thinlog2)
## Compare empirical and theoretical proportions
mean(thout$mat) / lambda
2 ^ -thinlog2