thin_gene | R Documentation |

Given a matrix of real RNA-seq counts, this function applies a
separate, user-provided thinning factor to each gene. This uniformly
lowers the counts for all samples in a gene. The thinning factor
should be provided on the log2 scale. This is a specific application
of the binomial thinning approach in `thin_diff`. The method is
described in detail in Gerard (2020).
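A minimal base-R sketch of the idea follows. It assumes, as the `thinlog2` description and the example suggest, that gene i is thinned by replacing each of its counts with a binomial draw whose success probability is `2^(-thinlog2[i])`. The helper `thin_gene_sketch` is hypothetical and is not the seqgendiff implementation; it ignores the `relative` and `type` options and returns a plain matrix rather than a `ThinData` object.

```r
## Hypothetical helper: gene-wise binomial thinning (illustration only,
## not the seqgendiff implementation).
thin_gene_sketch <- function(mat, thinlog2) {
  ## Binomial thinning can only lower counts, so require thinlog2 >= 0.
  stopifnot(is.matrix(mat), length(thinlog2) == nrow(mat), all(thinlog2 >= 0))
  ## Convert log2 thinning amounts to per-gene retention probabilities.
  prob <- 2^(-thinlog2)
  ## matrix() fills column-major, so every entry in row i gets prob[i].
  probmat <- matrix(prob, nrow = nrow(mat), ncol = ncol(mat))
  ## Replace each count with a Binomial(count, prob) draw.
  matrix(rbinom(n = length(mat), size = as.vector(mat), prob = as.vector(probmat)),
         nrow = nrow(mat), ncol = ncol(mat), dimnames = dimnames(mat))
}

set.seed(1)
counts <- matrix(rpois(20, lambda = 100), nrow = 4, ncol = 5)
thinned <- thin_gene_sketch(counts, thinlog2 = c(0, 1, 2, 3))
```

Because the retention probability for the first gene is 1, its counts pass through unchanged, while the other three genes keep roughly one half, one quarter, and one eighth of their counts on average.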

thin_gene(mat, thinlog2, relative = FALSE, type = c("thin", "mult"))

`mat` |
A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples. |

`thinlog2` |
A numeric vector with one element per gene (row of `mat`). Element i is the amount to thin (on the log2 scale) for gene i. For example, a value of 0 means that we do not thin, a value of 1 means that we thin by a factor of 2, a value of 2 means that we thin by a factor of 4, etc. |

`relative` |
A logical. Should we apply relative thinning (`TRUE`) or absolute thinning (`FALSE`)? |

`type` |
Should we apply binomial thinning (`type = "thin"`) or naive multiplication of the counts (`type = "mult"`)? |
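To make the log2 scale for `thinlog2` concrete: under binomial thinning, a value of `x` multiplies the expected count by `2^(-x)`, and a desired retention proportion `p` corresponds to `thinlog2 = -log2(p)`. The short base-R sketch below, using made-up values, only does this arithmetic and does not call `thin_gene`.

```r
## Log2 thinning amounts and the retention proportions they imply.
thinlog2 <- c(0, 1, 2, 3, 0.5)
retained <- 2^(-thinlog2)     # 1, 0.5, 0.25, 0.125, about 0.707
fold_reduction <- 2^thinlog2  # 1, 2, 4, 8, about 1.414

## Expected thinned count for a gene whose original counts are all 1000.
expected_count <- 1000 * retained

## Going the other way: choose thinlog2 to keep a target proportion.
target_prop <- c(1, 0.5, 0.25, 0.1)
thinlog2_needed <- -log2(target_prop)  # 0, 1, 2, about 3.32
```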

A list-like S3 object of class `ThinData`. Components include some or all of the following:

`mat`

The modified matrix of counts.

`designmat`

The design matrix of variables used to simulate signal. This is made by column-binding `design_fixed` and the permuted version of `design_perm`.

`coefmat`

A matrix of coefficients corresponding to `designmat`.

`design_obs`

Additional variables that should be included in your design matrix in downstream fittings. This is made by column-binding the vector of 1's with `design_obs`.

`sv`

A matrix of estimated surrogate variables. In simulation studies you would probably leave this out and estimate your own surrogate variables.

`cormat`

A matrix of target correlations between the surrogate variables and the permuted variables in the design matrix. This might be different from the `target_cor` you input because we pass it through `fix_cor` to ensure positive semi-definiteness of the resulting covariance matrix.

`matching_var`

A matrix of simulated variables used to permute `design_perm` if `target_cor` is not `NULL`.

David Gerard

Gerard, D (2020). "Data-based RNA-seq simulations by binomial thinning." *BMC Bioinformatics*, 21(1), 206. doi: 10.1186/s12859-020-3450-9.

`select_counts`

For subsampling the rows and columns of your real RNA-seq count matrix prior to applying binomial thinning.

`thin_diff`

For the more general thinning approach.

`thin_lib`

For thinning sample-wise instead of gene-wise.

`thin_all`

For thinning all counts uniformly.

`ThinDataToSummarizedExperiment`

For converting a ThinData object to a SummarizedExperiment object.

`ThinDataToDESeqDataSet`

For converting a ThinData object to a DESeqDataSet object.
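The thinning functions listed above differ mainly in which dimension the thinning factor varies along. Under the same binomial-thinning assumption, a hypothetical base-R helper, `thin_by_probmat`, can express all three cases by building a matrix of retention probabilities: one value per row (gene-wise, the `thin_gene` setting), one per column (sample-wise, the `thin_lib` setting), or a single value (uniform, the `thin_all` setting). This is an illustration only; the package functions have their own arguments and return `ThinData` objects.

```r
## Hypothetical helper: binomial thinning with a per-entry retention probability.
thin_by_probmat <- function(mat, probmat) {
  stopifnot(identical(dim(mat), dim(probmat)), all(probmat >= 0 & probmat <= 1))
  matrix(rbinom(length(mat), size = as.vector(mat), prob = as.vector(probmat)),
         nrow = nrow(mat), ncol = ncol(mat))
}

set.seed(2)
counts <- matrix(500L, nrow = 3, ncol = 4)

## Gene-wise: one log2 factor per row (column-major fill repeats it across columns).
gene_prob <- matrix(2^(-c(0, 1, 2)), nrow = 3, ncol = 4)
## Sample-wise: one log2 factor per column (byrow = TRUE repeats it down rows).
lib_prob <- matrix(2^(-c(0, 1, 2, 3)), nrow = 3, ncol = 4, byrow = TRUE)
## Uniform: a single log2 factor for every count.
all_prob <- matrix(2^(-1), nrow = 3, ncol = 4)

gene_thinned <- thin_by_probmat(counts, gene_prob)
lib_thinned  <- thin_by_probmat(counts, lib_prob)
all_thinned  <- thin_by_probmat(counts, all_prob)
```

With a log2 factor of 0, the first gene (gene-wise case) and the first sample (sample-wise case) keep all of their counts.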

```r
library(seqgendiff)

## Generate count data and thinning factors.
## In practice, you would obtain mat from a real dataset, not simulate it.
set.seed(1)
n <- 10
p <- 1000
lambda <- 1000
mat <- matrix(lambda, ncol = n, nrow = p)
thinlog2 <- rexp(n = p, rate = 1)

## Thin total gene expressions
thout <- thin_gene(mat = mat, thinlog2 = thinlog2)

## Compare empirical thinning proportions to specified thinning proportions
empirical_propvec <- rowMeans(thout$mat) / lambda
specified_propvec <- 2 ^ (-thinlog2)
plot(empirical_propvec, specified_propvec,
     xlab = "Empirical Thinning Proportion",
     ylab = "Specified Thinning Proportion")
abline(0, 1, col = 2, lwd = 2)
```
