It is a good idea to subsample the genes and samples from a real RNA-seq
dataset at each iteration, before applying thinning functions such as
thin_2group, so that your conclusions do not depend on the specific structure
of your dataset. This function is designed to do this efficiently for you.
select_counts(
  mat,
  nsamp = ncol(mat),
  ngene = nrow(mat),
  gselect = c("random", "max", "mean_max", "custom"),
  gvec = NULL,
  filter_first = FALSE,
  nskip = 0L
)
mat: A numeric matrix of RNA-seq counts. The rows index the genes and the columns index the samples.
nsamp: The number of samples (columns) to select from mat.
ngene: The number of genes (rows) to select from mat.
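gselect: How should we select the subset of genes? Options include:
  "random": select the genes at random.
  "max": select the ngene genes with the largest median expression (see nskip).
  "mean_max": select the ngene genes with the largest mean expression.
  "custom": select the genes indicated by gvec.
gvec: A logical vector of length nrow(mat) indicating which genes to select (TRUE means the gene is included). Used only when gselect = "custom".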
filter_first: Should we first filter genes using the method of Chen et al. (2016) (TRUE) or not (FALSE)? Filtering requires the edgeR package.
nskip: The number of median-maximally expressed genes to skip before selecting. This argument is not used by every gselect option.
The samples (columns) are chosen randomly, with each sample having an equal
probability of being in the sub-matrix. The genes are selected according to
one of four schemes (see the description of the gselect argument).
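To make the selection schemes concrete, here is a minimal base-R sketch of what this kind of subsampling might look like. subsample_counts_sketch() is a hypothetical helper written for illustration, not the seqgendiff implementation. It assumes that "max" ranks genes by row median and "mean_max" by row mean, and it applies nskip only to "max", following the nskip description above.

```r
## Hypothetical helper for illustration only; NOT seqgendiff's select_counts().
## Assumes "max" = largest row medians, "mean_max" = largest row means,
## and that nskip applies only to the median-based "max" scheme.
subsample_counts_sketch <- function(mat, nsamp = ncol(mat), ngene = nrow(mat),
                                    gselect = c("random", "max", "mean_max", "custom"),
                                    gvec = NULL, nskip = 0L) {
  gselect <- match.arg(gselect)
  stopifnot(is.matrix(mat), nsamp <= ncol(mat), ngene <= nrow(mat))

  ## Samples: every column has the same chance of being chosen.
  samp_idx <- sort(sample(seq_len(ncol(mat)), size = nsamp))

  ## Genes: one of four schemes.
  gene_idx <- switch(gselect,
    random = sort(sample(seq_len(nrow(mat)), size = ngene)),
    max = {
      stopifnot(nskip + ngene <= nrow(mat))
      ord <- order(apply(mat, 1, stats::median), decreasing = TRUE)
      sort(ord[nskip + seq_len(ngene)])  # skip the top nskip genes
    },
    mean_max = {
      ord <- order(rowMeans(mat), decreasing = TRUE)
      sort(ord[seq_len(ngene)])
    },
    custom = {
      stopifnot(is.logical(gvec), length(gvec) == nrow(mat))
      which(gvec)
    }
  )

  sub <- mat[gene_idx, samp_idx, drop = FALSE]
  ## Mirror the documented convention: record indices when dimnames are NULL.
  if (is.null(rownames(mat))) rownames(sub) <- gene_idx
  if (is.null(colnames(mat))) colnames(sub) <- samp_idx
  sub
}

set.seed(1)
mat <- matrix(stats::rpois(1000 * 100, lambda = 50), nrow = 1000)
sub_max <- subsample_counts_sketch(mat, nsamp = 10, ngene = 100,
                                   gselect = "max", nskip = 5L)
gvec <- rowMeans(mat) > 50
sub_custom <- subsample_counts_sketch(mat, nsamp = 10, gselect = "custom",
                                      gvec = gvec)
```

The indices are sorted so that the sub-matrix keeps the original gene and sample order, which makes the stored row and column names easy to read back.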
If you have edgeR installed, then some functionality is provided for
filtering out the lowest-expressed genes before subsampling (see filter_first).
This filtering scheme is described in Chen et al. (2016).
If you want more control over the filtering, use the
filterByExpr function from edgeR directly. edgeR is distributed through
Bioconductor and can be installed with BiocManager::install("edgeR").
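A sketch of calling edgeR's filterByExpr() directly on a count matrix, then keeping only the retained genes, is below. It assumes edgeR is installed. The simulated matrix (half low-count genes, half high-count genes) is only for illustration, and min.count is set to its default value just to show where such tuning arguments go. With no design or group supplied, filterByExpr() treats all samples as a single group.

```r
## Assumes the Bioconductor package edgeR is installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  ## Rows 1-500 have mean count 2; rows 501-1000 have mean count 50.
  mat <- matrix(stats::rpois(1000 * 100, lambda = rep(c(2, 50), each = 500)),
                nrow = 1000)

  ## Logical vector: TRUE for genes with enough counts to keep.
  ## min.count = 10 is the default, shown explicitly for tuning.
  keep <- edgeR::filterByExpr(mat, min.count = 10)

  ## Drop the lowly expressed genes; mat_filtered can then be subsampled.
  mat_filtered <- mat[keep, , drop = FALSE]
}
```

Filtering yourself and then calling select_counts() with filter_first = FALSE keeps the two steps separate, so each filtering argument stays under your control.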
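A numeric matrix: the selected sub-matrix of mat, with genes in the rows and
samples in the columns. If rownames(mat) is NULL, then the
row names of the returned matrix are the indices in
mat of the
selected genes. If colnames(mat) is NULL, then the
column names of the returned matrix are the indices in
mat of the selected samples.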
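Because the indices are stored as dimension names when mat has none, you can map the sub-matrix back to the full matrix. A short sketch, assuming seqgendiff is installed:

```r
## Assumes the seqgendiff package is installed.
if (requireNamespace("seqgendiff", quietly = TRUE)) {
  set.seed(1)
  mat <- matrix(stats::rpois(1000 * 100, lambda = 50), nrow = 1000)
  submat <- seqgendiff::select_counts(mat = mat, nsamp = 10, ngene = 100)

  ## mat has no dimnames, so the dimnames of submat are indices into mat.
  gene_idx <- as.integer(rownames(submat))
  samp_idx <- as.integer(colnames(submat))

  ## Indexing mat with these recovers exactly the same counts.
  stopifnot(all(submat == mat[gene_idx, samp_idx]))
}
```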
Chen, Yunshun, Aaron TL Lun, and Gordon K. Smyth. "From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline." F1000Research 5 (2016). doi: 10.12688/f1000research.8987.2.
## Simulate a matrix of counts.
## In practice, you would obtain mat from a real dataset, not simulate it.
library(seqgendiff)
set.seed(1)
n <- 100
p <- 1000
mat <- matrix(stats::rpois(n * p, lambda = 50), nrow = p)

## Subsample the matrix, then feed it into a thinning function.
submat <- select_counts(mat = mat, nsamp = 10, ngene = 100)
thout <- thin_2group(mat = submat, prop_null = 0.5)

## Because mat has no row or column names, the row and column names
## of submat tell you which genes and samples were selected.
rownames(submat)
colnames(submat)