Home

/

CRAN

/

FLASHMM

/

simuRNAseq: Simulating Multi-sample Multi-cell-type scRNA-seq Dataset...

simuRNAseq: Simulating Multi-sample Multi-cell-type scRNA-seq Dataset...
In FLASHMM: Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

View source: R/simuRNAseq.R

simuRNAseq

R Documentation

Simulating Multi-sample Multi-cell-type scRNA-seq Dataset based on Negative Binomial Distribution

Description

simuRNAseq simulates scRNA-seq data with multiple subjects (samples), multiple clusters (cell-types) and two treatments (conditions) based on a negative binomial (NB) distribution using a reference data as background or control. The reference data consisting of genes-by-cells counts matrix is used to estimate the NB dispersion and means for the genes to be simulated. The simulated genes are randomly selected from the reference data. The NB dispersion are estimated by the method-of-moments estimate (MME). The NB means for the background in the control are estimated by the sample mean. The NB means for the differentially expressed (DE) genes are given by the sample mean plus a log-fold change (logFC). The simulated cells are randomly selected from the meta data that specifies subjects, cell-types and treatments for the cells. The meta data consists of samples, clusters of cell types, and treatments, which can be generated either from reference data or randomly. If not provided, it will be randomly generated. A random seed is recommended to be specified by set.seed before simulating data.

Usage

simuRNAseq(
  counts,
  nGenes = nrow(counts),
  nCells = ncol(counts),
  metadata = NULL,
  samples.nested = TRUE,
  nsam = 25,
  ncls = 10,
  ntrt = 2,
  trt = NULL,
  nDEgenes = ncls,
  nDEgenesType,
  pDEgenesType = NULL,
  adjust.library.size = TRUE,
  direction = c("both", "up", "down"),
  minbeta = 0.25,
  maxbeta = 1,
  var.randomeffects = 0.1
)

Arguments

`counts`	A genes-by-cells matrix of reference counts. If missing, counts is generated by a negative binomial distribution.
`nGenes`	Number of genes to be simulated.
`nCells`	Number of cells to be simulated.
`metadata`	The meta data consisting of 4 columns: sam (sample labels), cls (cluster lables of cell types), trt (treatments or conditions) and libsize (library size or total counts per cell), which is randomly generated if not provided.
`samples.nested`	If TRUE, when metadata is not provided, each simulated subject (sample) belongs to only one condition (either treatment or control), that is, the subject is nested in a condition (treatment).
`nsam`	Number of subjects (individuals).
`ncls`	Number of clusters (cell-types).
`ntrt`	Number of treatments (only one condition is treated).
`trt`	Treatment, specifying which condition is treated.
`nDEgenes`	Total number of DE genes.
`nDEgenesType`	Number of DE genes specific to a cell type, named by cell cluster labels.
`pDEgenesType`	Proportion of DE genes in a cell-type. Default NULL means equal proportion.
`adjust.library.size`	If TRUE, adjust library sizes using the reference counts.
`direction`	Specify if the DE genes are up- and/or down-regulated.
`minbeta`	Lower bound of the DE gene logFC.
`maxbeta`	Upper bound of the DE gene logFC. minbeta < maxbeta. If direction = "both", minbeta*maxbeta > 0. If direction = "up", minbeta > 0. If direction = "down", maxbeta < 0.
`var.randomeffects`	Variance of random effects

Value

ref.mean.dispersion: A data frame of the reference counts' means and dispersion,

metadata: Meta data consisting of 4 columns: sam (sample labels), cls (cluster lables of cell types), trt (two treatments/conditions) and libsize (library sizes).

counts: A genes-by-cells matrix of the simulated counts.

DEgenes: A data frame of DE genes consisting of 3 columns: gene, beta (effect), and cluster to which the gene is specific.

treatment: The condition treated.

Examples

#Simulate a multi-sample multi-cell-type scRNA-seq dataset.
set.seed(2412)
dat <- simuRNAseq(nGenes = 50, nCells = 1000, nsam = 25, ncls = 4, ntrt = 2, nDEgenes = 6)
str(dat)
table(dat$metadata[, c("sam", "trt")]) #The samples are nested in a condition.

#Analyze differentially expressed (DE) genes specific to a cell-type using LMM.
Y <- log(dat$counts + 1) #expressions (log-transformed counts)
X <- model.matrix(~ 0 + log(libsize) + cls + cls:trt, data = dat$metadata)
Z <- model.matrix(~ 0 + sam, data = dat$metadata)
d <- ncol(Z)

#Fit LMM using cell-level data.
fit <- lmmfit(Y, X, Z, d = d)

#Fit LMM using summary-level data.
#Compute and store the summary-level data:
n <- nrow(X)
XX <- t(X)%*%X
XY <- t(Y%*%X)
ZX <- t(Z)%*%X
ZY <- t(Y%*%Z)
ZZ <- t(Z)%*%Z
Ynorm <- rowSums(Y*Y)
fitss <- lmm(XX, XY, ZX, ZY, ZZ, Ynorm = Ynorm, n = n, d = d)
identical(fit, fitss)

#Hypothesis testing
test <- lmmtest(fit)
head(test)

#The DE genes specific to a cell-type.
tail(test[, grep(":", colnames(test))])
#The true DE genes
dat$DEgenes

FLASHMM documentation built on April 12, 2025, 1:32 a.m.

FLASHMM index

README.md Single-cell differential expression analysis with FLASHMM

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

FLASHMM
Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

simuRNAseq: Simulating Multi-sample Multi-cell-type scRNA-seq Dataset...
In FLASHMM: Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

Simulating Multi-sample Multi-cell-type scRNA-seq Dataset based on Negative Binomial Distribution

Description

Usage

Arguments

Value

Examples

Related to simuRNAseq in FLASHMM...

R Package Documentation

Browse R Packages

We want your feedback!

FLASHMM Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

simuRNAseq: Simulating Multi-sample Multi-cell-type scRNA-seq Dataset... In FLASHMM: Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

Simulating Multi-sample Multi-cell-type scRNA-seq Dataset based on Negative Binomial Distribution

Description

Usage

Arguments

Value

Examples

Related to simuRNAseq in FLASHMM...

R Package Documentation

Browse R Packages

We want your feedback!

FLASHMM
Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models

simuRNAseq: Simulating Multi-sample Multi-cell-type scRNA-seq Dataset...
In FLASHMM: Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models