estimateScaffoldParameters: Estimate or set all parameters for Scaffold simulation.

View source: R/estimate.R

estimateScaffoldParameters    R Documentation

Description

Estimate or set all parameters for Scaffold simulation.

Usage

estimateScaffoldParameters(
  sce = NULL,
  sceUMI = FALSE,
  numCells = NULL,
  numGenes = NULL,
  geneMeans = NULL,
  totalTranscripts = NULL,
  genes = NULL,
  protocol = "C1",
  useUMI = FALSE,
  popHet = NULL,
  geneEfficiency = NULL,
  captureEfficiency = NULL,
  efficiencyRT = NULL,
  typeOfAmp = "PCR",
  numPreAmpCycles = 18,
  numAmpCycles = 12,
  preAmpEfficiency = NULL,
  ampEfficiency = NULL,
  tagEfficiency = NULL,
  equalizationAmount = 1,
  totalDepth = NULL,
  usePops = NULL,
  useDynamic = NULL,
  rand.seed = 312
)

Arguments

sce

A SingleCellExperiment object used to estimate some parameters for Scaffold. A data matrix is also acceptable as input. This can be left as NULL if all other parameters are passed in.
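As a minimal sketch of the matrix input mentioned above, the example below builds a toy count matrix and passes it as sce. The matrix contents, its dimensions, the genes-in-rows orientation, and the gene and cell names are illustrative assumptions, and the call is guarded so it only runs when the scaffold package is installed. Whether a matrix this small gives useful estimates has not been checked here.

```r
# Toy count matrix: 200 genes (rows) by 30 cells (columns).
# All values are made up for illustration; genes-in-rows is assumed.
set.seed(1)
counts <- matrix(rpois(200 * 30, lambda = 5), nrow = 200, ncol = 30)
rownames(counts) <- paste0("Gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("Cell", seq_len(ncol(counts)))

# Only attempt the call if the scaffold package is available.
if (requireNamespace("scaffold", quietly = TRUE)) {
  params <- scaffold::estimateScaffoldParameters(
    sce = counts,
    protocol = "C1"
  )
}
```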

sceUMI

Whether or not the data in the SCE object are UMI counts.

numCells

The number of cells to be used in the simulation. If left NULL, the number of cells in the sce object is used. If simulating multiple populations, this should be a vector.

numGenes

The number of genes to be used in the simulation. If left NULL, the number of genes in the sce object is used.

geneMeans

The mean expression level of each gene. If left NULL, the means are estimated from the sce object.

totalTranscripts

The total number of transcripts per cell. If left NULL, the default is 300,000.

genes

A vector of names for each gene. If left NULL, the gene names from the sce object are used.

protocol

The protocol to model in the simulation (accepted input is: C1, droplet, 10X).

useUMI

A TRUE/FALSE value indicating whether the protocol should use UMIs (Unique Molecular Identifiers). Droplet and 10X protocols default to TRUE; all others default to FALSE.

popHet

A vector of length two giving the lower and upper bounds of the perturbation applied to the simulated initial gene counts. This represents natural population heterogeneity. If NULL, Scaffold estimates it from the data. To simulate a homogeneous population, set popHet=c(1,1). To simulate a moderately homogeneous population, for example, try c(.6, 2). Values must be positive.
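To give a feel for what the two bounds mean, here is a rough sketch that treats the perturbation as a multiplicative factor drawn uniformly between the bounds. That model, the helper perturb_means, and the example gene means are assumptions made for illustration; the text above does not say how Scaffold applies the perturbation internally.

```r
# Hypothetical helper: scale each gene mean by a factor drawn uniformly
# between the popHet bounds. This is an assumed model, not Scaffold's code.
perturb_means <- function(gene_means, popHet) {
  stopifnot(length(popHet) == 2, all(popHet > 0), popHet[1] <= popHet[2])
  factors <- runif(length(gene_means), min = popHet[1], max = popHet[2])
  gene_means * factors
}

set.seed(312)
gene_means <- c(10, 50, 200)  # made-up mean expression levels

# Bounds of c(1, 1) leave the means unchanged under this model.
homogeneous <- perturb_means(gene_means, popHet = c(1, 1))

# Bounds of c(0.6, 2) scale each mean by a factor between 0.6 and 2.
perturbed <- perturb_means(gene_means, popHet = c(0.6, 2))
```

Under this reading, widening the interval increases the spread of the starting counts across cells, and c(1,1) removes it entirely.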

geneEfficiency

This parameter is not currently used, but may be implemented in a future release.

captureEfficiency

A vector of values between 0 and 1 to indicate the proportion of mRNA molecules successfully captured for the genes in each cell. If left NULL, this value is estimated using the sce object.
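One way to picture a capture efficiency is as the probability that each individual molecule is retained. The sketch below uses binomial sampling for a single cell; the binomial model, the transcript counts, and the efficiency value of 0.1 are assumptions for illustration and are not taken from the package.

```r
set.seed(312)

# Made-up transcript counts for five genes in one cell.
transcripts <- c(0, 20, 100, 500, 1000)

# Assumed capture efficiency for this cell: each molecule is kept
# independently with probability 0.1.
capture_efficiency <- 0.1

captured <- rbinom(
  n = length(transcripts),
  size = transcripts,
  prob = capture_efficiency
)
```

Genes with few transcripts are the most likely to drop to zero after this step, which is one source of the sparsity seen in single-cell data.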

efficiencyRT

If left NULL (default), this step of the protocol is skipped. Otherwise, the user can specify a vector of values between 0 and 1 to indicate the proportion of mRNA successfully converted to cDNA.

typeOfAmp

The amplification method used in the simulation. Defaults to "PCR"; "IVT" is also accepted.

numPreAmpCycles

The number of cycles to use in the pre-amplification or first amplification stage of the simulation.

numAmpCycles

The number of cycles to use in the second amplification stage of the simulation, or in the only amplification stage for droplet and 10X protocols.

preAmpEfficiency

A vector of values between 0 and 1 indicating the efficiency of the first simulated amplification cycle. If set to 1, then all molecules will double each cycle.
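The doubling statement can be checked with a small worked example. The sketch below assumes that in each cycle every molecule is copied independently with probability equal to the efficiency; this binomial model and the helper amplify are illustrative assumptions, not the package's implementation.

```r
# Hypothetical helper: in each cycle, every molecule is duplicated
# with probability `efficiency` (assumed model).
amplify <- function(n_molecules, efficiency, n_cycles) {
  stopifnot(efficiency >= 0, efficiency <= 1)
  for (cycle in seq_len(n_cycles)) {
    copies <- rbinom(1, size = n_molecules, prob = efficiency)
    n_molecules <- n_molecules + copies
  }
  n_molecules
}

set.seed(312)

# With efficiency 1 every molecule is copied in every cycle, so 10
# starting molecules become 10 * 2^18 after 18 cycles.
perfect <- amplify(10, efficiency = 1, n_cycles = 18)

# With efficiency below 1 the final count is random and cannot
# exceed the perfect-doubling total.
partial <- amplify(10, efficiency = 0.9, n_cycles = 18)
```

Because growth compounds over cycles, small differences in efficiency lead to large differences in the final molecule count.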

ampEfficiency

A vector of values between 0 and 1 indicating the efficiency of the second simulated amplification cycle. For droplet/10X protocols this should be a vector of length one (a single value).

tagEfficiency

A value between 0 and 1 indicating the tagmentation efficiency.

equalizationAmount

A value between 0 and 1 indicating the q* to determine the number of samples that undergo dilution in the equalization step of the simulation. A value of 0 indicates all cells are diluted to the smallest concentration and a value of 1 indicates no equalization is performed.

totalDepth

The total sequencing depth of the simulated data. If left NULL, this is taken from the sce object. If more cells are generated than in the original dataset, then the totalDepth will be scaled up accordingly.

usePops

This should be a named list with elements: propGenes, fc_mean, fc_sd. The elements are vectors with length one less than the number of cell populations. propGenes indicates the proportion of genes having distinct expression compared to the first cell population. fc_mean and fc_sd control each population's fold-change mean and standard deviation.
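The length rule is easiest to see with a concrete case. The sketch below sets up three populations, so each element of the list has length two. The numeric values are placeholders chosen for illustration, and the scale on which fc_mean and fc_sd are interpreted is not stated above, so treat them as assumptions.

```r
# Three populations: numCells is a vector with one entry per population.
num_cells <- c(100, 50, 50)

# Each element has length 3 - 1 = 2, one entry for each population
# that is compared against the first. Values are placeholders.
use_pops <- list(
  propGenes = c(0.2, 0.4),
  fc_mean = c(1, 2),
  fc_sd = c(0.4, 0.4)
)

# Sanity check before handing the list to estimateScaffoldParameters()
# through its numCells and usePops arguments.
stopifnot(all(lengths(use_pops) == length(num_cells) - 1))
```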

useDynamic

This should be a named list with elements: propGenes, dynGenes, degree, knots, and theta. propGenes indicates the proportion of genes that should be simulated as dynamic. dynGenes is an optional element giving an exact list of genes to generate as dynamic. degree, knots, and theta control the spline parameters used to generate dynamic trends.

rand.seed

(Optional) The seed used to ensure reproducibility when subsampling genes, which happens if numGenes is smaller than the number of genes in sce. Defaults to 312.


rhondabacher/scaffold documentation built on Sept. 6, 2024, 4:53 p.m.