
View source: R/power.R

optimize.constant.budget.restrictedDoublets    R Documentation

Optimizing cost parameters to maximize detection power for a given budget and 10X design

Description

This function determines the optimal parameter combination for a given budget, where the optimal combination is the one with the highest detection power. Of the three parameters (sample size, cells per sample, and read depth), two must be specified, and the third is then uniquely determined by the other two and the overall budget.
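How the budget pins down the third parameter can be sketched with a simplified cost model. This is an illustration under stated assumptions, not scPower's internal formula: it assumes each lane holds floor(cellsPerLane / nCells) pooled individuals, each kit provides reactionsPerKit lanes, and flow cells are bought in whole units. The helper names `budget_cost` and `max_read_depth` and all example costs are hypothetical.

```r
# Hypothetical sketch of the budget constraint (simplified; not scPower's code).
# Assumptions: individuals are pooled on a lane up to cellsPerLane cells,
# each kit provides `reactionsPerKit` lanes, flow cells are bought whole.
budget_cost <- function(nSamples, nCells, readDepth, costKit, costFlowCell,
                        readsPerFlowcell, cellsPerLane, reactionsPerKit = 6) {
  indivPerLane <- floor(cellsPerLane / nCells)   # individuals sharing one lane
  if (indivPerLane < 1) stop("nCells exceeds cellsPerLane")
  nKits <- ceiling(nSamples / (indivPerLane * reactionsPerKit))
  nFlowCells <- ceiling(nSamples * nCells * readDepth / readsPerFlowcell)
  nKits * costKit + nFlowCells * costFlowCell
}

# Given nSamples and nCells, the money left after buying kits fixes the read depth.
# Flow cells are treated as divisible here, so the result is an upper bound.
max_read_depth <- function(totalBudget, nSamples, nCells, costKit, costFlowCell,
                           readsPerFlowcell, cellsPerLane, reactionsPerKit = 6) {
  indivPerLane <- floor(cellsPerLane / nCells)
  if (indivPerLane < 1) stop("nCells exceeds cellsPerLane")
  kitCost <- ceiling(nSamples / (indivPerLane * reactionsPerKit)) * costKit
  seqBudget <- totalBudget - kitCost
  if (seqBudget <= 0) return(NA_real_)           # kits alone exhaust the budget
  seqBudget / costFlowCell * readsPerFlowcell / (nSamples * nCells)
}

# Made-up costs, for illustration only
max_read_depth(totalBudget = 50000, nSamples = 48, nCells = 2000,
               costKit = 5000, costFlowCell = 10000,
               readsPerFlowcell = 4e9, cellsPerLane = 20000)   # 187500
```

Because flow cells are rounded up in `budget_cost`, a design at exactly this read depth can slightly exceed the budget; the continuous solution only shows why fixing two parameters leaves no freedom for the third.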

Usage

optimize.constant.budget.restrictedDoublets(
  totalBudget,
  type,
  ct,
  ct.freq,
  costKit,
  costFlowCell,
  readsPerFlowcell,
  ref.study,
  ref.study.name,
  cellsPerLane,
  read.umi.fit,
  gamma.mixed.fits,
  disp.fun.param,
  nSamplesRange = NULL,
  nCellsRange = NULL,
  readDepthRange = NULL,
  mappingEfficiency = 0.8,
  multipletRate = 7.67e-06,
  multipletFactor = 1.82,
  min.UMI.counts = 3,
  perc.indiv.expr = 0.5,
  cutoffVersion = "absolute",
  nGenes = 21000,
  samplingMethod = "quantiles",
  multipletRateGrowth = "linear",
  sign.threshold = 0.05,
  MTmethod = "Bonferroni",
  useSimulatedPower = FALSE,
  simThreshold = 4,
  speedPowerCalc = FALSE,
  indepSNPs = 10,
  ssize.ratio.de = 1,
  reactionsPerKit = 6
)

Arguments

totalBudget

Overall experimental budget

type

Type of study, either "eqtl" or "de"

ct

Cell type of interest (must match a cell type name used in the gamma mixed model fits)

ct.freq

Frequency of the cell type of interest, as a fraction of all cells

costKit

Cost for one 10X kit

costFlowCell

Cost of one sequencing flow cell

readsPerFlowcell

Number of reads that can be sequenced with one flow cell

ref.study

Data frame with reference studies to be used for expression ranks and effect sizes (required columns: name (study name), rank (expression rank), and FoldChange (DE study) or Rsq (eQTL study))

ref.study.name

Name of the reference study; it is matched against the name column of the ref.study data frame.

cellsPerLane

Maximum number of cells per 10X lane

read.umi.fit

Data frame for fitting the mean UMI counts per cell as a function of the mean reads per cell (required columns: intercept, reads (slope))
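A minimal sketch of turning read depth into an expected mean UMI count per cell. Assumptions: mapped reads are readDepth times mappingEfficiency, and the fit is taken to be linear in the logarithm of the mapped reads (meanUMI = intercept + reads * log(mappedReads)). Whether the package log-transforms the reads is an assumption here, as are the helper name `mean_umi` and the example coefficients.

```r
# Hypothetical sketch: expected mean UMI counts per cell from the read depth.
# Assumption: meanUMI = intercept + reads * log(mapped reads per cell).
mean_umi <- function(readDepth, read.umi.fit, mappingEfficiency = 0.8) {
  stopifnot(all(c("intercept", "reads") %in% names(read.umi.fit)))
  mappedReads <- readDepth * mappingEfficiency   # only mapped reads yield UMIs
  read.umi.fit$intercept + read.umi.fit$reads * log(mappedReads)
}

fit <- data.frame(intercept = -5000, reads = 1000)  # made-up coefficients
mean_umi(c(10000, 50000), fit)
```

A logarithmic form captures saturation (each extra read adds fewer new UMIs), which is why it is used for this sketch.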

gamma.mixed.fits

Data frame with gamma mixed fit parameters for each cell type (required columns: parameter, ct (cell type), intercept, meanUMI (slope))
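Since each gamma mixed model parameter is stored with an intercept and a meanUMI slope, its value for a given sequencing depth can be read off a linear relation. A minimal sketch, assuming value = intercept + meanUMI * slope; the helper `gamma_params_at` and the parameter names and numbers in the toy data frame are hypothetical.

```r
# Hypothetical sketch: evaluate the gamma mixed model parameters of one cell
# type at a given mean UMI count, assuming a linear dependence on meanUMI.
gamma_params_at <- function(gamma.mixed.fits, ct, meanUMI) {
  fits <- gamma.mixed.fits[gamma.mixed.fits$ct == ct, ]
  if (nrow(fits) == 0) stop("cell type not found: ", ct)
  setNames(fits$intercept + fits$meanUMI * meanUMI, fits$parameter)
}

toy <- data.frame(
  parameter = c("p1", "s1", "r1"),   # made-up parameter names
  ct = "T cells",
  intercept = c(0.5, 0.1, 2),
  meanUMI = c(0, 1e-4, 1e-3)         # slopes with respect to mean UMI
)
gamma_params_at(toy, "T cells", meanUMI = 2000)
```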

disp.fun.param

Parameters of the function that models the dispersion as a function of the mean (required columns: ct (cell type), asymptDisp, extraPois (both taken from DESeq))
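The asymptDisp and extraPois names correspond to the parametric mean-dispersion trend of DESeq/DESeq2, dispersion(mu) = asymptDisp + extraPois / mu. A minimal sketch of evaluating this trend for one cell type; the helper `dispersion_at` and the toy values are hypothetical.

```r
# Hypothetical sketch: negative binomial dispersion as a function of the mean,
# using the parametric trend asymptDisp + extraPois / mean.
dispersion_at <- function(disp.fun.param, ct, mu) {
  p <- disp.fun.param[disp.fun.param$ct == ct, ]
  if (nrow(p) != 1) stop("expected exactly one row for cell type ", ct)
  p$asymptDisp + p$extraPois / mu
}

params <- data.frame(ct = "T cells", asymptDisp = 0.1, extraPois = 0.5)  # toy values
dispersion_at(params, "T cells", mu = c(0.5, 5, 50))
```

Lowly expressed genes get a large dispersion from the extraPois / mu term, while highly expressed genes approach asymptDisp.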

nSamplesRange

Range of sample sizes that should be tested (vector)

nCellsRange

Range of cells per individual that should be tested (vector)

readDepthRange

Range of read depth values that should be tested (vector)

mappingEfficiency

Fraction of reads that are finally mapped to the transcriptome (must be between 0 and 1)

multipletRate

Expected increase in the multiplet rate for each additional cell loaded on the lane

multipletFactor

Expected ratio of the reads captured by a multiplet compared with a singlet cell

min.UMI.counts

Expression cutoff in one individual: if cutoffVersion = "absolute", a gene needs more than this number of UMI counts in the individual and cell type; if cutoffVersion = "percentage", more than this percentage of cells need to have a count larger than 0

perc.indiv.expr

Expression cutoff on the population level: if the value is < 1, it is the fraction of individuals that must express the gene for it to be defined as globally expressed; if the value is >= 1, it is the absolute number of individuals that must express the gene

cutoffVersion

Either "absolute" or "percentage", which determines how min.UMI.counts is interpreted (see above)
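The two-level expression filter (per individual, then across the population) can be illustrated on observed data. This sketch applies the "absolute" rule to a hypothetical genes x individuals matrix of summed UMI counts for one cell type; it only illustrates the rule and is not how the function computes expression internally. The helper `globally_expressed` and the toy counts are hypothetical.

```r
# Hypothetical sketch of the "absolute" cutoff on a genes x individuals matrix
# of summed UMI counts for one cell type.
globally_expressed <- function(counts, min.UMI.counts = 3, perc.indiv.expr = 0.5) {
  exprPerIndiv <- counts > min.UMI.counts   # individual level: strictly more counts
  nIndiv <- ncol(counts)
  # Population level: a fraction of individuals if < 1, else an absolute number
  needed <- if (perc.indiv.expr < 1) perc.indiv.expr * nIndiv else perc.indiv.expr
  rowSums(exprPerIndiv) >= needed
}

counts <- matrix(c(5, 0, 4, 1,     # gene A
                   2, 3, 0, 0,     # gene B
                   9, 8, 7, 10),   # gene C
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), paste0("ind", 1:4)))
globally_expressed(counts)   # A TRUE, B FALSE, C TRUE
```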

nGenes

Number of genes to simulate (should match the number of genes used for the fitting)

samplingMethod

Approach to sample the gene mean values (either taking quantiles or random sampling)
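The difference between the two sampling approaches can be sketched with a single gamma distribution. The fitted model is a mixture, and the exact quantile positions used by the package are not documented here, so the `ppoints` spacing, the shape and rate values, and the helper `sample_gene_means` are all assumptions.

```r
# Hypothetical sketch contrasting quantile-based and random sampling of gene
# means from one gamma distribution with made-up shape and rate.
sample_gene_means <- function(nGenes, shape, rate, method = c("quantiles", "random")) {
  method <- match.arg(method)
  if (method == "quantiles") {
    # Deterministic: evenly spaced probabilities mapped through the quantile function
    qgamma(ppoints(nGenes), shape = shape, rate = rate)
  } else {
    # Stochastic: independent draws, so results vary between runs
    rgamma(nGenes, shape = shape, rate = rate)
  }
}

q <- sample_gene_means(21000, shape = 0.5, rate = 2)
set.seed(1)
r <- sample_gene_means(21000, shape = 0.5, rate = 2, method = "random")
c(quantiles = mean(q), random = mean(r))   # both near shape / rate = 0.25
```

Quantile sampling gives reproducible power estimates without a seed, which is one reason to prefer it as a default.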

multipletRateGrowth

How the multiplet rate develops as the number of cells per lane increases: "linear" if overloading should be modeled explicitly, otherwise "constant". The default value of multipletRate matches the "linear" option.
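A simplified sketch of how multipletRate, multipletFactor and multipletRateGrowth could interact. Assumptions: with "linear" growth the multiplet fraction is multipletRate times the number of cells loaded, with "constant" growth multipletRate is used directly as the fraction, and a multiplet absorbs multipletFactor times the reads of a singlet. The helpers `multiplet_fraction` and `reads_per_singlet` are hypothetical and not the package's own functions.

```r
# Hypothetical sketch of a simple multiplet model (not scPower's code).
multiplet_fraction <- function(cellsLoaded, multipletRate = 7.67e-06,
                               multipletRateGrowth = c("linear", "constant")) {
  multipletRateGrowth <- match.arg(multipletRateGrowth)
  if (multipletRateGrowth == "linear") {
    multipletRate * cellsLoaded   # each additional cell raises the multiplet rate
  } else {
    multipletRate                 # rate used directly as a fixed fraction
  }
}

# Assumption: readDepth is the average over all captured barcodes, so
# (1 - m) * r + m * multipletFactor * r = readDepth, solved for singlet depth r.
reads_per_singlet <- function(readDepth, multipletFraction, multipletFactor = 1.82) {
  readDepth / ((1 - multipletFraction) + multipletFactor * multipletFraction)
}

m <- multiplet_fraction(20000)   # 20,000 cells loaded on one lane
reads_per_singlet(10000, m)
```

Under this model, overloading a lane costs twice: fewer usable singlets and fewer reads for each of them.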

sign.threshold

Significance threshold

MTmethod

Multiple testing correction method (possible options: "Bonferroni","FDR","none")

useSimulatedPower

Option to simulate eQTL power for small mean values to increase accuracy (only possible for eQTL analysis)

simThreshold

Mean expression threshold below which the simulated power is used instead of the analytic power (eQTL analysis only)

speedPowerCalc

Option to speed up the power calculation by skipping all genes with an expression probability below 0.01 (their overall power is close to 0 anyway)

indepSNPs

Number of independent SNPs assumed for each locus (for the eQTL Bonferroni multiple testing correction, the number of tests is estimated as the number of expressed genes times indepSNPs)
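Under Bonferroni correction the per-test threshold is sign.threshold divided by the number of tests, and for eQTL studies that number is estimated as expressed genes times indepSNPs. A minimal sketch, assuming a DE study tests each expressed gene once; the helper `bonferroni_threshold` is hypothetical, and the FDR option is not covered.

```r
# Hypothetical sketch of the Bonferroni-adjusted per-test significance level.
bonferroni_threshold <- function(nExpressedGenes, type = c("eqtl", "de"),
                                 sign.threshold = 0.05, indepSNPs = 10) {
  type <- match.arg(type)
  # eQTL: one test per expressed gene and independent SNP;
  # DE: one test per expressed gene (assumption)
  nTests <- if (type == "eqtl") nExpressedGenes * indepSNPs else nExpressedGenes
  sign.threshold / nTests
}

bonferroni_threshold(10000, "eqtl")   # 0.05 / (10000 * 10) = 5e-07
bonferroni_threshold(10000, "de")     # 0.05 / 10000 = 5e-06
```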

ssize.ratio.de

For DE studies, ratio between the sample sizes of group 0 (control group) and group 1 (1 = balanced design)
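As an arithmetic illustration of the ratio n0 / n1, the sketch below splits a total sample size between the two groups. It assumes the sample size refers to the total across both groups, which this page does not state; the helper `split_groups` is hypothetical and non-integer group sizes are left unrounded.

```r
# Hypothetical sketch: split a total sample size into DE groups,
# where ssize.ratio.de = n0 / n1 (group 0 = control, group 1 = case).
split_groups <- function(nSamples, ssize.ratio.de = 1) {
  n1 <- nSamples / (1 + ssize.ratio.de)   # from n0 + n1 = N and n0 = ratio * n1
  c(control = nSamples - n1, case = n1)
}

split_groups(60)      # control 30, case 30 (balanced)
split_groups(60, 2)   # control 40, case 20
```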

reactionsPerKit

Number of reactions (= lanes) per kit; determines the total number of individuals that can be tested per kit

Value

Data frame with overall detection power, power and expression power for each possible parameter combination given the budget and the parameter ranges
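Because the result contains one row per tested parameter combination, the optimal design is the row with the highest overall detection power. A minimal sketch on a made-up result table; the column names powerDetect, sampleSize, totalCells and readDepth are assumptions about the returned data frame and should be checked with names() on real output.

```r
# Toy stand-in for the returned data frame (values and column names are made up).
res <- data.frame(
  sampleSize  = c(40, 60, 80),
  totalCells  = c(3000, 2000, 1500),
  readDepth   = c(40000, 35000, 20000),
  powerDetect = c(0.41, 0.55, 0.48)
)
best <- res[which.max(res$powerDetect), ]   # row with the highest detection power
best
```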


heiniglab/scPower documentation built on Jan. 9, 2025, 12:13 p.m.