ParametersEstimations: Estimation of the 'COTAN' model's parameters

ParametersEstimations    R Documentation

Estimation of the COTAN model's parameters

Description

These functions estimate the COTAN model's parameters: the average count for each gene (lambda), the average count for each cell (nu), and the dispersion parameter for each gene, chosen to match the probability of zero.

The estimator methods are named Linear if they can be calculated as a linear statistic of the raw data, or Bisection if they are found via a parallel bisection solver.

Usage

## S4 method for signature 'COTAN'
estimateLambdaLinear(objCOTAN)

## S4 method for signature 'COTAN'
estimateNuLinear(objCOTAN)

## S4 method for signature 'COTAN'
estimateDispersionBisection(
  objCOTAN,
  threshold = 0.001,
  cores = 1L,
  maxIterations = 100L,
  chunkSize = 1024L
)

## S4 method for signature 'COTAN'
estimateNuBisection(
  objCOTAN,
  threshold = 0.001,
  cores = 1L,
  maxIterations = 100L,
  chunkSize = 1024L
)

## S4 method for signature 'COTAN'
estimateDispersionNuBisection(
  objCOTAN,
  threshold = 0.001,
  cores = 1L,
  maxIterations = 100L,
  chunkSize = 1024L,
  enforceNuAverageToOne = TRUE
)

## S4 method for signature 'COTAN'
estimateDispersionNuNlminb(
  objCOTAN,
  threshold = 0.001,
  maxIterations = 50L,
  chunkSize = 1024L,
  enforceNuAverageToOne = TRUE
)

## S4 method for signature 'COTAN'
getNu(objCOTAN)

## S4 method for signature 'COTAN'
getLambda(objCOTAN)

## S4 method for signature 'COTAN'
getDispersion(objCOTAN)

estimatorsAreReady(objCOTAN)

getNuNormData(objCOTAN)

getLogNormData(objCOTAN)

getNormalizedData(objCOTAN, retLog = FALSE)

getProbabilityOfZero(objCOTAN)

Arguments

objCOTAN

a COTAN object

threshold

minimal solution precision

cores

number of cores to use. Default is 1.

maxIterations

max number of iterations (avoids infinite loops)

chunkSize

number of genes to solve in batch in a single core. Default is 1024.

enforceNuAverageToOne

a Boolean on whether to keep the average nu equal to 1

retLog

When TRUE calls getLogNormData(), otherwise calls getNuNormData()

Details

estimateLambdaLinear() does a linear estimation of lambda (genes' counts averages)

estimateNuLinear() does a linear estimation of nu (normalized cells' counts averages)
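As a rough illustration of the two linear estimators, the sketch below computes them by hand on a toy matrix. It assumes that lambda is the per-gene row mean of the raw counts and that nu is the per-cell column mean rescaled to average one; the matrix values and variable names are invented for the example and are not taken from the package.

```r
# Toy raw count matrix: genes in rows, cells in columns (illustrative values only)
raw <- matrix(c(0, 2, 1, 3,
                4, 0, 2, 2,
                1, 1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"),
                              c("c1", "c2", "c3", "c4")))

# Assumed linear estimate of lambda: average count of each gene
lambda <- rowMeans(raw)

# Assumed linear estimate of nu: average count of each cell,
# rescaled so that the average nu is one
cellAverages <- colMeans(raw)
nu <- cellAverages / mean(cellAverages)
```

With this rescaling the average of nu is one by construction, which is the property the bisection estimators are described as preserving or enforcing.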

estimateDispersionBisection() estimates the negative binomial dispersion factor for each gene (a). It determines the dispersion such that, for each gene, the probability of zero counts matches the number of observed zeros. It assumes that estimateNuLinear() has already been run.
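The idea of matching expected and observed zeros can be sketched for a single gene as follows. This is a simplified illustration, not the package's solver: it assumes the expected count of a gene in a cell is lambda * nu, uses the standard negative binomial zero probability (1 + a * mu)^(-1 / a), and restricts the search to positive dispersion. The helper names probZeroNB and solveDispersion, the search bounds and the toy inputs are all hypothetical.

```r
# Zero probability of a negative binomial with mean mu and dispersion a > 0
probZeroNB <- function(a, mu) (1 + a * mu)^(-1 / a)

# Hypothetical helper: bisection on the dispersion of a single gene
solveDispersion <- function(mu, observedZeros,
                            threshold = 0.001, maxIterations = 100L) {
  lower <- 1e-6   # nearly Poisson
  upper <- 100    # assumed upper bound, wide enough for this toy case
  for (i in seq_len(maxIterations)) {
    middle <- (lower + upper) / 2
    # The expected number of zeros grows with the dispersion
    if (sum(probZeroNB(middle, mu)) < observedZeros) {
      lower <- middle
    } else {
      upper <- middle
    }
    # Stop once the bracket is narrower than the requested precision
    if (upper - lower < threshold) break
  }
  (lower + upper) / 2
}

# Toy inputs: one gene with lambda = 0.5 observed in four cells
lambda <- 0.5
nu <- c(0.5, 1.0, 1.5, 1.0)
dispersion <- solveDispersion(mu = lambda * nu, observedZeros = 3)
```

In this sketch threshold bounds the width of the final bracket and maxIterations caps the loop, mirroring the roles the arguments of the same names are documented to have.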

estimateNuBisection() estimates the nu vector of a COTAN object by bisection. It determines the nu parameters such that, for each cell, the probability of zero counts matches the number of observed zeros. It assumes that estimateDispersionBisection() has already been run. Since this breaks the assumption that the average nu is one, it is recommended not to run it in isolation but to use estimateDispersionNuBisection() instead.

estimateDispersionNuBisection() estimates the dispersion and nu fields of a COTAN object by running a bisection for each parameter in sequence.

estimateDispersionNuNlminb() estimates the nu and dispersion parameters to minimize the discrepancy between the observed and expected probability of zero. It uses the stats::nlminb() solver, but since the joint parameters have too high dimensionality, it converges too slowly to be actually useful in real cases.

getNu() extracts the nu array (normalized cells' counts averages)

getLambda() extracts the lambda array (mean expression for each gene)

getDispersion() extracts the dispersion array

estimatorsAreReady() checks whether the estimators arrays lambda, nu, dispersion are available

getNuNormData() extracts the nu-normalized count table (i.e. where each column is divided by nu) and returns it

getLogNormData() extracts the log-normalized count table: each column is divided by the corresponding cell size from getCellsSize(), then the formula log10(10^4 * x + 1) is applied and the result is returned.
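The two normalizations can be reproduced by hand on a toy matrix, as in the sketch below. It assumes that the cell size is the total count of each cell and that nu comes from the linear estimate rescaled to average one; both are assumptions made for the illustration, and the variable names are not part of the package.

```r
# Toy raw count matrix: genes in rows, cells in columns (illustrative values only)
raw <- matrix(c(0, 2, 1, 3,
                4, 0, 2, 2,
                1, 1, 0, 0),
              nrow = 3, byrow = TRUE)

# Assumed nu: per-cell average count rescaled to average one
nu <- colMeans(raw) / mean(colMeans(raw))

# Assumed cell size: total count of each cell
cellsSize <- colSums(raw)

# nu-normalization: divide each column by its nu
nuNorm <- sweep(raw, 2, nu, "/")

# log-normalization: divide each column by its cell size,
# then apply log10(10^4 * x + 1)
logNorm <- log10(1e4 * sweep(raw, 2, cellsSize, "/") + 1)
```

Zero counts stay at zero under both transformations, since log10(0 + 1) is 0.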

getNormalizedData() is deprecated: please use getNuNormData() or getLogNormData() directly as appropriate

getProbabilityOfZero() gives for each cell and each gene the probability of observing zero reads
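A gene-by-cell table of zero probabilities can be sketched as below. The sketch assumes the expected count for a gene/cell pair is lambda * nu and uses the standard negative binomial zero probability with positive dispersion; the helper probZeroNB and all parameter values are made up for the example and do not reproduce the package's internal code.

```r
# Zero probability of a negative binomial with mean mu and dispersion a > 0
probZeroNB <- function(a, mu) (1 + a * mu)^(-1 / a)

# Illustrative parameter values (not estimated from real data)
lambda <- c(g1 = 1.5, g2 = 2.0, g3 = 0.5)
nu <- c(c1 = 1.25, c2 = 0.75, c3 = 0.75, c4 = 1.25)
dispersion <- c(g1 = 0.5, g2 = 1.0, g3 = 2.0)

# Assumed expected count for every gene/cell pair
mu <- outer(lambda, nu)

# One dispersion per gene: the vector is recycled down the rows of mu
probZero <- probZeroNB(dispersion, mu)
```

Each row of probZero corresponds to a gene and each column to a cell, so summing a row gives the expected number of zeros for that gene under these assumptions.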

Value

estimateLambdaLinear() returns the updated COTAN object

estimateNuLinear() returns the updated COTAN object

estimateDispersionBisection() returns the updated COTAN object

estimateNuBisection() returns the updated COTAN object

estimateDispersionNuBisection() returns the updated COTAN object

estimateDispersionNuNlminb() returns the updated COTAN object

getNu() returns the nu array

getLambda() returns the lambda array

getDispersion() returns the dispersion array

estimatorsAreReady() returns a boolean specifying whether all three arrays are non-empty

getNuNormData() returns the nu-normalized count data.frame

getLogNormData() returns a data.frame after applying the formula log10(10^4 * x + 1) to the raw counts normalized by cells-size

getNormalizedData() returns a data.frame

getProbabilityOfZero() returns a data.frame with the probabilities of zero

Examples

data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)

objCOTAN <- estimateLambdaLinear(objCOTAN)
lambda <- getLambda(objCOTAN)

objCOTAN <- estimateNuLinear(objCOTAN)
nu <- getNu(objCOTAN)

objCOTAN <- estimateDispersionBisection(objCOTAN, cores = 6L)
dispersion <- getDispersion(objCOTAN)

objCOTAN <- estimateDispersionNuBisection(objCOTAN, cores = 6L,
                                          enforceNuAverageToOne = TRUE)
nu <- getNu(objCOTAN)
dispersion <- getDispersion(objCOTAN)

nuNorm <- getNuNormData(objCOTAN)

logNorm <- getLogNormData(objCOTAN)

## getNormalizedData() is deprecated: prefer getLogNormData() as above
logNorm <- getNormalizedData(objCOTAN, retLog = TRUE)

probZero <- getProbabilityOfZero(objCOTAN)


seriph78/COTAN documentation built on Dec. 10, 2024, 3:30 a.m.