dmPrecision: Estimate the precision parameter in the Dirichlet-multinomial...
In DRIMSeq: Differential transcript usage and tuQTL analyses with Dirichlet-multinomial model in RNA-seq

Description Usage Arguments Details Value Author(s) References See Also Examples

Maximum likelihood estimates of the precision parameter in the Dirichlet-multinomial model used for the differential exon/transcript usage or QTL analysis.

dmPrecision(x, ...)

## S4 method for signature 'dmDSdata'
dmPrecision(x, design, mean_expression = TRUE,
  common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
  prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
  prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
  prec_moderation = "trended", prec_prior_df = 0, prec_span = 0.1,
  one_way = TRUE, prop_mode = "constrOptim", prop_tol = 1e-12,
  coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  add_uniform = FALSE, BPPARAM = BiocParallel::SerialParam())

## S4 method for signature 'dmSQTLdata'
dmPrecision(x, mean_expression = TRUE,
  common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
  prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
  prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
  prec_moderation = "none", prec_prior_df = 0, prec_span = 0.1,
  one_way = TRUE, speed = TRUE, prop_mode = "constrOptim",
  prop_tol = 1e-12, coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  BPPARAM = BiocParallel::SerialParam())

`x`	`dmDSdata` or `dmSQTLdata` object.
`...`	Other parameters that can be defined by methods using this generic.
`design`	Numeric matrix defining the model that should be used when estimating precision. Normally this should be a full model design used also in `dmFit`.
`mean_expression`	Logical. Whether to estimate the mean expression of genes.
`common_precision`	Logical. Whether to estimate the common precision.
`genewise_precision`	Logical. Whether to estimate the gene-wise precision.
`prec_adjust`	Logical. Whether to use the Cox-Reid adjusted or non-adjusted profile likelihood.
`prec_subset`	Value from 0 to 1 defining the percentage of genes used in common precision estimation. The default is 0.1, which uses 10 randomly selected genes to speed up the precision estimation process. Use `set.seed` function to make the analysis reproducible. See Examples.
`prec_interval`	Numeric vector of length 2 defining the interval of possible values for the common precision.
`prec_tol`	The desired accuracy when estimating common precision.
`prec_init`	Initial precision. If `common_precision` is `TRUE`, then `prec_init` is overwritten by common precision estimate.
`prec_grid_length`	Length of the search grid.
`prec_grid_range`	Vector giving the limits of grid interval.
`prec_moderation`	Precision moderation method. One can choose to shrink the precision estimates toward the common precision (`"common"`) or toward the (precision versus mean expression) trend (`"trended"`)
`prec_prior_df`	Degree of moderation (shrinkage) in case when it can not be calculated automaticaly (number of genes on the upper boundary of grid is smaller than 10). By default it is equal to 0.
`prec_span`	Value from 0 to 1 defining the percentage of genes used in smoothing sliding window when calculating the precision versus mean expression trend.
`one_way`	Logical. Should the shortcut fitting be used when the design corresponds to multiple group comparison. This is a similar approach as in `edgeR`. If `TRUE` (the default), then proportions are fitted per group and regression coefficients are recalculated from those fits.
`prop_mode`	Optimization method used to estimate proportions. Possible value `"constrOptim"`.
`prop_tol`	The desired accuracy when estimating proportions.
`coef_mode`	Optimization method used to estimate regression coefficients. Possible value `"optim"`.
`coef_tol`	The desired accuracy when estimating regression coefficients.
`verbose`	Numeric. Definie the level of progress messages displayed. 0 - no messages, 1 - main messages, 2 - message for every gene fitting.
`add_uniform`	Whether to add a small fractional count to zeros, (adding a uniform random variable between 0 and 0.1). This option allows for the fitting of genewise precision and coefficients for genes with two features having all zero for one group, or the last feature having all zero for one group.
`BPPARAM`	Parallelization method used by `bplapply`.
`speed`	Logical. If `FALSE`, precision is calculated per each gene-block. Such calculation may take a long time, since there can be hundreds of SNPs/blocks per gene. If `TRUE`, there will be only one precision calculated per gene and it will be assigned to all the blocks matched with this gene.

Normally, in the differential analysis based on RNA-seq data, such as, for example, differential gene expression, dispersion (of negative-binomial model) is estimated. Here, we estimate precision of the Dirichlet-multinomial model as it is more convenient computationally. To obtain dispersion estimates, one can use a formula: dispersion = 1 / (1 + precision).

Parameters that are used in the precision (dispersion = 1 / (1 + precision)) estimation start with prefix prec_. Those that are used for the proportion estimation in each group when the shortcut fitting one_way = TRUE can be used start with prop_, and those that are used in the regression framework start with coef_.

There are two optimization methods implemented within dmPrecision: "optimize" for the common precision and "grid" for the gene-wise precision.

Only part of the precision parameters in dmPrecision have an influence on a given optimization method. Here is a list of such active parameters:

"optimize":

prec_interval: Passed as interval.
prec_tol: The accuracy defined as tol.

"grid", which uses the grid approach from estimateDisp in edgeR:

prec_init, prec_grid_length, prec_grid_range: Parameters used to construct the search grid prec_init * 2^seq(from = prec_grid_range[1], to = prec_grid_range[2], length = prec_grid_length).
prec_moderation: Dipsersion shrinkage is available only with "grid" method.
prec_prior_df: Used only when precision shrinkage is activated. Moderated likelihood is equal to loglik + prec_prior_df * moderation. Higher prec_prior_df, more shrinkage toward common or trended precision is applied.
prec_span: Used only when precision moderation toward trend is activated.

Returns a dmDSprecision or dmSQTLprecision object.

Malgorzata Nowicka

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.

plotPrecision estimateDisp

# --------------------------------------------------------------------------
# Create dmDSdata object 
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package

library(PasillaTranscriptExpr)

data_dir  <- system.file("extdata", package = "PasillaTranscriptExpr")

## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"), 
header = TRUE, as.is = TRUE)

## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"), 
header = TRUE, as.is = TRUE)

## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName, 
  group = pasilla_metadata$condition)
levels(pasilla_samples$group)

## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)

## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))

d <- d[names(d) %in% gene_id_subset, ]

# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison 
# --------------------------------------------------------------------------

## Filtering
## Check what is the minimal number of replicates per condition 
table(samples(d)$group)

d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
  min_gene_expr = 10, min_feature_expr = 10)

plotData(d)

## Create the design matrix
design_full <- model.matrix(~ group, data = samples(d))

## To make the analysis reproducible
set.seed(123)
## Calculate precision
d <- dmPrecision(d, design = design_full)

plotPrecision(d)

head(mean_expression(d))
common_precision(d)
head(genewise_precision(d))