dmPrecision: Estimate the precision parameter in the Dirichlet-multinomial...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Maximum likelihood estimates of the precision parameter in the Dirichlet-multinomial model used for the differential exon/transcript usage or QTL analysis.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
dmPrecision(x, ...)

## S4 method for signature 'dmDSdata'
dmPrecision(x, design, mean_expression = TRUE,
  common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
  prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
  prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
  prec_moderation = "trended", prec_prior_df = 0, prec_span = 0.1,
  one_way = TRUE, prop_mode = "constrOptim", prop_tol = 1e-12,
  coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  BPPARAM = BiocParallel::SerialParam())

## S4 method for signature 'dmSQTLdata'
dmPrecision(x, mean_expression = TRUE,
  common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
  prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
  prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
  prec_moderation = "none", prec_prior_df = 0, prec_span = 0.1,
  one_way = TRUE, speed = TRUE, prop_mode = "constrOptim",
  prop_tol = 1e-12, coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  BPPARAM = BiocParallel::SerialParam())

Arguments

x

dmDSdata or dmSQTLdata object.

...

Other parameters that can be defined by methods using this generic.

design

Numeric matrix defining the model that should be used when estimating precision. Normally this should be a full model design used also in dmFit.

mean_expression

Logical. Whether to estimate the mean expression of genes.

common_precision

Logical. Whether to estimate the common precision.

genewise_precision

Logical. Whether to estimate the gene-wise precision.

prec_adjust

Logical. Whether to use the Cox-Reid adjusted or non-adjusted profile likelihood.

prec_subset

Value from 0 to 1 defining the percentage of genes used in common precision estimation. The default is 0.1, which uses 10 randomly selected genes to speed up the precision estimation process. Use set.seed function to make the analysis reproducible. See Examples.

prec_interval

Numeric vector of length 2 defining the interval of possible values for the common precision.

prec_tol

The desired accuracy when estimating common precision.

prec_init

Initial precision. If common_precision is TRUE, then prec_init is overwritten by common precision estimate.

prec_grid_length

Length of the search grid.

prec_grid_range

Vector giving the limits of grid interval.

prec_moderation

Precision moderation method. One can choose to shrink the precision estimates toward the common precision ("common") or toward the (precision versus mean expression) trend ("trended")

prec_prior_df

Degree of moderation (shrinkage) in case when it can not be calculated automaticaly (number of genes on the upper boundary of grid is smaller than 10). By default it is equal to 0.

prec_span

Value from 0 to 1 defining the percentage of genes used in smoothing sliding window when calculating the precision versus mean expression trend.

one_way

Logical. Should the shortcut fitting be used when the design corresponds to multiple group comparison. This is a similar approach as in edgeR. If TRUE (the default), then proportions are fitted per group and regression coefficients are recalculated from those fits.

prop_mode

Optimization method used to estimate proportions. Possible value "constrOptim".

prop_tol

The desired accuracy when estimating proportions.

coef_mode

Optimization method used to estimate regression coefficients. Possible value "optim".

coef_tol

The desired accuracy when estimating regression coefficients.

verbose

Numeric. Definie the level of progress messages displayed. 0 - no messages, 1 - main messages, 2 - message for every gene fitting.

BPPARAM

Parallelization method used by bplapply.

speed

Logical. If FALSE, precision is calculated per each gene-block. Such calculation may take a long time, since there can be hundreds of SNPs/blocks per gene. If TRUE, there will be only one precision calculated per gene and it will be assigned to all the blocks matched with this gene.

Details

Normally, in the differential analysis based on RNA-seq data, such as, for example, differential gene expression, dispersion (of negative-binomial model) is estimated. Here, we estimate precision of the Dirichlet-multinomial model as it is more convenient computationally. To obtain dispersion estimates, one can use a formula: dispersion = 1 / (1 + precision).

Parameters that are used in the precision (dispersion = 1 / (1 + precision)) estimation start with prefix prec_. Those that are used for the proportion estimation in each group when the shortcut fitting one_way = TRUE can be used start with prop_, and those that are used in the regression framework start with coef_.

There are two optimization methods implemented within dmPrecision: "optimize" for the common precision and "grid" for the gene-wise precision.

Only part of the precision parameters in dmPrecision have an influence on a given optimization method. Here is a list of such active parameters:

"optimize":

"grid", which uses the grid approach from estimateDisp in edgeR:

Value

Returns a dmDSprecision or dmSQTLprecision object.

Author(s)

Malgorzata Nowicka

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.

See Also

plotPrecision estimateDisp

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# --------------------------------------------------------------------------
# Create dmDSdata object 
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package

library(PasillaTranscriptExpr)

data_dir  <- system.file("extdata", package = "PasillaTranscriptExpr")

## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"), 
header = TRUE, as.is = TRUE)

## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"), 
header = TRUE, as.is = TRUE)

## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName, 
  group = pasilla_metadata$condition)
levels(pasilla_samples$group)

## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)

## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))

d <- d[names(d) %in% gene_id_subset, ]

# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison 
# --------------------------------------------------------------------------

## Filtering
## Check what is the minimal number of replicates per condition 
table(samples(d)$group)

d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
  min_gene_expr = 10, min_feature_expr = 10)

plotData(d)

## Create the design matrix
design_full <- model.matrix(~ group, data = samples(d))

## To make the analysis reproducible
set.seed(123)
## Calculate precision
d <- dmPrecision(d, design = design_full)

plotPrecision(d)

head(mean_expression(d))
common_precision(d)
head(genewise_precision(d))

DRIMSeq documentation built on Nov. 17, 2017, 1:11 p.m.