Description Usage Arguments Details Value Author(s) References See Also Examples
Maximum likelihood estimates of the precision parameter in the Dirichlet-multinomial model used for the differential exon/transcript usage or QTL analysis.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | dmPrecision(x, ...)
## S4 method for signature 'dmDSdata'
dmPrecision(x, design, mean_expression = TRUE,
common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
prec_moderation = "trended", prec_prior_df = 0, prec_span = 0.1,
one_way = TRUE, prop_mode = "constrOptim", prop_tol = 1e-12,
coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
BPPARAM = BiocParallel::SerialParam())
## S4 method for signature 'dmSQTLdata'
dmPrecision(x, mean_expression = TRUE,
common_precision = TRUE, genewise_precision = TRUE, prec_adjust = TRUE,
prec_subset = 0.1, prec_interval = c(0, 1000), prec_tol = 10,
prec_init = 100, prec_grid_length = 21, prec_grid_range = c(-10, 10),
prec_moderation = "none", prec_prior_df = 0, prec_span = 0.1,
one_way = TRUE, speed = TRUE, prop_mode = "constrOptim",
prop_tol = 1e-12, coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
BPPARAM = BiocParallel::SerialParam())
|
x |
|
... |
Other parameters that can be defined by methods using this generic. |
design |
Numeric matrix defining the model that should be used when
estimating precision. Normally this should be a full model design used
also in |
mean_expression |
Logical. Whether to estimate the mean expression of genes. |
common_precision |
Logical. Whether to estimate the common precision. |
genewise_precision |
Logical. Whether to estimate the gene-wise precision. |
prec_adjust |
Logical. Whether to use the Cox-Reid adjusted or non-adjusted profile likelihood. |
prec_subset |
Value from 0 to 1 defining the percentage of genes used in
common precision estimation. The default is 0.1, which uses 10
randomly selected genes to speed up the precision estimation process. Use
|
prec_interval |
Numeric vector of length 2 defining the interval of possible values for the common precision. |
prec_tol |
The desired accuracy when estimating common precision. |
prec_init |
Initial precision. If |
prec_grid_length |
Length of the search grid. |
prec_grid_range |
Vector giving the limits of grid interval. |
prec_moderation |
Precision moderation method. One can choose to shrink
the precision estimates toward the common precision ( |
prec_prior_df |
Degree of moderation (shrinkage) in case when it can not be calculated automaticaly (number of genes on the upper boundary of grid is smaller than 10). By default it is equal to 0. |
prec_span |
Value from 0 to 1 defining the percentage of genes used in smoothing sliding window when calculating the precision versus mean expression trend. |
one_way |
Logical. Should the shortcut fitting be used when the design
corresponds to multiple group comparison. This is a similar approach as in
|
prop_mode |
Optimization method used to estimate proportions. Possible
value |
prop_tol |
The desired accuracy when estimating proportions. |
coef_mode |
Optimization method used to estimate regression
coefficients. Possible value |
coef_tol |
The desired accuracy when estimating regression coefficients. |
verbose |
Numeric. Definie the level of progress messages displayed. 0 - no messages, 1 - main messages, 2 - message for every gene fitting. |
BPPARAM |
Parallelization method used by
|
speed |
Logical. If |
Normally, in the differential analysis based on RNA-seq data, such as, for example, differential gene expression, dispersion (of negative-binomial model) is estimated. Here, we estimate precision of the Dirichlet-multinomial model as it is more convenient computationally. To obtain dispersion estimates, one can use a formula: dispersion = 1 / (1 + precision).
Parameters that are used in the precision (dispersion = 1 / (1 +
precision)) estimation start with prefix prec_
. Those that are used
for the proportion estimation in each group when the shortcut fitting
one_way = TRUE
can be used start with prop_
, and those that
are used in the regression framework start with coef_
.
There are two optimization methods implemented within dmPrecision:
"optimize"
for the common precision and "grid"
for the
gene-wise precision.
Only part of the precision parameters in dmPrecision have an influence on a given optimization method. Here is a list of such active parameters:
"optimize"
:
prec_interval
: Passed as interval
.
prec_tol
: The accuracy defined as tol
.
"grid"
, which uses the grid approach from
estimateDisp
in edgeR
:
prec_init
, prec_grid_length
,
prec_grid_range
: Parameters used to construct the search grid
prec_init * 2^seq(from = prec_grid_range[1]
, to =
prec_grid_range[2]
, length = prec_grid_length)
.
prec_moderation
: Dipsersion shrinkage is available only with
"grid"
method.
prec_prior_df
: Used only when precision
shrinkage is activated. Moderated likelihood is equal to loglik +
prec_prior_df * moderation
. Higher prec_prior_df
, more shrinkage
toward common or trended precision is applied.
prec_span
:
Used only when precision moderation toward trend is activated.
Returns a dmDSprecision
or
dmSQTLprecision
object.
Malgorzata Nowicka
McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | # --------------------------------------------------------------------------
# Create dmDSdata object
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package
library(PasillaTranscriptExpr)
data_dir <- system.file("extdata", package = "PasillaTranscriptExpr")
## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"),
header = TRUE, as.is = TRUE)
## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"),
header = TRUE, as.is = TRUE)
## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName,
group = pasilla_metadata$condition)
levels(pasilla_samples$group)
## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)
## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))
d <- d[names(d) %in% gene_id_subset, ]
# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison
# --------------------------------------------------------------------------
## Filtering
## Check what is the minimal number of replicates per condition
table(samples(d)$group)
d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
min_gene_expr = 10, min_feature_expr = 10)
plotData(d)
## Create the design matrix
design_full <- model.matrix(~ group, data = samples(d))
## To make the analysis reproducible
set.seed(123)
## Calculate precision
d <- dmPrecision(d, design = design_full)
plotPrecision(d)
head(mean_expression(d))
common_precision(d)
head(genewise_precision(d))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.