Description Usage Arguments Details Value Author(s) References See Also Examples
Obtain the maximum likelihood estimates of Dirichlet-multinomial (gene-level) and/or beta-binomial (feature-level) regression coefficients, feature proportions in each sample and corresponding likelihoods. In the differential exon/transcript usage analysis, the regression model is defined by a design matrix. In the exon/transcript usage QTL analysis, regression models are defined by genotypes. Currently, beta-binomial model is implemented only in the differential usage analysis.
1 2 3 4 5 6 7 8 9 10 11 | dmFit(x, ...)
## S4 method for signature 'dmDSprecision'
dmFit(x, design, one_way = TRUE, bb_model = TRUE,
prop_mode = "constrOptim", prop_tol = 1e-12, coef_mode = "optim",
coef_tol = 1e-12, verbose = 0, BPPARAM = BiocParallel::SerialParam())
## S4 method for signature 'dmSQTLprecision'
dmFit(x, one_way = TRUE,
prop_mode = "constrOptim", prop_tol = 1e-12, coef_mode = "optim",
coef_tol = 1e-12, verbose = 0, BPPARAM = BiocParallel::SerialParam())
|
x |
|
... |
Other parameters that can be defined by methods using this generic. |
design |
Numeric matrix defining the full model. |
one_way |
Logical. Should the shortcut fitting be used when the design
corresponds to multiple group comparison. This is a similar approach as in
|
bb_model |
Logical. Whether to perform the feature-level analysis using the beta-binomial model. |
prop_mode |
Optimization method used to estimate proportions. Possible
value |
prop_tol |
The desired accuracy when estimating proportions. |
coef_mode |
Optimization method used to estimate regression
coefficients. Possible value |
coef_tol |
The desired accuracy when estimating regression coefficients. |
verbose |
Numeric. Definie the level of progress messages displayed. 0 - no messages, 1 - main messages, 2 - message for every gene fitting. |
BPPARAM |
Parallelization method used by
|
In the regression framework here, we adapt the idea from
glmFit
in edgeR
about using a shortcut
algorithm when the design is equivalent to simple group fitting. In such a
case, we estimate the DM proportions for each group of samples separately
and then recalculate the DM (and/or the BB) regression coefficients
corresponding to the design matrix. If the design matrix does not define a
simple group fitting, for example, when it contains a column with
continuous values, then the regression framework is used to directly
estimate the regression coefficients.
Arguments that are used for the proportion estimation in each group when
the shortcut fitting can be used start with prop_
, and those that
are used in the regression framework start with coef_
.
In the differential transcript usage analysis, setting one_way =
TRUE
allows switching to the shortcut algorithm only if the design is
equivalent to simple group fitting. one_way = FALSE
forces usage of
the regression framework.
In the QTL analysis, currently, genotypes are defined as numeric
values 0, 1, and 2. When one_way = TRUE
, simple multiple group fitting
is performed. When one_way = FALSE
, a regression framework is used
with the design matrix defined by a formula ~ group
where group is a
continuous (not categorical) variable with values 0, 1, and 2.
Returns a dmDSfit
or
dmSQTLfit
object.
Malgorzata Nowicka
McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 | # --------------------------------------------------------------------------
# Create dmDSdata object
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package
library(PasillaTranscriptExpr)
data_dir <- system.file("extdata", package = "PasillaTranscriptExpr")
## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"),
header = TRUE, as.is = TRUE)
## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"),
header = TRUE, as.is = TRUE)
## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName,
group = pasilla_metadata$condition)
levels(pasilla_samples$group)
## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)
## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))
d <- d[names(d) %in% gene_id_subset, ]
# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison
# --------------------------------------------------------------------------
## Filtering
## Check what is the minimal number of replicates per condition
table(samples(d)$group)
d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
min_gene_expr = 10, min_feature_expr = 10)
plotData(d)
## Create the design matrix
design_full <- model.matrix(~ group, data = samples(d))
## To make the analysis reproducible
set.seed(123)
## Calculate precision
d <- dmPrecision(d, design = design_full)
plotPrecision(d)
head(mean_expression(d))
common_precision(d)
head(genewise_precision(d))
## Fit full model proportions
d <- dmFit(d, design = design_full)
## Get fitted proportions
head(proportions(d))
## Get the DM regression coefficients (gene-level)
head(coefficients(d))
## Get the BB regression coefficients (feature-level)
head(coefficients(d), level = "feature")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.