dmTest: Likelihood ratio test to detect differential transcript/exon...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

First, estimate the null Dirichlet-multinomial and beta-binomial model parameters and likelihoods using the null model design. Second, perform the gene-level (DM model) and feature-level (BB model) likelihood ratio tests. In the differential exon/transcript usage analysis, the null model is defined by the null design matrix. In the exon/transcript usage QTL analysis, null models are defined by a design with intercept only. Currently, beta-binomial model is implemented only in the differential usage analysis.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
dmTest(x, ...)

## S4 method for signature 'dmDSfit'
dmTest(x, coef = NULL, design = NULL, contrast = NULL,
  one_way = TRUE, bb_model = TRUE, prop_mode = "constrOptim",
  prop_tol = 1e-12, coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  BPPARAM = BiocParallel::SerialParam())

## S4 method for signature 'dmSQTLfit'
dmTest(x, permutation_mode = "all_genes",
  one_way = TRUE, prop_mode = "constrOptim", prop_tol = 1e-12,
  coef_mode = "optim", coef_tol = 1e-12, verbose = 0,
  BPPARAM = BiocParallel::SerialParam())

Arguments

x

dmDSfit or dmSQTLfit object.

...

Other parameters that can be defined by methods using this generic.

coef

Integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. Values must indicate column numbers or column names of the design used in dmFit.

design

Numeric matrix defining the null model.

contrast

Numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero. For a matrix, number of rows (for a vector, its length) must equal to the number of columns of design used in dmFit.

one_way

Logical. Should the shortcut fitting be used when the design corresponds to multiple group comparison. This is a similar approach as in edgeR. If TRUE (the default), then proportions are fitted per group and regression coefficients are recalculated from those fits.

bb_model

Logical. Whether to perform the feature-level analysis using the beta-binomial model.

prop_mode

Optimization method used to estimate proportions. Possible value "constrOptim".

prop_tol

The desired accuracy when estimating proportions.

coef_mode

Optimization method used to estimate regression coefficients. Possible value "optim".

coef_tol

The desired accuracy when estimating regression coefficients.

verbose

Numeric. Definie the level of progress messages displayed. 0 - no messages, 1 - main messages, 2 - message for every gene fitting.

BPPARAM

Parallelization method used by bplapply.

permutation_mode

Character specifying which permutation scheme to apply for p-value calculation. When equal to "all_genes", null distribution of p-values is calculated from all genes and the maximum number of permutation cycles is 10. When permutation_mode = "per_gene", null distribution of p-values is calculated for each gene separately based on permutations of this individual gene. The latter approach may take a lot of computational time. We suggest using the first option.

Details

One must specify one of the arguments: coef, design or contrast.

When contrast is used to define the null model, the null design matrix is recalculated using the same approach as in glmLRT function from edgeR.

Value

Returns a dmDStest or dmSQTLtest object.

Author(s)

Malgorzata Nowicka

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297.

See Also

plotPValues glmLRT

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
# --------------------------------------------------------------------------
# Create dmDSdata object 
# --------------------------------------------------------------------------
## Get kallisto transcript counts from the 'PasillaTranscriptExpr' package

library(PasillaTranscriptExpr)

data_dir  <- system.file("extdata", package = "PasillaTranscriptExpr")

## Load metadata
pasilla_metadata <- read.table(file.path(data_dir, "metadata.txt"), 
header = TRUE, as.is = TRUE)

## Load counts
pasilla_counts <- read.table(file.path(data_dir, "counts.txt"), 
header = TRUE, as.is = TRUE)

## Create a pasilla_samples data frame
pasilla_samples <- data.frame(sample_id = pasilla_metadata$SampleName, 
  group = pasilla_metadata$condition)
levels(pasilla_samples$group)

## Create a dmDSdata object
d <- dmDSdata(counts = pasilla_counts, samples = pasilla_samples)

## Use a subset of genes, which is defined in the following file
gene_id_subset <- readLines(file.path(data_dir, "gene_id_subset.txt"))

d <- d[names(d) %in% gene_id_subset, ]

# --------------------------------------------------------------------------
# Differential transcript usage analysis - simple two group comparison 
# --------------------------------------------------------------------------

## Filtering
## Check what is the minimal number of replicates per condition 
table(samples(d)$group)

d <- dmFilter(d, min_samps_gene_expr = 7, min_samps_feature_expr = 3,
  min_gene_expr = 10, min_feature_expr = 10)

plotData(d)

## Create the design matrix
design_full <- model.matrix(~ group, data = samples(d))

## To make the analysis reproducible
set.seed(123)
## Calculate precision
d <- dmPrecision(d, design = design_full)

plotPrecision(d)

head(mean_expression(d))
common_precision(d)
head(genewise_precision(d))

## Fit full model proportions
d <- dmFit(d, design = design_full)

## Get fitted proportions
head(proportions(d))
## Get the DM regression coefficients (gene-level) 
head(coefficients(d))
## Get the BB regression coefficients (feature-level) 
head(coefficients(d), level = "feature")

## Fit null model proportions and perform the LR test to detect DTU
d <- dmTest(d, coef = "groupKD")

## Plot the gene-level p-values
plotPValues(d)

## Get the gene-level results
head(results(d))

## Plot feature proportions for a top DTU gene
res <- results(d)
res <- res[order(res$pvalue, decreasing = FALSE), ]

top_gene_id <- res$gene_id[1]

plotProportions(d, gene_id = top_gene_id, group_variable = "group")

plotProportions(d, gene_id = top_gene_id, group_variable = "group", 
  plot_type = "lineplot")

plotProportions(d, gene_id = top_gene_id, group_variable = "group", 
  plot_type = "ribbonplot")

DRIMSeq documentation built on Nov. 8, 2020, 8:25 p.m.