View source: R/Dif_expression_Analysis.R
DTEG.analysis | R Documentation |
Expression analysis of 2 dimensions, usually Ribo-seq vs RNA-seq.
Using an equal reimplementation of the deltaTE algorithm (see reference).
Creates a total of 3 DESeq models (given x is the target.contrast argument)
(usually 'condition' column) and libraryType is RNA-seq and Ribo-seq):
** The 3 differential sub models **
1. Ribo-seq model : design = ~ x (differences between the x groups in Ribo-seq)
2. RNA-seq model: design = ~ x (differences between the x groups in RNA-seq)
3. TE model: design = ~ x + libraryType + libraryType:x (differences between the x and libraryType groups and the interaction between them)
You need at least 2 groups and 2 replicates per group. By default, the Ribo-seq counts will
be over CDS and RNA-seq counts over whole mRNAs, per transcript. See notes section below
for more details.
DTEG.analysis(
df.rfp,
df.rna,
output.dir = QCfolder(df.rfp),
target.contrast = design[1],
design = ORFik::design(df.rfp),
p.value = 0.05,
RFP_counts = countTable(df.rfp, "cds", type = "summarized"),
RNA_counts = countTable(df.rna, "mrna", type = "summarized"),
batch.effect = FALSE,
pairs = combn.pairs(unlist(df.rfp[, design])),
plot.title = "",
plot.ext = ".pdf",
width = 6,
height = 6,
dot.size = 0.4,
relative.name = paste0("DTEG_plot", plot.ext),
complex.categories = FALSE,
plot_to_console = TRUE
)
df.rfp |
a |
df.rna |
a |
output.dir |
character, default |
target.contrast |
a character vector, default |
design |
a character vector, default |
p.value |
a numeric, default 0.05 in interval (0,1). Defines adjusted p-value to be used as significance threshold for the result groups. I.e. for exclusive translation group significant subset for p.value = 0.05 means: TE$padj < 0.05 & Ribo$padj < 0.05 & RNA$padj > 0.05. |
RFP_counts |
a |
RNA_counts |
a SummarizedExperiment, default: countTable(df.rna, "mrna", type = "summarized"), all transcripts. Assign a subset if you don't want to analyze all genes. It is recommended to not subset, to give DESeq2 data for variance analysis. |
batch.effect |
logical, default TRUE. Makes replicate column of the experiment
part of the design. |
pairs |
list of character pairs, the experiment contrasts. Default:
|
plot.title |
title for plots, usually name of experiment etc |
plot.ext |
character, default: ".pdf". Alternatives: ".png" or ".jpg". |
width |
numeric, default 6 (in inches) |
height |
numeric, default 6 (in inches) |
dot.size |
numeric, default 0.4, size of point dots in plot. |
relative.name |
character, Default: |
complex.categories |
logical, default FALSE. Separate into more groups, see above for details. |
plot_to_console |
logical, default TRUE. Plot to console before returning, set to FALSE to save some run time. |
Log fold changes and p-values are created from a Walds test on the comparison contrast described bellow.
The RNA-seq and Ribo-seq LFC values are shrunken using DESeq2::lfcShrink(type = "normal"). Note
that the TE LFC values are not shrunken (as following specifications from deltaTE paper).
The adjusted p-values are created using DESEQ pAdjustMethod = "BH" (Benjamini-Hochberg correction).
All other DESEQ2 arguments are default.
Analysis is done between each possible
combination of levels in the target contrast If target contrast is condition column,
with factor levels: WT, mut1 and mut2 with 3 replicates each. You get comparison
of WT vs mut1, WT vs mut2 and mut1 vs mut2.
The respective result categories are defined through 4 main categories, first some intuition. The number of ribosomes (Ribo-seq) is significantly different between 2 contrast elements in the model if the relative counts is statistically higher/lower, for mRNA levels (RNA-seq) it is the same. So TE is then RFP / RNA which is basically how many ribosomes translated per mRNA in the sample, if contrast group 1 has TE of 2, it means 2 ribosomes per mrna fragment, while TE of 4 would be a doubling of 4 ribosomes per mRNA.
Mathematically the groups are defined by the p adjusted values as the
following (te.sign means na_safe(te.padj < p.value),
na_safe is a function where NA values are FALSE for '<=' test
and TRUE for '>' test),
we also use a helper function:
te.sign & rfp.sign & rna.sign, all_models_sign := TRUE.
** Signicant DTEG Classifications **
No change : None of the below categories
Translation (only RFP) : te.sign & rfp.sign & !rna.sign
Expression (only RNA) : !te.sign & !rfp.sign & rna.sign
mRNA abundance : all_models_sign & na_safe(te.lfc * rna.lfc, ">", 0)
Inverse (inverse mRNA abundance) : all_models_sign & te.lfc * rna.lfc, "<", 0)
Buffering (Stable protein output) : te.sign & !rfp.sign & rna.sign
Forwarded (diagonal bottom left to top right) : !te.sign & rfp.sign & rna.sign
If complex.categories is FALSE, then Expression, Inverse and forwarded are defined 'Buffering'. mRNA abundance is called"Intensified" in original article For code, of classification, run: View(ORFik:::DTEG_add_regulation_categories). Feel free to redefine the categories as you want them.
See Figure 1 in the reference article for a clear definition of the groups!
If you do not need isoform variants, subset to longest isoform per gene
either before or in the returned object (See examples).
If you do not have RNA-seq controls, you can still use DESeq on Ribo-seq alone.
The LFC values are shrunken by lfcShrink(type = "normal").
Remember that DESeq by default can not
do global change analysis, it can only find subsets with changes in LFC!
a data.table with columns: (contrast variable, gene id, regulation status, log fold changes, p.adjust values, mean counts, significant (as logical))
https://doi.org/10.1002/cpmb.108
Other DifferentialExpression:
DEG.plot.static()
,
DEG_model()
,
DTEG.plot()
,
te.table()
,
te_rna.plot()
## Simple example (use ORFik template, then split on Ribo and RNA)
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
design(df.rfp) # The experimental design, per libtype
design(df.rfp)[1] # Default target contrast
#dt <- DTEG.analysis(df.rfp, df.rna)
## If you want to use the pshifted libs for analysis:
#dt <- DTEG.analysis(df.rfp, df.rna,
# RFP_counts = countTable(df.rfp, region = "cds",
# type = "summarized", count.folder = "pshifted"))
## Restrict DTEGs by log fold change (LFC):
## subset to abs(LFC) < 1.5 for both rfp and rna
#dt[abs(rfp) < 1.5 & abs(rna) < 1.5, Regulation := "No change"]
## Only longest isoform per gene:
#tx_longest <- filterTranscripts(df.rfp, 0, 1, 0)
#dt <- dt[id %in% tx_longest,]
## Convert to gene id
#dt[, id := txNamesToGeneNames(id, df.rfp)]
## To get by gene symbol, use biomaRt conversion
## To flip directionality of contrast pair nr 2:
#design <- "condition"
#pairs <- combn.pairs(unlist(df.rfp[, design])
#pairs[[2]] <- rev(pars[[2]])
#dt <- DTEG.analysis(df.rfp, df.rna,
# RFP_counts = countTable(df.rfp, region = "cds",
# type = "summarized", count.folder = "pshifted"),
# pairs = pairs)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.