DTEG.analysis: Run differential TE analysis
In Roleren/ORFik: Open Reading Frames in Genomics

View source: R/Dif_expression_Analysis.R

DTEG.analysis

R Documentation

Run differential TE analysis

Description

Expression analysis of 2 dimensions, usually Ribo-seq vs RNA-seq.
Using an equal reimplementation of the deltaTE algorithm (see reference).
Creates a total of 3 DESeq models (given x is the target.contrast argument) (usually 'condition' column) and libraryType is RNA-seq and Ribo-seq):

** The 3 differential sub models **

1. Ribo-seq model : design = ~ x (differences between the x groups in Ribo-seq)
2. RNA-seq model: design = ~ x (differences between the x groups in RNA-seq)
3. TE model: design = ~ x + libraryType + libraryType:x (differences between the x and libraryType groups and the interaction between them)

You need at least 2 groups and 2 replicates per group. By default, the Ribo-seq counts will be over CDS and RNA-seq counts over whole mRNAs, per transcript. See notes section below for more details.

Usage

DTEG.analysis(
  df.rfp,
  df.rna,
  output.dir = QCfolder(df.rfp),
  target.contrast = design[1],
  design = ORFik::design(df.rfp),
  p.value = 0.05,
  RFP_counts = countTable(df.rfp, "cds", type = "summarized"),
  RNA_counts = countTable(df.rna, "mrna", type = "summarized"),
  batch.effect = FALSE,
  pairs = combn.pairs(unlist(df.rfp[, design])),
  plot.title = "",
  plot.ext = ".pdf",
  width = 6,
  height = 6,
  dot.size = 0.4,
  relative.name = paste0("DTEG_plot", plot.ext),
  complex.categories = FALSE,
  plot_to_console = TRUE
)

Arguments

`df.rfp`	a `experiment` of usually Ribo-seq or 80S from TCP-seq. (the numerator of the experiment, usually having a primary role)
`df.rna`	a `experiment` of usually RNA-seq. (the denominator of the experiment, usually having a normalizing function)
`output.dir`	character, default `QCfolder(df.rfp)`. output.dir directory to save plots, plot will be named "TE_between". If NULL, will not save.
`target.contrast`	a character vector, default `design[1]`. The column in the ORFik experiment that represent the comparison contrasts. By default: the first design factor of the full experimental design. This is the factor you will do the comparison on. DESeq will normalize the counts based on the full design, but the log fold change values will be based on this contrast only. It is usually the 'condition' column.
`design`	a character vector, default `design(df.rfp)`. The full experiment design. Which factors have more than 1 level. Example: stage column are all HEK293, so it can not be a design factor. The condition column has 2 possible values, WT and mutant, so it is a factor of the experiment. Replicates column is not part of design, that is inserted later with setting `batch.effect = TRUE`. Library type 'libtype' column, can also no be part of initial design, it is always added inside the function, after initial setup.
`p.value`	a numeric, default 0.05 in interval (0,1). Defines adjusted p-value to be used as significance threshold for the result groups. I.e. for exclusive translation group significant subset for p.value = 0.05 means: TE$padj < 0.05 & Ribo$padj < 0.05 & RNA$padj > 0.05.
`RFP_counts`	a `SummarizedExperiment`, default: `countTable(df.rfp, "cds", type = "summarized")`, unshifted libraries, all transcript CDSs. If you have pshifted reads and countTables, do: `countTable(df.rfp, "cds", type = "summarized", count.folder = "pshifted")` Assign a subset if you don't want to analyze all genes. It is recommended to not subset, to give DESeq2 data for variance analysis.
`RNA_counts`	a SummarizedExperiment, default: countTable(df.rna, "mrna", type = "summarized"), all transcripts. Assign a subset if you don't want to analyze all genes. It is recommended to not subset, to give DESeq2 data for variance analysis.
`batch.effect`	logical, default TRUE. Makes replicate column of the experiment part of the design. If you believe you might have batch effects, keep as TRUE. Batch effect usually means that you have a strong variance between biological replicates. Check out `pcaExperiment` and see if replicates cluster together more than the design factor, to verify if you need to set it to TRUE.
`pairs`	list of character pairs, the experiment contrasts. Default: `combn.pairs(unlist(df.rfp[, target.contrast])`
`plot.title`	title for plots, usually name of experiment etc
`plot.ext`	character, default: ".pdf". Alternatives: ".png" or ".jpg".
`width`	numeric, default 6 (in inches)
`height`	numeric, default 6 (in inches)
`dot.size`	numeric, default 0.4, size of point dots in plot.
`relative.name`	character, Default: `paste0("DTEG_plot", plot.ext)` Relative name of file to be saved in folder specified in output.dir. Change to .pdf if you want pdf file instead of png.
`complex.categories`	logical, default FALSE. Separate into more groups, see above for details.
`plot_to_console`	logical, default TRUE. Plot to console before returning, set to FALSE to save some run time.

Details

Log fold changes and p-values are created from a Walds test on the comparison contrast described bellow. The RNA-seq and Ribo-seq LFC values are shrunken using DESeq2::lfcShrink(type = "normal"). Note that the TE LFC values are not shrunken (as following specifications from deltaTE paper). The adjusted p-values are created using DESEQ pAdjustMethod = "BH" (Benjamini-Hochberg correction). All other DESEQ2 arguments are default.

Analysis is done between each possible combination of levels in the target contrast If target contrast is condition column, with factor levels: WT, mut1 and mut2 with 3 replicates each. You get comparison of WT vs mut1, WT vs mut2 and mut1 vs mut2.

The respective result categories are defined through 4 main categories, first some intuition. The number of ribosomes (Ribo-seq) is significantly different between 2 contrast elements in the model if the relative counts is statistically higher/lower, for mRNA levels (RNA-seq) it is the same. So TE is then RFP / RNA which is basically how many ribosomes translated per mRNA in the sample, if contrast group 1 has TE of 2, it means 2 ribosomes per mrna fragment, while TE of 4 would be a doubling of 4 ribosomes per mRNA.

Mathematically the groups are defined by the p adjusted values as the following (te.sign means na_safe(te.padj < p.value), na_safe is a function where NA values are FALSE for '<=' test and TRUE for '>' test), we also use a helper function: te.sign & rfp.sign & rna.sign, all_models_sign := TRUE.

** Signicant DTEG Classifications **

No change : None of the below categories
Translation (only RFP) : te.sign & rfp.sign & !rna.sign
Expression (only RNA) : !te.sign & !rfp.sign & rna.sign
mRNA abundance : all_models_sign & na_safe(te.lfc * rna.lfc, ">", 0)
Inverse (inverse mRNA abundance) : all_models_sign & te.lfc * rna.lfc, "<", 0)
Buffering (Stable protein output) : te.sign & !rfp.sign & rna.sign
Forwarded (diagonal bottom left to top right) : !te.sign & rfp.sign & rna.sign

If complex.categories is FALSE, then Expression, Inverse and forwarded are defined 'Buffering'. mRNA abundance is called"Intensified" in original article For code, of classification, run: View(ORFik:::DTEG_add_regulation_categories). Feel free to redefine the categories as you want them.

See Figure 1 in the reference article for a clear definition of the groups!
If you do not need isoform variants, subset to longest isoform per gene either before or in the returned object (See examples). If you do not have RNA-seq controls, you can still use DESeq on Ribo-seq alone.
The LFC values are shrunken by lfcShrink(type = "normal").

Remember that DESeq by default can not do global change analysis, it can only find subsets with changes in LFC!

Value

a data.table with columns: (contrast variable, gene id, regulation status, log fold changes, p.adjust values, mean counts, significant (as logical))

References

https://doi.org/10.1002/cpmb.108

Examples

## Simple example (use ORFik template, then split on Ribo and RNA)
df <- ORFik.template.experiment()
df.rfp <- df[df$libtype == "RFP",]
df.rna <- df[df$libtype == "RNA",]
design(df.rfp) # The experimental design, per libtype
design(df.rfp)[1] # Default target contrast
#dt <- DTEG.analysis(df.rfp, df.rna)
## If you want to use the pshifted libs for analysis:
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"))
## Restrict DTEGs by log fold change (LFC):
## subset to abs(LFC) < 1.5 for both rfp and rna
#dt[abs(rfp) < 1.5 & abs(rna) < 1.5, Regulation := "No change"]

## Only longest isoform per gene:
#tx_longest <- filterTranscripts(df.rfp, 0, 1, 0)
#dt <- dt[id %in% tx_longest,]
## Convert to gene id
#dt[, id := txNamesToGeneNames(id, df.rfp)]
## To get by gene symbol, use biomaRt conversion
## To flip directionality of contrast pair nr 2:
#design <- "condition"
#pairs <- combn.pairs(unlist(df.rfp[, design])
#pairs[[2]] <- rev(pars[[2]])
#dt <- DTEG.analysis(df.rfp, df.rna,
#                    RFP_counts = countTable(df.rfp, region = "cds",
#                       type = "summarized", count.folder = "pshifted"),
#                       pairs = pairs)

Roleren/ORFik documentation built on April 12, 2025, 5:31 a.m.