RNAmodR: AlkAnilineSeq

BiocStyle::markdown(css.files = c('custom.css'))

Introduction

7-methyl guanosine (m7G), 3-methyl cytidine (m3C) and Dihydrouridine (D) are commonly found in rRNA and tRNA and can be detected classically by primer extension analysis. However, since the modifications do not interfere with Watson-Crick base pairing, a specific chemical treatment needs to be employed to cause strand breaks specifically at the modified positions. Initially, this involved a sodium borhydride treatment to create abasic sites and cleaving the RNA at abasic sites with aniline.

This classical protocol was converted to a high throughput sequencing method call AlkAnilineSeq and allows modified position be detected by an accumulation of 5'-ends at the N+1 position [@Marchand.2018]. It was found, that m3C is susceptible to this treatment, which allows m7G, m3C and D to be detected by the same method from the same data sets, since the identify of the unmodified nucleotide informs about the three modified nucleotides.

The ModAlkAnilineSeq class uses the the NormEnd5SequenceData class to store and aggregate data along the transcripts. The calculated scores follow the nomenclature of [@Marchand.2018] with the names scoreNC (default) and scoreSR.

suppressPackageStartupMessages({
  library(rtracklayer)
  library(GenomicRanges)
  library(RNAmodR.AlkAnilineSeq)
  library(RNAmodR.Data)
})
library(rtracklayer)
library(GenomicRanges)
library(RNAmodR.AlkAnilineSeq)
library(RNAmodR.Data)

Example workflow

The example workflow is limited to 18S rRNA and some tRNA from S.cerevisiae. As annotation data either a gff file or a TxDb object and for sequence data a fasta file or a BSgenome object can be used. The data is provided as bam files.

annotation <- GFF3File(RNAmodR.Data.example.AAS.gff3())
sequences <- RNAmodR.Data.example.AAS.fasta()
files <- list("wt" = c(treated = RNAmodR.Data.example.wt.1(),
                       treated = RNAmodR.Data.example.wt.2(),
                       treated = RNAmodR.Data.example.wt.3()),
              "Bud23del" = c(treated = RNAmodR.Data.example.bud23.1(),
                             treated = RNAmodR.Data.example.bud23.2()),
              "Trm8del" = c(treated = RNAmodR.Data.example.trm8.1(),
                            treated = RNAmodR.Data.example.trm8.2()))

The analysis is triggered by the construction of a ModSetAlkAnilineSeq object. Internally parallelization is used via the BiocParallel package, which would allow optimization depending on number/size of input files (number of samples, number of replicates, number of transcripts, etc).

msaas <- ModSetAlkAnilineSeq(files, annotation = annotation, sequences = sequences)
msaas

As expected the m7G1575 is missing from the Bud23del samples.

mod <- modifications(msaas)
lapply(mod,head, n = 2L)

Visualizing the results

As outlined in the RNAmodR package we can compare the samples using the plotCompareByCoord to prepare a heatmap. For this we select some position from the found modifications. In addition we prepare an alias table.

coord <- mod[[1L]]
alias <- data.frame(tx_id = c(1L,3L,5L,6L,7L,8L,10L,11L),
                    name = c("18S rRNA","tF(GAA)B","tG(GCC)B","tT(AGT)B",
                             "tQ(TTG)B","tC(GCA)B","tS(CGA)C","tV(AAC)E1"),
                    stringsAsFactors = FALSE)
plotCompareByCoord(msaas, coord, score = "scoreSR", alias = alias,
                   normalize = TRUE)
plotCompareByCoord(msaas, coord[1L], score = "scoreSR", alias = alias)

In addition, the aggregate data along the transcript visualized as well.

plotData(msaas, "1", from = 1550L, to = 1600L)

This includes raw data as well.

plotData(msaas[1L:2L], "1", from = 1550L, to = 1600L, showSequenceData = TRUE)

Session info

sessionInfo()

References



Try the RNAmodR.AlkAnilineSeq package in your browser

Any scripts or data that you put into this service are public.

RNAmodR.AlkAnilineSeq documentation built on Nov. 8, 2020, 5:52 p.m.