Detecting defective interfering RNAs in Influenza virus A and B sequencing data
This R code was developed to detect candidate defective interfering (DI) RNAs in virus RNA-Seq data. It has so far been tested with samples from Influenza A and B viruses, but it could be adapted to work with other viruses. Note that it currently detects only "internal deletion" DIs, and NOT copyback or snapback DIs; support for those will likely be added in a future software package based on this code. Support for stranded (paired-end) libraries is also still a work in progress, so only unstranded data is supported at this time.
After installing the package, we need to load it and set up some code to point to the GTF files that we wish to analyze.
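```R
library(digR)

# Directory that holds the assembled GTF files
basepath <- "/home/matt/Mal04/Maltemp"

# 'pattern' is a regular expression: escape the dot and anchor at the end
# so that only names ending in ".gtf" are returned
vRNAfilenames <- list.files(path = basepath,
                            recursive = TRUE, include.dirs = TRUE,
                            pattern = "\\.gtf$")
```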
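Keep in mind that the `pattern` argument of `list.files` is a regular expression, not a shell glob. The sketch below uses a throwaway temporary directory and made-up file names (both are assumptions for illustration, not part of the package) to show how an unanchored pattern can pick up files you did not intend.

```R
# Hypothetical file names in a temporary directory, for illustration only
demo_dir <- file.path(tempdir(), "digR_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- c("A-p1.sorted.bam.gtf", "A-p2.sorted.bam.gtf", "notes_gtf.txt")
invisible(file.create(file.path(demo_dir, demo_files)))

# An unescaped "." matches any character, so "notes_gtf.txt" is matched too
loose <- list.files(demo_dir, pattern = ".gtf")

# Escaping the dot and anchoring with "$" keeps only names ending in ".gtf"
strict <- list.files(demo_dir, pattern = "\\.gtf$")
```

Comparing `loose` and `strict` before importing is a cheap way to confirm that the file list contains what you expect.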
If your samples were derived from viruses which have been passaged repeatedly, as is often the case when studying DIs, you'll need to provide a vector of passage numbers. If not, you can simply provide a character vector of empty strings whose length equals the number of samples.
In the case that passage numbers are needed, you can do something like this to extract them from filenames:
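```R
# Extract the passage label (a letter followed by one or two digits 1-9)
# that comes after a hyphen in each file name
passagenums <- gsub("(.*)-([[:alpha:]][1-9]{1,2})(.*.fastq.output.*)", "\\2", vRNAfilenames)
passagenums <- toupper(passagenums)
```
In the case that they are not:
```R
# One empty string per sample
passagenums <- vector("character", length = length(vRNAfilenames))
```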
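To see what the extraction pattern does, here is a small worked example. The file names are hypothetical, since the real naming scheme depends on your pipeline. The sketch also shows that `gsub` returns a name unchanged when the pattern does not match, so it can be worth flagging those cases explicitly.

```R
# Hypothetical file names; the real naming scheme may differ
example_files <- c("Mal04-p1.fastq.output.sorted.bam.gtf",
                   "Mal04-p3.fastq.output.sorted.bam.gtf",
                   "Mal04-control.sorted.bam.gtf")

# Same pattern as above: keep only the second capture group
labels <- gsub("(.*)-([[:alpha:]][1-9]{1,2})(.*.fastq.output.*)", "\\2",
               example_files)
labels <- toupper(labels)

# Names that do not match the pattern come back unchanged (apart from the
# upper-casing), so mark them as missing instead of keeping the full name
matched <- grepl("^[A-Z][1-9]{1,2}$", labels)
labels[!matched] <- NA_character_
```

With these inputs the first two names yield `"P1"` and `"P3"`, and the third is set to `NA` because it carries no passage label.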
OK! We are ready to import the data from the GTF files that we listed. We'll do this with the importvRNAs function. The 'filematch' parameter makes it easy to select a subset of the .gtf files that were matched when creating 'vRNAfilenames'.
```R
vRNAs <- importvRNAs(filematch = "sorted.bam.gtf", path = basepath, stranded = "no")
```
After doing this, we obtain a CompressedGRangesList object that contains the "transcripts" and "exons" that make up the assembled RNAs in our samples. This GRangesList object is handy to have, as it can be used externally for other types of analysis, such as input into another package.
Here, we can create a table from all of the GRanges objects, which makes it a little easier to work with. This is really an intermediate step, but might be useful for just quickly getting a look at the results from your samples. We'll add more metadata including sequences after this.
```R
vRNATable <- makevRNAtable(vRNAlist = vRNAs, passagenums = passagenums)
```
To get sequence information into our final table, we need to install and load the appropriate BSgenome object, which for this demo is 'BSgenome.Mal04MA.AGS'. We can then obtain the sequences for all of our RNAs using the 'vRNAseqs' function. Because you specify which BSgenome object to use in the vRNAseqs function, you can have several different genomes loaded and use whichever you wish.
```R
library(BSgenome.Mal04MA.AGS)
sequences <- vRNAseqs(vRNAlist = vRNAs, seqObj = "BSgenome.Mal04MA.AGS")
```
Finally, we can produce an output table from all of this data that provides useful information on the predicted RNA species in our samples. These include predicted single-fragment full-length RNAs, internal deletion DIs made up of two or more fragments, and single-fragment truncated RNAs.
```R
summaryTable <- makeDItable(vRNATable = vRNATable, sequences = sequences, outType = "DI")
```
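To make the three categories concrete, the following is a rough base-R sketch of how RNAs could be grouped by fragment count and coverage. It is not the package's implementation: the data frame, its column names, the segment lengths, and the `classify_rna` helper are all assumptions made for illustration.

```R
# Hypothetical assembled fragments ("exons"): one row per fragment
fragments <- data.frame(
  rna_id  = c("rna1", "rna2", "rna2", "rna3"),
  segment = c("PB2", "PB2", "PB2", "PB1"),
  start   = c(1, 1, 2100, 1),
  end     = c(2341, 250, 2341, 900),
  stringsAsFactors = FALSE
)

# Assumed full segment lengths in nucleotides (illustrative values)
segment_length <- c(PB2 = 2341, PB1 = 2341)

# Hypothetical helper: label one RNA from its fragments
classify_rna <- function(frags, seg_len) {
  covers_all <- min(frags$start) == 1 && max(frags$end) == seg_len
  if (nrow(frags) >= 2) {
    "internal deletion DI"   # two or more fragments
  } else if (covers_all) {
    "full-length"            # one fragment spanning the whole segment
  } else {
    "truncated"              # one fragment covering only part of it
  }
}

# Apply the helper to each RNA; the result is a named character vector
rna_class <- vapply(
  split(fragments, fragments$rna_id),
  function(f) classify_rna(f, segment_length[[f$segment[1]]]),
  character(1)
)
```

Here `rna1` is labelled full-length, `rna2` an internal deletion DI, and `rna3` truncated. Real data would need tolerance for imprecise fragment ends, which this sketch ignores.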
The summaryTable can now be saved, and used for further processing or analysis.
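As a minimal sketch of saving the result, assuming the summary table behaves like an ordinary data frame, the example below writes a small stand-in table (`demoTable`, with hypothetical columns) to a tab-delimited file and to an `.rds` file using base R only.

```R
# Stand-in for the real summary table; the column names are hypothetical
demoTable <- data.frame(
  rna_id = c("rna1", "rna2"),
  type   = c("full-length", "internal deletion DI"),
  stringsAsFactors = FALSE
)

out_dir <- tempdir()  # replace with your own output directory

# Tab-delimited text is easy to open in a spreadsheet or other tools
tsv_file <- file.path(out_dir, "summaryTable.tsv")
write.table(demoTable, tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# An .rds file keeps the R object as-is for later sessions
rds_file <- file.path(out_dir, "summaryTable.rds")
saveRDS(demoTable, rds_file)
reloaded <- readRDS(rds_file)
```

The text file suits sharing with collaborators, while the `.rds` file is the safer choice when the table will be read back into R for further analysis.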