knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(floundeR)
The Genbank file format is a standard for the sharing and distribution of whole
genome sequences with their accompanying annotations. The Genbank
parser in
floundeR
may not be the most technically competent parser and have been
implemented with a couple of ambitions that are related to the analysis and
identification of genetic variants in microbial genomes. This will undoubtedly
change with time ...
There is an R6 object defined in floundeR
called GenbankGenome
. Let's
identify the packaged genbank file that corresponds to a Tuberculosis reference
genome and create the corresponding object.
TB_reference = flnDr("TB_H37Rv.gb.gz") tb <- GenbankGenome$new(TB_reference)
This has created a TB GenbankGenome
object. A number of functions are provided
by the object to access the annotation content of this genome. The fundamental
methods provided a briefly introduced below.
The CDS
features annotated in the Genbank
file have been collated as a
GenomicRanges
object. This can be accessed using
tb$list_cds()
or for atomic entries (I am interested in the gene identified as eis
we can
use the more targetted method method
tb$get_cds(feature_id="eis")
The genome sequence from the Genbank entry can be accessed using an
active binding
called sequence
.
tb$sequence
A clinical collaborator has provided me with some information on a variant that we should be checking in some whole genome sequencing studies. The variant is a C->T transition located 12 nucleotides upstream of the eis gene. I would like to sanity check the provided information and identify an absolute coordinate within the reference genome.
The same collaborator as above is also interested in a mutation that has been described as a peptide change from A->E at amino acid number three of the pncA protein.
tb$get_cds(feature_id="pncA")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.