GenPept Flat Files

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)

biofiles - an interface to GenBank/GenPept files in R

biofiles provides interfacing to GenBank/GenPept or Embl flat file records. It includes utilities for reading and writing GenBank files, and methods for interacting with annotation and sequence data.

Installation

Install the latest stable release of the biofiles package from CRAN:

install.packages("biofiles")

Install the development version from github using the devtools package.

# install.packages("devtools")
devtools::install_github("gschofl/biofiles")

Basic functionality

Let's download a small bacterial genome, Chlamydophila psittaci DC15 (GenBank acc. CP002806; GI 334693792) from NCBI.

# install.packages("reutils")
gb_file <- reutils::efetch("CP002806", "nuccore", rettype = "gbwithparts", retmode = "text")
gb_file

Next, we parse the efetch object into a gbRecord instance.

rec <- biofiles::gbRecord(gb_file)
rec

The summary function provides an overview over the object:

biofiles::summary(rec)

Various getter methods provide access to the data contained in a GenBank record; for instance:

biofiles::getAccession(rec)

biofiles::getGeneID(rec)

biofiles::getDefinition(rec)

biofiles::getOrganism(rec)

biofiles::getSequence(rec)

biofiles::getReference(rec)

The function uniqueQualifs() provides an overview over the feature qualifiers used in a record:

biofiles::uniqueQualifs(rec)

Genbank Features

The important part of a GenBank record will generally be the list of annotions or features.

We can access the gbFeatureList of a gbRecord using getFeatures() or ft():

f <- biofiles::ft(rec)
f

We can extract features either by numeric subsetting:

f[[1]]

or we can subset by feature key:

f["CDS"]

A more versatile method to narrow down the list of features of interest is the function filter(). For instance, we can filter for all coding sequences (CDS) with the annotation "hypothetical" in the product qualifiers:

hypo <- biofiles::filter(rec, key = "CDS", product = "hypothetical")
biofiles::summary(hypo)

or we can filter for all elongation factors,

elong <- biofiles::filter(rec, key = "CDS", product = "elongation factor")
biofiles::summary(elong)

now let's extract the sequence for all elongation factors, and using the tools from the Biostrings packages, translate them into protein sequences. Note, that in order to do so, we first get the gbFeatureTable from the gbRecord, as otherwise we'd just extract the complete sequence associated with the GenBank record.

dna <- biofiles::getSequence(biofiles::ft(elong))
dna

str <- biofiles::strand(elong)
dna_revcomp <- c(Biostrings::reverseComplement(dna[str == -1]), dna[str == 1])
aa <- Biostrings::translate(dna_revcomp)
names(aa) <- names(dna_revcomp)
aa

We can use the ranges() method to extract GRanges objects defined in the Bioconductor package GenomicRanges:

elong_ranges <- biofiles::ranges(elong, include = c("locus_tag", "protein_id", "product"))
elong_ranges

gschofl/biofiles documentation built on Sept. 27, 2020, 12:08 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

gschofl/biofiles
An Interface for GenBank/GenPept Flat Files

In gschofl/biofiles: An Interface for GenBank/GenPept Flat Files

biofiles - an interface to GenBank/GenPept files in R

Installation

Basic functionality

Genbank Features

R Package Documentation

Browse R Packages

We want your feedback!

gschofl/biofiles An Interface for GenBank/GenPept Flat Files

In gschofl/biofiles: An Interface for GenBank/GenPept Flat Files

biofiles - an interface to GenBank/GenPept files in R

Installation

Basic functionality

Genbank Features

R Package Documentation

Browse R Packages

We want your feedback!

gschofl/biofiles
An Interface for GenBank/GenPept Flat Files