gbRecord: Read a GenBank/GenPept or EMBL format file.

View source: R/gbRecord-class.R

Description

Import data from GenBank/GenPept, EMBL, or IMGT/HLA flat files into R. The data are represented as an instance of the gbRecord or gbRecordList class.

Usage

gbRecord(rcd, progress = FALSE)

Arguments

rcd

A vector of paths to GenBank/EMBL format records, an efetch object containing GenBank record(s), or a textConnection to a character vector that can be parsed as a GenBank or EMBL record.

progress

Print a progress bar when parsing multiple GenBank records. (This will not work if you process the records in parallel.)

Details

For a sample GenBank record, see https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html. For a detailed description of the GenBank feature table format, see https://www.ncbi.nlm.nih.gov/collab/FT/.

For a description of the EMBL flat file format see ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/usrman.txt.

For a description of the format and conventions of IMGT/HLA flat files see https://www.ebi.ac.uk/ipd/imgt/hla/docs/manual.html.

Value

An instance of the gbRecord or gbRecordList classes.

Note

The gbRecord class is modelled after the GenBank flat file format. Neither EMBL nor IMGT/HLA files fit this model perfectly, so some fairly arbitrary choices were made to make the data from these files fit the model.

See Also

genomeRecordFromNCBI

Examples

## Not run: 
### import from file
gbk_file <- system.file("extdata", "marine_metagenome.gb", package = "biofiles")
x <- gbRecord(gbk_file)

## End(Not run)

load(system.file("extdata", "marine_metagenome.rda", package = "biofiles"))
getHeader(x)
getFeatures(x)

### quickly extract features as GRanges
ranges(x["CDS"], include = c("product", "note", "protein_id"))

## Directly subset features
x[[1]]

### import directly from NCBI
## Not run: 
x <- gbRecord(reutils::efetch("139189709", "protein", rettype = "gp", retmode = "text"))
x

## End(Not run)

## import a file containing multiple GenBank records as a
## gbRecordList. With many short records it pays off to
## run the parsing in parallel
## Not run: 
gss_file <- system.file("extdata", "gss.gb", package = "biofiles")
library(doParallel)
registerDoParallel(cores = 4)
gss <- gbRecord(gss_file)
gss

## End(Not run)
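
The Arguments section also lists a textConnection as valid input and describes the progress argument, neither of which is exercised above. The following is a minimal sketch rather than an example from the package authors. It assumes that biofiles is installed, that the bundled example files used above are present, and that no parallel backend has been registered. Whether gbRecord closes the connection itself is not stated in this page, so the close() call is wrapped defensively.

```r
## Sketch only: skipped entirely when biofiles is not installed.
have_biofiles <- requireNamespace("biofiles", quietly = TRUE)

if (have_biofiles) {
  library(biofiles)

  ## read the bundled flat file into a character vector and wrap it in a
  ## textConnection, one of the documented input types for 'rcd'
  gbk_file <- system.file("extdata", "marine_metagenome.gb", package = "biofiles")
  con <- textConnection(readLines(gbk_file))
  x <- gbRecord(con)
  ## assumption: the connection may or may not already be closed by gbRecord,
  ## so a failure to close it again is silently ignored
  try(close(con), silent = TRUE)
  print(x)

  ## parse a multi-record file sequentially and request the progress bar;
  ## per the Arguments section this is not expected to work in parallel
  gss_file <- system.file("extdata", "gss.gb", package = "biofiles")
  gss <- gbRecord(gss_file, progress = TRUE)
  print(gss)
}
```

Going through readLines() and textConnection() is only useful when the record text is already in memory (for example, after downloading it by other means); for files on disk, passing the path directly as in the first example is simpler.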

gschofl/biofiles documentation built on Sept. 27, 2020, 12:08 a.m.