proks: Prokaryotic genomes at NCBI


Description

Prokaryotic genome sequencing projects at NCBI.

Usage

data(proks)

Format

A genomes data frame with observations on the following 23 variables.

pid

BioProject ID

name

Organism name

status

Sequencing status

released

First public sequence release

taxid

Taxonomy ID

acc

BioProject Accession number

group

Phylum

subgroup

Class level

size

Total length of DNA (Mb)

gc

Percent GC (guanine or cytosine)

refseq

RefSeq chromosome sequence accessions

insdc

GenBank chromosome sequence accessions

plasmid.refseq

RefSeq plasmid sequence accessions

plasmid.insdc

GenBank plasmid sequence accessions

wgs

Four-letter WGS Accession prefix followed by version

scaffolds

Number of scaffolds/contigs

genes

Number of genes

proteins

Number of proteins

modified

Last modification date

center

Sequencing center

biosample

BioSample Accession number

assembly

Assembly Accession number

reference

Reference or representative genome

Details

BioProject IDs are no longer unique, and the table was modified on Nov 1, 2013 to include BioSample and Assembly accessions. See the NCBI announcement email on bacterial strain-level TaxID management for details.
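
Because BioProject IDs can repeat, it may help to check which rows share a pid before using it as a key. The following is a minimal sketch on a toy data frame; the values are invented for illustration, and treating the assembly accession as a row key is an assumption to verify against the real table.

```r
# Toy stand-in for proks; every value here is invented for illustration.
toy <- data.frame(
  pid      = c(101, 101, 202),
  name     = c("Organism A strain 1", "Organism A strain 2", "Organism B"),
  assembly = c("GCA_000000001.1", "GCA_000000002.1", "GCA_000000003.1"),
  stringsAsFactors = FALSE
)

# BioProject IDs that occur on more than one row
dup_pids <- unique(toy$pid[duplicated(toy$pid)])

# All rows that share one of those BioProject IDs
shared <- toy[toy$pid %in% dup_pids, ]

# Check whether the assembly accession is unique in this toy table
# (assumption: the same check is worth running on the real data)
assembly_is_unique <- !any(duplicated(toy$assembly))

print(shared)
print(assembly_is_unique)
```

The same three steps should apply to proks itself by replacing toy with proks, provided the pid and assembly columns are present as described under Format.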

Source

Downloaded from ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
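
For readers who want to inspect a report file without the package, here is a hedged base R sketch. It parses a small inline stand-in instead of the real file. The column names, the leading '#' on the header line and the '-' marker for missing values are assumptions about the file layout, and the sample rows are invented.

```r
# Inline stand-in for a few lines of a tab-delimited report.
# Column names and values are hypothetical, not copied from NCBI.
report_text <- paste(
  "#Organism/Name\tTaxID\tSize (Mb)\tGC%\tWGS",
  "Organism A\t11\t4.6\t50.8\t-",
  "Organism B\t22\t1.8\t38.2\tAAAA01",
  sep = "\n"
)

report <- read.delim(
  text = report_text,
  comment.char = "",        # keep the header line even though it starts with '#'
  na.strings = "-",         # assumed missing-value marker
  check.names = FALSE,      # keep column names exactly as written
  stringsAsFactors = FALSE
)

# Drop the leading '#' from the first column name
names(report)[1] <- sub("^#", "", names(report)[1])

str(report)
```

To read a downloaded copy instead, the text argument could be replaced by a local file path. This parses only the raw table and does not reproduce the column renaming or the genomes class that the package applies.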

Examples

data(proks)
proks
## single row, transposed for easier reading
t(proks[1,])
class(proks)
attributes(proks)[c("date","url")]
summary(proks)
## cross-tabulate sequencing status against presence of a WGS accession
table2(proks$status, !is.na(proks$wgs), dnn=list("Status", "Has WGS acc?"))
plot(proks)
plotby(proks, log='y', las=1, top=2)
hist(proks$size[proks$size<15], breaks=50, main="", col="blue", xlab="Size (Mb)")

## download recent table from NCBI
## Not run: update(proks) 

cstubben/genomes2 documentation built on May 12, 2017, 1:19 p.m.