proks: Prokaryotic genomes at NCBI
In cstubben/genomes2: Genome sequencing project metadata


Prokaryotic genome sequencing projects at NCBI.

data(proks)

A genomes data frame with observations on the following 23 variables.

pid: BioProject id
name: Organism name
status: Sequencing status
released: First public sequence release
taxid: Taxonomy id
acc: BioProject Accession number
group: Phylum
subgroup: Class level
size: Total length of DNA (Mb)
gc: Percent GC (guanine or cytosine)
refseq: RefSeq chromosome sequence accessions
insdc: GenBank chromosome sequence accessions
plasmid.refseq: RefSeq plasmid sequence accessions
plasmid.insdc: GenBank plasmid sequence accessions
wgs: Four-letter WGS Accession prefix followed by version
scaffolds: Number of scaffolds/contigs
genes: Number of genes
proteins: Number of proteins
modified: Last modification date
center: Sequencing center
biosample: BioSample Accession number
assembly: Assembly Accession number
reference: Reference or representative genome
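
As a minimal sketch of working with these columns, the wgs field (a four-letter prefix followed by a version) can be split into its two parts with a regular expression. The rows below are invented placeholders, not values from proks, and the names toy, wgs_pattern, wgs_prefix and wgs_version are hypothetical; only the column names come from the list above.

```r
# Toy rows with invented values; only the column names mirror the proks table.
toy <- data.frame(
  name   = c("Organism A", "Organism B", "Organism C"),
  status = c("Complete", "Contig", "Scaffold"),
  size   = c(4.6, 2.1, 5.3),
  wgs    = c(NA, "ABCD01", "WXYZ02"),
  stringsAsFactors = FALSE
)

# Assumed layout: four capital letters followed by a numeric version.
wgs_pattern <- "^([A-Z]{4})([0-9]+)$"

# Only parse rows that have a WGS accession matching the assumed layout.
has_wgs <- !is.na(toy$wgs) & grepl(wgs_pattern, toy$wgs)

# Extract the prefix and the version; rows without a match stay NA.
toy$wgs_prefix  <- ifelse(has_wgs, sub(wgs_pattern, "\\1", toy$wgs), NA_character_)
toy$wgs_version <- ifelse(has_wgs, as.integer(sub(wgs_pattern, "\\2", toy$wgs)), NA_integer_)

toy[, c("name", "wgs", "wgs_prefix", "wgs_version")]
```

The same two assignments should apply to proks itself once the package data are loaded, provided its wgs column follows the layout assumed here.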

BioProject IDs are no longer unique, and the table was modified on Nov 1, 2013 to include BioSample and Assembly accessions. See the NCBI announcement email regarding bacterial strain-level TaxID management for details.
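
Because pid can repeat, code that used it as a row key may need another identifier. The sketch below uses invented placeholder rows (not real accessions) to show one way to find shared BioProject ids and to check whether assembly could serve as a key instead. Whether assembly is unique in the actual table is an assumption that should be checked on the real data.

```r
# Toy rows with made-up ids and accessions; two rows share one BioProject id.
toy <- data.frame(
  pid      = c(101, 101, 202),
  assembly = c("GCA_000000001.1", "GCA_000000002.1", "GCA_000000003.1"),
  stringsAsFactors = FALSE
)

# BioProject ids that occur more than once
shared_pid <- unique(toy$pid[duplicated(toy$pid)])

# All rows belonging to a shared BioProject id
toy[toy$pid %in% shared_pid, ]

# TRUE when no assembly accession is repeated, so it can act as a row key
assembly_is_key <- anyDuplicated(toy$assembly) == 0
assembly_is_key
```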

Downloaded from ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt

data(proks)
proks
## single row
t(proks[1,])
class(proks)
attributes(proks)[c("date","url")] 
summary(proks)
## cross-tabulate sequencing status against presence of a WGS accession
table2(proks$status,!is.na(proks$wgs), dnn=list("Status", "Has WGS acc?"))
plot(proks)
plotby(proks, log='y', las=1, top=2)
hist(proks$size[proks$size<15], br=50, main="", col="blue", xlab="Size (Mb)")

## download recent table from NCBI
## Not run: update(proks)