View source: R/genome_assembly.R
get_genome_stats | R Documentation |
Get summary statistics for genomes on NCBI using the NCBI Datasets API
get_genome_stats(taxon = NULL, filters = NULL)
taxon |
Taxon for which summary statistics will be retrieved, either as a character scalar (e.g., "brassicaceae") or as a numeric scalar representing NCBI Taxonomy ID (e.g., 3700). |
filters |
(optional) A list of filters to use when querying the API
in the form of key-value pairs, with keys in list names and values in list
elements (e.g., |
The filters accepted by the filters parameter are listed at https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/rest-api/#get-/genome/taxon/-taxons-/dataset_report.
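As a rough illustration of the key-value form described above, the sketch below builds a named list of filters and shows one way such a list could be flattened into a URL query string. The helper filters_to_query() is hypothetical and is not part of the package. This page does not show how get_genome_stats() encodes filters internally, so treat this only as a way to picture the mapping from list names to keys and from list elements to values.

```r
# Hypothetical helper (not part of the package): flatten a named list of
# filters into a URL query string, one key=value pair per list element.
# Assumes every list element is a scalar.
filters_to_query <- function(filters) {
    stopifnot(
        is.list(filters),
        !is.null(names(filters)),
        all(nzchar(names(filters)))
    )
    keys <- vapply(
        names(filters), utils::URLencode, character(1), reserved = TRUE
    )
    values <- vapply(
        filters,
        function(x) utils::URLencode(as.character(x), reserved = TRUE),
        character(1)
    )
    paste(paste0(keys, "=", values), collapse = "&")
}

# Keys go in list names, values in list elements
filters <- list(filters.assembly_level = "chromosome")
query <- filters_to_query(filters)
print(query)
```

The same named list is what would be passed to the filters argument of get_genome_stats().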
A data frame with the following variables:
character, accession number.
character, data source.
numeric, NCBI Taxonomy ID.
character, species' scientific name.
character, species' common name.
character, species' ecotype.
character, species' strain.
character, species' isolate.
character, species' cultivar.
factor, assembly level ("Complete", "Chromosome", "Scaffold", or "Contig").
character, assembly status.
character, assembly name.
character, assembly type.
character, submission date (YYYY-MM-DD).
character, submitter name.
character, sequencing technology.
logical, indicator of whether the genome is atypical.
character, RefSeq category.
numeric, number of chromosomes.
numeric, total sequence length.
numeric, ungapped sequence length.
numeric, number of contigs.
numeric, contig N50.
numeric, contig L50.
numeric, scaffold N50.
numeric, scaffold L50.
numeric, GC percentage (0-100).
character, name of annotation provider.
character, annotation release date (YYYY-MM-DD).
numeric, total number of genes.
numeric, number of protein-coding genes.
numeric, number of non-coding genes.
numeric, number of pseudogenes.
numeric, number of other genes.
numeric, ratio of the number of contigs to the number of chromosomes.
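A minimal sketch of working with a data frame of this shape follows. It uses a small mock data frame instead of a live API call, and the column names (assembly_level, n_chromosomes, n_contigs, contigs_per_chromosome) are assumptions made for illustration, since the actual variable names are not shown in this page. It recomputes the contig-to-chromosome ratio and tabulates the assembly-level factor.

```r
# Mock data standing in for the output of get_genome_stats(); the column
# names and values below are made up for illustration.
stats <- data.frame(
    assembly_level = factor(
        c("Chromosome", "Scaffold", "Chromosome"),
        levels = c("Complete", "Chromosome", "Scaffold", "Contig")
    ),
    n_chromosomes = c(5, NA, 10),
    n_contigs = c(100, 2500, 40)
)

# Ratio of the number of contigs to the number of chromosomes;
# rows without a chromosome count yield NA
stats$contigs_per_chromosome <- stats$n_contigs / stats$n_chromosomes

# Count assemblies per level; unused factor levels are kept with count 0
level_counts <- table(stats$assembly_level)

print(stats)
print(level_counts)
```

Storing the assembly level as a factor with all four levels keeps levels that are absent from a given query visible in summaries.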
# Example 1: Search for A. thaliana genomes by tax ID
ex1 <- get_genome_stats(taxon = 3702)
# Example 2: Search for A. thaliana genomes by name
ex2 <- get_genome_stats(taxon = "Arabidopsis thaliana")
# Example 3: Search for chromosome-level Brassicaceae genomes
ex3 <- get_genome_stats(
    taxon = "brassicaceae",
    filters = list(filters.assembly_level = "chromosome")
)