ncbi_taxon_sample: Download representative sequences for a taxon
In metacoder: Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

ncbi_taxon_sample

R Documentation

Description

Downloads a sample of sequences meant to evenly capture the diversity of a given taxon. Can be used to get a shallow sampling of vast groups. CAUTION: This function can make MANY queries to GenBank, depending on the arguments given, and can take a very long time. Choose your arguments carefully to avoid long waits and needlessly stressing NCBI's servers. Use a downloaded database and a parser from the taxa package when possible.

Usage

ncbi_taxon_sample(
  name = NULL,
  id = NULL,
  target_rank,
  min_counts = NULL,
  max_counts = NULL,
  interpolate_min = TRUE,
  interpolate_max = TRUE,
  min_children = NULL,
  max_children = NULL,
  seqrange = "1:3000",
  getrelated = FALSE,
  fuzzy = TRUE,
  limit = 10,
  entrez_query = NULL,
  hypothetical = FALSE,
  verbose = TRUE
)

Arguments

`name`	(`character` of length 1) The taxon to download a sample of sequences for.
`id`	(`character` of length 1) The taxon id to download a sample of sequences for.
`target_rank`	(`character` of length 1) The finest taxonomic rank at which to sample, i.e. the finest rank at which replication occurs. Must be a finer rank than that of the taxon given by `name` or `id`.
`min_counts`	(named `numeric`) The minimum number of sequences to download for each taxonomic rank. The names correspond to taxonomic ranks.
`max_counts`	(named `numeric`) The maximum number of sequences to download for each taxonomic rank. The names correspond to taxonomic ranks.
`interpolate_min`	(`logical`) If `TRUE`, values supplied to `min_counts` and `min_children` will be used to infer the values of intermediate ranks not specified. Linear interpolation between values of specified ranks will be used to determine values of unspecified ranks.
`interpolate_max`	(`logical`) If `TRUE`, values supplied to `max_counts` and `max_children` will be used to infer the values of intermediate ranks not specified. Linear interpolation between values of specified ranks will be used to determine values of unspecified ranks.
`min_children`	(named `numeric`) The minimum number of subtaxa that a taxon of a given rank must have for its sequences to be searched. The names correspond to taxonomic ranks.
`max_children`	(named `numeric`) The maximum number of subtaxa that a taxon of a given rank can have for its sequences to be searched. The names correspond to taxonomic ranks.
`seqrange`	(`character`) The range of sequence lengths to search for, given as a string such as "1:1000". For example, "1:1000" searches for sequences from 1 to 1000 characters in length.
`getrelated`	(logical) If TRUE, gets the longest sequences of a species in the same genus as the one searched for. If FALSE, returns nothing if no match found.
`fuzzy`	(`logical`) Whether to do a fuzzy taxonomic ID search or an exact search. If `TRUE`, we use `xXarbitraryXx[porgn:__txid<ID>]`, but if `FALSE`, we use `txid<ID>`. Default: `TRUE`
`limit`	(`numeric`) Number of sequences to search for and return. Max of 10,000. If you search for 6000 records, and only 5000 are found, you will of course only get 5000 back.
`entrez_query`	(`character`; length 1) An Entrez-format query to filter results with. This is useful to search for sequences with specific characteristics. The format is the same as the one used to search GenBank. (https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options)
`hypothetical`	(`logical`; length 1) If `FALSE`, an attempt will be made to not return hypothetical or predicted sequences, judging from accession number prefixes (XM and XR). This can result in fewer than `limit` sequences being returned even if more are available, since this filtering is done after searching NCBI.
`verbose`	(`logical`) If `TRUE`, progress messages will be printed.
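
The linear interpolation described for `interpolate_min` and `interpolate_max` can be pictured with a small sketch. This is not metacoder's internal code: the rank ordering in `ranks` and the helper `interpolate_ranks()` are assumptions made for illustration, and base R's `stats::approx()` stands in for whatever the package actually does.

```r
# Hypothetical rank ordering, coarse to fine (an assumption for this sketch;
# the ordering used inside metacoder may differ).
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus", "species")

# Illustrative helper, not part of metacoder: fill in unspecified ranks by
# linear interpolation between the ranks that were given.
# Needs at least two named values, since stats::approx() cannot interpolate
# from a single point.
interpolate_ranks <- function(values, ranks) {
  known <- match(names(values), ranks)
  if (anyNA(known)) stop("Unknown rank name in 'values'")
  filled <- stats::approx(x = known, y = values, xout = seq_along(ranks))$y
  names(filled) <- ranks
  filled
}

# Only phylum and family are specified, as one might do for `max_counts`.
max_counts <- c(phylum = 100, family = 10)
interpolate_ranks(max_counts, ranks)
```

In this sketch, `class` and `order` receive 70 and 40, the evenly spaced values between 100 and 10. Ranks outside the specified range (here `kingdom`, `genus` and `species`) stay `NA`, because the sketch only interpolates and never extrapolates.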

Examples



# These examples query NCBI and need an internet connection
library(metacoder)

# Look up 5 ITS sequences from each fungal class
data <- ncbi_taxon_sample(name = "Fungi", target_rank = "class", limit = 5,
                          entrez_query = '"internal transcribed spacer"[All Fields]')

# Look up taxonomic information for sequences
obj <- lookup_tax_data(data, type = "seq_id", column = "gi_no")

# Plot information
metacoder::filter_taxa(obj, taxon_names == "Fungi", subtaxa = TRUE) %>% 
  heat_tree(node_label = taxon_names, node_color = n_obs, node_size = n_obs)
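
The `hypothetical` argument is documented as filtering on accession number prefixes (XM and XR) after the search has run. A minimal sketch of that idea follows. The helper `is_predicted_accession()` is hypothetical and not part of metacoder, and the accession strings are invented purely for illustration.

```r
# Illustrative helper, not part of metacoder: TRUE for accessions whose prefix
# marks a predicted RefSeq transcript (XM_ = mRNA, XR_ = non-coding RNA).
is_predicted_accession <- function(accession) {
  grepl("^X[MR]_", accession)
}

# Made-up accession strings, used only to show the prefix check.
accessions <- c("NM_000001.1", "XM_000002.1", "XR_000003.1", "AB000004.1")

# Keep only the accessions that are not flagged as predicted.
kept <- accessions[!is_predicted_accession(accessions)]
kept
```

Because a filter like this runs on results that were already capped by `limit`, the number of sequences kept can be smaller than `limit`, which matches the behaviour noted for `hypothetical = FALSE`.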

metacoder documentation built on April 3, 2025, 8:39 p.m.
