Download representative sequences for a taxon


Description

Downloads a sample of sequences meant to evenly capture the diversity of a given taxon. This can be used to get a shallow sample of a vast group. CAUTION: Depending on the arguments given, this function can make MANY queries to GenBank and can take a very long time. Choose your arguments carefully to avoid long waits and needlessly stressing NCBI's servers. When possible, use a downloaded database with extract_taxonomy instead.

Usage

ncbi_taxon_sample(name = NULL, id = NULL, target_rank, min_counts = NULL,
  max_counts = NULL, interpolate_min = TRUE, interpolate_max = TRUE,
  min_length = 1, max_length = 10000, min_children = NULL,
  max_children = NULL, verbose = TRUE, ...)

Arguments

name

(character of length 1) The taxon to download a sample of sequences for.

id

(character of length 1) The taxon id to download a sample of sequences for.

target_rank

(character of length 1) The finest taxonomic rank at which to sample, that is, the finest rank at which replication occurs. Must be a finer rank than the rank of the taxon given by name or id. Use get_taxonomy_levels to see available ranks.

min_counts

(named numeric) The minimum number of sequences to download for each taxonomic rank. The names correspond to taxonomic ranks.

max_counts

(named numeric) The maximum number of sequences to download for each taxonomic rank. The names correspond to taxonomic ranks.

interpolate_min

(logical) If TRUE, values supplied to min_counts and min_children will be used to infer values for unspecified intermediate ranks, using linear interpolation between the values of the specified ranks.

interpolate_max

(logical) If TRUE, values supplied to max_counts and max_children will be used to infer values for unspecified intermediate ranks, using linear interpolation between the values of the specified ranks.
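
To make "linear interpolation between specified ranks" concrete, here is a minimal sketch using base R's stats::approx. The helper interpolate_ranks and the rank order shown are hypothetical illustrations, not metacoder's internal code; the package may order ranks differently or round the results. Only ranks between the coarsest and finest specified rank are filled, matching the documented "intermediate ranks" behaviour.

```r
# Hypothetical helper (not part of metacoder): fill in values for ranks
# lying between the ranks that were given values, by linear interpolation.
interpolate_ranks <- function(values, rank_order) {
  # Position of each specified rank within the assumed rank order
  known_pos <- match(names(values), rank_order)
  if (anyNA(known_pos)) stop("every name in `values` must appear in `rank_order`")
  # Nothing to interpolate with fewer than two specified ranks
  if (length(values) < 2) return(values)
  # Only intermediate ranks (between coarsest and finest specified) are filled
  all_pos <- seq(min(known_pos), max(known_pos))
  filled <- stats::approx(x = known_pos, y = unname(values), xout = all_pos)$y
  stats::setNames(filled, rank_order[all_pos])
}

# Assumed example rank order, coarsest to finest
ranks <- c("kingdom", "phylum", "class", "order", "family", "genus")
interpolate_ranks(c(phylum = 30, genus = 6), ranks)
# phylum  class  order family  genus
#     30     24     18     12      6
```

With 30 at phylum and 6 at genus, the three unspecified ranks in between receive evenly spaced values.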

min_length

(numeric of length 1) The minimum length of sequences that will be returned.

max_length

(numeric of length 1) The maximum length of sequences that will be returned.
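
As a local illustration of what the min_length and max_length limits mean, the sketch below filters character sequences by length. The helper filter_by_length is hypothetical, and treating both bounds as inclusive is an assumption; it makes no claim about how ncbi_taxon_sample applies these limits internally when querying NCBI.

```r
# Hypothetical helper (not part of metacoder): keep sequences whose length
# falls within [min_length, max_length]; inclusive bounds are an assumption.
filter_by_length <- function(seqs, min_length = 1, max_length = 10000) {
  lens <- nchar(seqs)
  seqs[lens >= min_length & lens <= max_length]
}

# Toy sequences of length 4, 700 and 12000
seqs <- c(a = "ACGT", b = strrep("A", 700), c = strrep("G", 12000))
kept <- filter_by_length(seqs, min_length = 600, max_length = 10000)
names(kept)  # "b"
```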

min_children

(named numeric) The minimum number of sub-taxa that a taxon at a given rank must have for its sequences to be searched. The names correspond to taxonomic ranks.

max_children

(named numeric) The maximum number of sub-taxa that a taxon at a given rank may have for its sequences to be searched. The names correspond to taxonomic ranks.
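
The sketch below shows one reading of how rank-specific min_children and max_children thresholds select taxa. The helper functions and the taxa table (names and child counts) are hypothetical and made up for illustration; ranks with no threshold are treated as unconstrained, and interpolation between ranks is ignored here.

```r
# Hypothetical helpers (not part of metacoder).
# Look up a per-rank limit for each taxon; ranks without a limit get `default`.
rank_limit <- function(limits, rank, default) {
  out <- rep(default, length(rank))
  if (!is.null(limits)) {
    hit <- rank %in% names(limits)
    out[hit] <- limits[rank[hit]]
  }
  out
}

# TRUE for taxa whose number of sub-taxa lies within the limits for their rank
passes_child_limits <- function(rank, n_children,
                                min_children = NULL, max_children = NULL) {
  lo <- rank_limit(min_children, rank, -Inf)
  hi <- rank_limit(max_children, rank, Inf)
  n_children >= lo & n_children <= hi
}

# Made-up taxa with made-up numbers of direct sub-taxa
taxa <- data.frame(
  name = c("taxon_A", "taxon_B", "taxon_C", "taxon_D"),
  rank = c("order", "order", "family", "family"),
  n_children = c(12, 3, 8, 1),
  stringsAsFactors = FALSE
)
keep <- passes_child_limits(taxa$rank, taxa$n_children,
                            min_children = c(order = 5, family = 2))
taxa$name[keep]  # "taxon_A" "taxon_C"
```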

verbose

(logical) If TRUE, progress messages will be printed.

...

Additional arguments are passed to ncbi_searcher.

Details

See get_taxonomy_levels for available taxonomic ranks.

Examples

## Not run: 
ncbi_taxon_sample(name = "oomycetes", target_rank = "genus")
data <- ncbi_taxon_sample(name = "fungi", target_rank = "phylum", 
                          max_counts = c(phylum = 30), 
                          entrez_query = "18S[All Fields] AND 28S[All Fields]",
                          min_length = 600, max_length = 10000)

## End(Not run)
