View source: R/fetch_metadata.R
fetch_metadata | R Documentation |
Sequences downloaded from GenBank with fetch_sequences only include the title and/or accession number. Use fetch_metadata to obtain other useful metadata associated with the sequences.
fetch_metadata(query, chunk_size = 10, max_tries = 10,
verbose = FALSE, higher_taxa = FALSE)
query |
String used to query NCBI GenBank. For more about the NCBI query format see https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options |
chunk_size |
Number of IDs to use for each chunk. Changing this does not usually affect the results, but lower values give more accurate progress bars. |
max_tries |
Maximum number of times to attempt the download loop before quitting with an error. |
verbose |
Logical; should information about the number of loops attempted be printed to the screen? |
higher_taxa |
Logical; should higher taxonomic ranks (family and order) be included in the results? |
entrez_search is used to obtain a vector of IDs from the 'query', then entrez_summary is used to download metadata from the IDs. However, entrez_summary will fail if too many IDs are used as input (more than roughly 200 to 300). Therefore, fetch_metadata splits the IDs into chunks (a list of vectors), and loops over the list.
Sometimes errors are encountered during the loop because the API rejects the request, the internet connection drops, etc. To work around this, the loop repeats until it finishes or the number of attempts reaches 'max_tries', at which point it quits with an error.
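The chunk-and-retry behavior described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: chunk_ids and fetch_in_chunks are hypothetical helper names, and fetch_chunk stands in for whatever function downloads metadata for one chunk of IDs (for example, a wrapper around entrez_summary).

```r
# Hypothetical helper: split a vector of IDs into a list of vectors,
# each holding at most `chunk_size` IDs.
chunk_ids <- function(ids, chunk_size = 10) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / chunk_size)))
}

# Hypothetical helper: loop over the chunks, repeating the whole loop if
# any chunk raises an error, and give up after `max_tries` attempts.
# `fetch_chunk` is an assumed user-supplied function taking one chunk.
fetch_in_chunks <- function(ids, fetch_chunk, chunk_size = 10,
                            max_tries = 10, verbose = FALSE) {
  chunks <- chunk_ids(ids, chunk_size)
  for (attempt in seq_len(max_tries)) {
    if (verbose) message("Attempt ", attempt, " of ", max_tries)
    # Any error inside the loop makes this attempt return NULL
    result <- tryCatch(
      lapply(chunks, fetch_chunk),
      error = function(e) NULL
    )
    if (!is.null(result)) return(result)
  }
  stop("Download did not finish after ", max_tries, " attempts")
}
```

Retrying the whole loop, rather than only the failed chunk, matches the description above; it is simpler but re-downloads chunks that already succeeded.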
Tibble of metadata resulting from the GenBank query. Columns include:
GenBank GI number
GenBank accession number
Taxon ID (can be used to query with taxize)
Sequence title
Sequence length
Misc. data (specimen, collection country, etc.), separated by |
Column names of misc. data, separated by |
Species name
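Because the miscellaneous data and their column names are each stored as a single string separated by |, it can help to split both and pair them up. The sketch below uses made-up strings for one record; the field names and values are assumptions for illustration only, and real results will differ.

```r
# Hypothetical strings for a single record (not real GenBank output)
misc_names  <- "specimen_voucher|country|collected_by"
misc_values <- "ABC 123|Japan|A. Collector"

# fixed = TRUE treats "|" as a literal character, not regex alternation
keys   <- strsplit(misc_names, "|", fixed = TRUE)[[1]]
values <- strsplit(misc_values, "|", fixed = TRUE)[[1]]

# Named character vector mapping each field name to its value
misc <- setNames(values, keys)
misc[["country"]]
```

Without fixed = TRUE, strsplit would interpret | as a regular expression and split the string into individual characters.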
## Not run:
fetch_metadata("rbcl[Gene] AND Crepidomanes[ORGN]")
## End(Not run)