auto_seq_download: Automatic Sequence Download

auto_seq_download R Documentation

Description

Takes a user-supplied list of genera, then searches for and downloads molecular sequence data from BOLD and GenBank.

Usage

auto_seq_download(
  BOLD_database = TRUE,
  NCBI_database = TRUE,
  search_str = NULL,
  input_file = NULL,
  output_file = NULL,
  seq_min = 100,
  seq_max = 2500
)

Arguments

BOLD_database

TRUE includes BOLD in the search, FALSE excludes it; default TRUE

NCBI_database

TRUE includes NCBI (GenBank) in the search, FALSE excludes it; default TRUE

search_str

Search string for the GenBank search. NULL uses the default string; any other value is used as the search string instead; default NULL. The default string is: (genus[ORGN]) NOT (shotgun[ALL] OR genome[ALL] OR assembled[ALL] OR microsatellite[ALL])

input_file

Location of the input file. NULL prompts the user to select the file through point-and-click dialogs; any other value is used as the file location; default NULL

output_file

Location of the output file. NULL prompts the user to select the location through point-and-click dialogs; any other value is used as the location; default NULL

seq_min

Minimum length a sequence can have without being flagged; default 100

seq_max

Maximum length a sequence can have without being flagged; default 2500
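The effect of seq_min and seq_max can be illustrated with a minimal sketch. It assumes that a sequence is flagged when its character count falls outside the closed interval from seq_min to seq_max; whether the package treats the bounds as inclusive, or strips gap characters before counting, is not stated here. The helper flag_by_length() is hypothetical and not part of MACER.

```r
# Hypothetical helper (not part of MACER): TRUE marks a sequence whose
# length is below seq_min or above seq_max. Inclusive bounds are assumed.
flag_by_length <- function(seqs, seq_min = 100, seq_max = 2500) {
  len <- nchar(seqs)
  len < seq_min | len > seq_max
}

# Toy sequences of 50, 400 and 3000 characters
seqs <- c(
  short = strrep("A", 50),
  ok    = strrep("ACGT", 100),
  long  = strrep("A", 3000)
)

flags <- flag_by_length(seqs)
print(flags)
```

With the default thresholds, only the 400-character sequence passes unflagged.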

Details

User Input: A text file listing genera in a single column, one genus per line, with a new line at the end of the list.
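An input file in this layout can be produced from R itself. The sketch below uses base R's writeLines(), which terminates every element, including the last, with a line break. The genus names and the file name genera.txt are illustrative assumptions only.

```r
# Example genera (illustrative); one genus per line in a single column
genera <- c("Salmo", "Oncorhynchus", "Salvelinus")

# Write to a temporary location; writeLines() ends the last line with "\n"
input_path <- file.path(tempdir(), "genera.txt")
writeLines(genera, input_path)

# Read the file back to confirm its contents
print(readLines(input_path))
```

The resulting path can then be passed as input_file instead of choosing the file through the prompt.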

Value

Outputs: One main folder containing three other folders.

Main folder - Seq_auto_dl_TTTTTT_MMM_DD

Three subfolders:

1. BOLD - Contains a file for every genus downloaded with the raw data from the BOLD system.

2. NCBI - Contains a file for every genus downloaded with the raw data from GenBank.

3. Total_tables - Contains files for the running of the function, which include:

   A_Summary.txt - This file contains information about the downloads.

   A_Total_Table.tsv - A file with a single table containing the accumulated data for all genera searched.
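After a run, the accumulated table can be loaded back into R for inspection. The following is a minimal sketch with two hypothetical helpers (neither is part of MACER). It assumes that run folders start with the prefix Seq_auto_dl_, that A_Total_Table.tsv sits in the Total_tables subfolder as described above, and that the file has a header row; the column names are not documented here, so none are assumed.

```r
# Hypothetical helper: find every A_Total_Table.tsv below output_dir,
# looking only inside folders whose names start with "Seq_auto_dl_".
find_total_tables <- function(output_dir) {
  runs <- list.files(output_dir, pattern = "^Seq_auto_dl_", full.names = TRUE)
  runs <- runs[dir.exists(runs)]
  tables <- file.path(runs, "Total_tables", "A_Total_Table.tsv")
  tables[file.exists(tables)]
}

# Hypothetical helper: read one table as tab-separated text.
# A header row is assumed; quoting is disabled so field text is kept as is.
read_total_table <- function(path) {
  read.delim(path, stringsAsFactors = FALSE, quote = "", check.names = FALSE)
}

# Usage (the directory below is a placeholder for the chosen output location):
# tables <- find_total_tables("path/to/output")
# total  <- read_total_table(tables[1])
# str(total)
```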

Note

When using a custom search string for NCBI, only a single genus can be searched at a time.
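To make the single-genus restriction concrete, the sketch below builds a complete search string for one genus from the default string given under search_str. It assumes that the word genus in the default string is a placeholder for the genus name; how the package performs this substitution internally is not documented here, and build_search_str() is a hypothetical helper.

```r
# Default GenBank search string as given under the search_str argument
default_template <- paste0(
  "(genus[ORGN]) NOT (shotgun[ALL] OR genome[ALL] ",
  "OR assembled[ALL] OR microsatellite[ALL])"
)

# Hypothetical helper (not part of MACER): put one genus name in place of
# the literal "genus" placeholder. fixed = TRUE avoids regex interpretation
# of the square brackets.
build_search_str <- function(genus, template = default_template) {
  sub("genus[ORGN]", paste0(genus, "[ORGN]"), template, fixed = TRUE)
}

custom_str <- build_search_str("Salmo")
print(custom_str)
```

A string built this way names one genus explicitly, which is why it fits an input file that lists only that genus.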

Author(s)

Robert G. Young

References

<https://github.com/rgyoung6/MACER>

Young, R. G., Gill, R., Gillis, D., Hanner, R. H. (Submitted June 2021). Molecular Acquisition, Cleaning, and Evaluation in R (MACER) - A tool to assemble molecular marker datasets from BOLD and GenBank. Biodiversity Data Journal.

See Also

create_fastas(), align_to_ref(), barcode_clean()

Examples

## Not run: 
auto_seq_download()
auto_seq_download(BOLD_database = TRUE, NCBI_database = FALSE)
auto_seq_download(BOLD_database = FALSE, NCBI_database = TRUE)

## End(Not run)
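The calls above rely on the point-and-click prompts. A non-interactive call might look like the sketch below, which supplies both locations as strings. It assumes that output_file accepts the path of an existing folder in which the results are created; this page does not state whether a folder or a file path is expected. The download is guarded by a run_download switch so that nothing contacts BOLD or GenBank unless it is enabled.

```r
# Write a one-genus input file to a temporary location (illustrative genus)
input_path <- file.path(tempdir(), "genera.txt")
writeLines("Salmo", input_path)

# Assumption: output_file is the path of an existing folder for the results
output_path <- tempdir()

run_download <- FALSE  # set to TRUE to contact BOLD and GenBank

if (run_download && requireNamespace("MACER", quietly = TRUE)) {
  MACER::auto_seq_download(
    BOLD_database = TRUE,
    NCBI_database = TRUE,
    input_file = input_path,
    output_file = output_path,
    seq_min = 100,
    seq_max = 2500
  )
}
```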


MACER documentation built on Dec. 3, 2022, 1:10 a.m.