barcode_clean: DNA Barcode Clean


View source: R/barcode_clean.R

Description

Takes an input fasta file and identifies genus-level and species-level outliers using a threshold of 1.5 times the interquartile range. If selected, it also checks the sequences using amino acid translation, and it has the option to eliminate sequences that contain non-AGCT characters. Finally, the program calculates the barcode gap for the species in the submitted dataset.
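The interquartile-range rule mentioned above can be illustrated with a minimal base R sketch. It assumes the common Tukey-style upper fence (third quartile plus 1.5 times the IQR); the exact rule MACER applies may differ. The helper name flag_upper_outliers and the distance values are hypothetical and are not part of MACER.

```r
# Flag values lying more than k x IQR above the upper quartile.
# `flag_upper_outliers` is a hypothetical helper, not a MACER function.
flag_upper_outliers <- function(d, k = 1.5) {
  q <- stats::quantile(d, probs = c(0.25, 0.75), names = FALSE)
  upper_fence <- q[2] + k * (q[2] - q[1])
  d > upper_fence
}

# Made-up mean genetic distances for five records
dists <- c(a = 0.010, b = 0.012, c = 0.011, d = 0.013, e = 0.090)
outlier_flags <- flag_upper_outliers(dists)

# Names of the records that exceed the fence
names(dists)[outlier_flags]
```

Here only record e is far enough from the rest to be flagged.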

Usage

barcode_clean(AA_code = "invert", AGCT_only = TRUE, data_folder = NULL)

Arguments

AA_code

The amino acid translation matrix (as implemented through ape) used to check the sequences for stop codons. The available codes are std, vert, invert, and F. The default is invert.
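To show what a stop codon check involves, here is a minimal base R sketch that does not use ape. It assumes that std, vert, and invert correspond to the NCBI standard, vertebrate mitochondrial, and invertebrate mitochondrial genetic codes, and that the sequence is ungapped and already in reading frame. The helper has_stop_codon is hypothetical and is not how MACER performs the check.

```r
# Stop codons under three NCBI genetic codes (assumed mapping of the
# AA_code labels: std = table 1, vert = table 2, invert = table 5).
stop_codons <- list(
  std    = c("TAA", "TAG", "TGA"),
  vert   = c("TAA", "TAG", "AGA", "AGG"),
  invert = c("TAA", "TAG")
)

# `has_stop_codon` is a hypothetical helper, not part of MACER or ape.
has_stop_codon <- function(dna, code = "invert", frame = 1) {
  dna <- toupper(dna)
  # Too short to contain a full codon in this frame
  if (nchar(dna) < frame + 2) return(FALSE)
  starts <- seq(frame, nchar(dna) - 2, by = 3)
  codons <- substring(dna, starts, starts + 2)
  any(codons %in% stop_codons[[code]])
}

has_stop_codon("ATGTGAGGA", code = "std")     # TGA is a stop in the standard code
has_stop_codon("ATGTGAGGA", code = "invert")  # TGA codes for Trp in table 5
```

The same sequence can pass under one code and fail under another, which is why the choice of AA_code matters for mitochondrial markers.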

AGCT_only

Indicates whether records containing characters other than A, G, C, and T are removed. TRUE (the default) removes records with non-AGCT characters; FALSE accepts all IUPAC characters.
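The effect of AGCT_only = TRUE can be sketched in base R with a regular expression. This is an illustration only: the record names and sequences are made up, and treating the alignment gap character "-" as acceptable is an assumption rather than documented MACER behaviour.

```r
# Made-up aligned sequences; rec2 and rec3 contain IUPAC ambiguity codes
seqs <- c(
  rec1 = "ACGTACGT",
  rec2 = "ACGTNCGT",
  rec3 = "ACRTACGT",
  rec4 = "AC-TACGT"
)

# TRUE where a sequence contains only A, C, G, T (and, by assumption, gaps)
agct_only <- !grepl("[^ACGT-]", toupper(seqs))

# Records that would be kept under this rule
kept <- seqs[agct_only]
names(kept)
```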

data_folder

The location of the folder containing the MSA fasta files to be cleaned. The default is NULL, in which case the program prompts the user to select the folder through point-and-click.

Details

Input: A file folder with one or more fasta files of interest

Value

Output: A single log file for the run of the function, named A_Clean_File_YYYY-DD-TTTTTTTT. The function also outputs three files for each fasta file submitted. The first is the distance matrix that was calculated and used to assess the DNA barcode gaps. This file is named the same as the input file with dist_table.dat appended to the end of the name. The second is the total data table file, which provides a table of all submitted records for each data set along with the results from each section of the analysis. This file is named the same as the input fasta with data_table.dat appended to the end. Finally, a fasta file with all outliers and flagged records removed is generated for each input fasta file. This output file is named the same as the input fasta with no_outlier.fas appended to the end.
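As a rough illustration of how a barcode gap can be assessed from a pairwise distance matrix, the base R sketch below uses the common definition: the smallest distance to any other species minus the largest distance within the species. The matrix, the species labels, and the helper barcode_gap are all hypothetical, and MACER's own calculation and file layout may differ.

```r
# Made-up pairwise distances for five records from three species
species <- c("sp_A", "sp_A", "sp_B", "sp_B", "sp_C")
d <- matrix(c(
  0.00, 0.01, 0.08, 0.09, 0.12,
  0.01, 0.00, 0.07, 0.08, 0.11,
  0.08, 0.07, 0.00, 0.02, 0.10,
  0.09, 0.08, 0.02, 0.00, 0.10,
  0.12, 0.11, 0.10, 0.10, 0.00
), nrow = 5, byrow = TRUE)

# `barcode_gap` is a hypothetical helper, not a MACER function.
barcode_gap <- function(d, species, target) {
  in_sp <- species == target
  # Distances among records of the target species
  intra <- d[in_sp, in_sp, drop = FALSE]
  intra <- intra[upper.tri(intra)]
  # Distances from the target species to all other species
  inter <- d[in_sp, !in_sp, drop = FALSE]
  # A species with a single record has no within-species distance
  max_intra <- if (length(intra) > 0) max(intra) else NA_real_
  min(inter) - max_intra
}

gaps <- sapply(unique(species), function(s) barcode_gap(d, species, s))
gaps
```

A positive value indicates that the nearest other species is further away than the most divergent record within the species; a species with one record returns NA in this sketch.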

Author(s)

Robert G. Young

References

<https://github.com/rgyoung6/MACER> Young RG, Gill R, Gillis D, Hanner RH (2021) Molecular Acquisition, Cleaning and Evaluation in R (MACER) - A tool to assemble molecular marker datasets from BOLD and GenBank. Biodiversity Data Journal 9: e71378. <https://doi.org/10.3897/BDJ.9.e71378>

See Also

auto_seq_download() create_fastas() align_to_ref()

Examples

## Not run: 
barcode_clean()
barcode_clean(AA_code = "vert", AGCT_only = TRUE)
barcode_clean(AA_code = "vert")

## End(Not run)

MACER documentation built on Sept. 8, 2021, 5:07 p.m.