| format_reference_database | R Documentation |
Formats reference databases from MIDORI or UNITE for use with the local_taxa_tool function.
format_reference_database(
path_to_input_database,
path_to_output_database,
input_database_source = "MIDORI",
path_to_taxonomy_edits = NA,
path_to_sequence_edits = NA,
path_to_taxa_subset_list = NA,
makeblastdb_command = "makeblastdb",
...
)
path_to_input_database |
String specifying path to input reference database in FASTA format. |
path_to_output_database |
String specifying path to output BLAST database in FASTA format. File path cannot contain spaces. |
input_database_source |
String specifying input reference database source ("MIDORI" or "UNITE"). The default is "MIDORI". |
path_to_taxonomy_edits |
String specifying path to taxonomy edits file in CSV format. The file must contain the following fields: 'Old_Taxonomy', 'New_Taxonomy', 'Notes'. Old taxonomies are replaced with new taxonomies in the order the records appear in the file. The taxonomic levels in the 'Old_Taxonomy' and 'New_Taxonomy' fields should be delimited by a semicolon. If no taxonomy edits are desired, then set this variable to NA (the default). |
path_to_sequence_edits |
String specifying path to sequence edits file in CSV format. The file must contain the following fields: 'Action', 'Common_Name', 'Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species', 'Sequence', 'Notes'. The values in the 'Action' field must be either 'Add' or 'Remove', which will add or remove the respective sequence from the reference database. Values in the 'Common_Name' field are optional. Values should be supplied to all taxonomy fields. If using a reference database from MIDORI, then use NCBI domain names (e.g., 'Eukaryota') in the 'Domain' field. If using a reference database from UNITE, then use kingdom names (e.g., 'Fungi') in the 'Domain' field. The 'Species' field should contain species binomials. Sequence edits are performed after taxonomy edits, if applied. If no sequence edits are desired, then set this variable to NA (the default). |
path_to_taxa_subset_list |
String specifying path to list of species (in CSV format) to subset the reference database to. This option is helpful if the user wants the reference database to include only the sequences of local species. The file should contain the following fields: 'Common_Name', 'Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'. There should be no |
makeblastdb_command |
String specifying path to the makeblastdb program, which is a part of BLAST. The default is "makeblastdb". |
... |
Accepts former argument names for backwards compatibility. |
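The three optional CSV inputs described above can be put together in base R. The following is a minimal sketch, not the package's own workflow: every record, sequence, and note in it is a hypothetical placeholder, and only the field names come from the argument descriptions. How much of a lineage an 'Old_Taxonomy' entry must contain is not stated here, so the lineages shown are an assumption.

```r
# Taxonomy edits: levels are semicolon-delimited, and edits are applied in
# the order the records appear. The lineages below are placeholders.
taxonomy_edits <- data.frame(
  Old_Taxonomy = "Eukaryota;Chordata;Actinopteri;Cypriniformes;Cyprinidae",
  New_Taxonomy = "Eukaryota;Chordata;Actinopteri;Cypriniformes;Leuciscidae",
  Notes = "Hypothetical family reassignment",
  stringsAsFactors = FALSE
)

# Sequence edits: 'Action' must be 'Add' or 'Remove'. 'Domain' holds an NCBI
# domain name for MIDORI databases (a kingdom name for UNITE databases).
sequence_edits <- data.frame(
  Action = "Add",
  Common_Name = "Brook trout",
  Domain = "Eukaryota",
  Phylum = "Chordata",
  Class = "Actinopteri",
  Order = "Salmoniformes",
  Family = "Salmonidae",
  Genus = "Salvelinus",
  Species = "Salvelinus fontinalis",
  Sequence = "ACGTACGTACGT",  # Placeholder, not a real reference sequence.
  Notes = "Hypothetical addition",
  stringsAsFactors = FALSE
)

# Taxa subset list: the same taxonomy fields, without the edit-specific ones.
subset_fields <- c("Common_Name", "Domain", "Phylum", "Class",
                   "Order", "Family", "Genus", "Species")
taxa_subset <- sequence_edits[, subset_fields]

# Write each table to a temporary CSV file without row names.
path_to_taxonomy_edits <- tempfile(fileext = ".csv")
path_to_sequence_edits <- tempfile(fileext = ".csv")
path_to_taxa_subset_list <- tempfile(fileext = ".csv")
write.csv(taxonomy_edits, path_to_taxonomy_edits, row.names = FALSE)
write.csv(sequence_edits, path_to_sequence_edits, row.names = FALSE)
write.csv(taxa_subset, path_to_taxa_subset_list, row.names = FALSE)
```

The resulting file paths can then be supplied to the path_to_taxonomy_edits, path_to_sequence_edits, and path_to_taxa_subset_list arguments.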
No return value. Writes formatted BLAST database files.
local_taxa_tool for performing geographically-conscious taxonomic assignment.
adjust_taxonomies for adjusting a taxonomy system.
# Get path to example reference sequences FASTA file.
path_to_input_file<-system.file("extdata",
"example_reference_sequences.fasta",
package="LocaTT",
mustWork=TRUE)
# Create a temporary file path for the output reference database FASTA file.
path_to_output_file<-tempfile(fileext=".fasta")
# Format reference database.
format_reference_database(path_to_input_database=path_to_input_file,
path_to_output_database=path_to_output_file)
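Because the function depends on an external makeblastdb program and rejects output paths that contain spaces, it can help to check both conditions before calling it. The sketch below is one possible pre-flight check using base R only. The variable names blast_available and output_path_ok are illustrative and not part of the package, and the call is skipped when BLAST or LocaTT is unavailable.

```r
# Locate makeblastdb on the PATH; Sys.which() returns "" when it is not found.
makeblastdb_path <- unname(Sys.which("makeblastdb"))
blast_available <- nzchar(makeblastdb_path)

# The output file path cannot contain spaces, so test for them up front.
path_to_output_file <- tempfile(fileext = ".fasta")
output_path_ok <- !grepl(" ", path_to_output_file, fixed = TRUE)

if (blast_available && output_path_ok &&
    requireNamespace("LocaTT", quietly = TRUE)) {
  # Get path to example reference sequences FASTA file.
  path_to_input_file <- system.file("extdata",
                                    "example_reference_sequences.fasta",
                                    package = "LocaTT",
                                    mustWork = TRUE)
  # Format reference database, passing the located makeblastdb explicitly.
  LocaTT::format_reference_database(
    path_to_input_database = path_to_input_file,
    path_to_output_database = path_to_output_file,
    makeblastdb_command = makeblastdb_path
  )
} else {
  message("Skipping: makeblastdb or LocaTT not found, ",
          "or the output path contains spaces.")
}
```

For a non-standard BLAST installation, makeblastdb_path could instead be set by hand to the full path of the program.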