| assign_mmseqs2 | R Documentation |
Use the MMseqs2 software to assign taxonomy to sequences.
The preferred usage is to provide a reference FASTA file in SINTAX format
via ref_fasta. The function builds a temporary MMseqs2 taxonomy database
from the SINTAX headers and then runs mmseqs easy-taxonomy with the
requested --lca-mode, giving the same LCA behaviour as the database
path.
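To make the SINTAX header layout concrete, here is a minimal base-R sketch that splits one header into its ranks. It assumes the common SINTAX layout `<id>;tax=d:...,p:...,...;`; the helper `parse_sintax_header()` and the example header are hypothetical and are not part of this package, and the function's own parsing may differ.

```r
# Hypothetical helper: parse one SINTAX-style header into a named vector.
# Assumes the layout "<id>;tax=<rank>:<name>,<rank>:<name>,...;".
parse_sintax_header <- function(header) {
  tax <- sub("^.*;tax=", "", header)    # keep what follows ";tax="
  tax <- sub(";.*$", "", tax)           # drop the trailing ";" and anything after it
  fields <- strsplit(tax, ",", fixed = TRUE)[[1]]
  ranks <- sub(":.*$", "", fields)      # rank letter before the first ":"
  values <- sub("^[^:]*:", "", fields)  # taxon name after the first ":"
  stats::setNames(values, ranks)
}

# Made-up header used only for illustration
hdr <- "seq1;tax=d:Fungi,p:Ascomycota,c:Sordariomycetes,g:Fusarium;"
taxo <- parse_sintax_header(hdr)
print(taxo)
```

Ranks that are absent from a header (here order, family and species) are simply missing from the result, which is typical of SINTAX references with incomplete lineages.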
Alternatively, a pre-built MMseqs2 database with NCBI taxonomy can be
passed via the database parameter (created via mmseqs createdb +
mmseqs createtaxdb, or downloaded with mmseqs databases). In this
case, the MMseqs2 native easy-taxonomy LCA workflow is used. See the
MMseqs2 wiki for details.
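The two database-building steps mentioned above could be assembled from R roughly as follows. This is only a sketch: all file and directory names are hypothetical, the options shown are those described in the MMseqs2 documentation and should be checked against the installed version, and the commands are printed rather than executed.

```r
# Hypothetical file and directory names, used only for illustration.
ref_fasta_path <- "reference.fasta"
db_path <- "refDB"
tmp_dir <- "tmp"

# Step 1: mmseqs createdb <fasta> <db>
createdb_args <- c("createdb", ref_fasta_path, db_path)

# Step 2: mmseqs createtaxdb <db> <tmp>, pointing to an NCBI taxonomy dump
# and to a file mapping sequence identifiers to taxonomy IDs.
createtaxdb_args <- c(
  "createtaxdb", db_path, tmp_dir,
  "--ncbi-tax-dump", "ncbi-taxdump",
  "--tax-mapping-file", "taxidmapping"
)

# Print the commands instead of running them. To execute them, each
# argument vector could be passed to system2("mmseqs", args).
cat("mmseqs", createdb_args, "\n")
cat("mmseqs", createtaxdb_args, "\n")
```

The resulting database path (here `refDB`) is what would then be supplied through the database parameter.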
assign_mmseqs2(
  physeq = NULL,
  ref_fasta = NULL,
  database = NULL,
  seq2search = NULL,
  mmseqs2path = find_mmseqs2(),
  behavior = c("return_matrix", "add_to_phyloseq"),
  suffix = "_mmseqs2",
  lca_mode = 3,
  lca_ranks = c("superkingdom", "phylum", "class", "order", "family", "genus", "species"),
  column_names = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
  search_type = 3,
  sensitivity = NULL,
  min_seq_id = NULL,
  e_value = NULL,
  max_accept = 5,
  nproc = 1,
  clean_pq = TRUE,
  simplify_taxo = TRUE,
  keep_temporary_files = FALSE,
  verbose = FALSE,
  cmd_args = ""
)
physeq |
(required) a |
ref_fasta |
Either a Biostrings::DNAStringSet object or a path
to a FASTA file in SINTAX format (taxonomy in headers after
|
database |
(optional) Path to a pre-built MMseqs2 database with
NCBI taxonomy information. Only used if |
seq2search |
(optional) A Biostrings::DNAStringSet object. Use
instead of |
mmseqs2path |
Path to the |
behavior |
Either
|
suffix |
(character) Suffix appended to new taxonomy column names
(default: "_mmseqs2"). |
lca_mode |
(integer) The LCA mode used by MMseqs2:
|
lca_ranks |
Character vector of NCBI taxonomy rank names passed
to |
column_names |
Character vector of output column names; must be
the same length as lca_ranks. |
search_type |
(integer) MMseqs2 search type:
|
sensitivity |
(numeric, optional) Search sensitivity ( |
min_seq_id |
(numeric, optional) Minimum sequence identity
(0–1). If |
e_value |
(numeric, optional) Maximum E-value threshold ( |
max_accept |
(integer, optional) Maximum number of hits accepted per
query ( |
nproc |
(integer) Number of threads (default: 1). |
clean_pq |
(logical) Clean the phyloseq object before
searching? (default: TRUE). |
simplify_taxo |
(logical) Apply |
keep_temporary_files |
(logical) Keep intermediate files
for debugging? (default: FALSE). |
verbose |
(logical) Print progress messages? (default: FALSE). |
cmd_args |
(character) Additional arguments appended to the MMseqs2 command. |
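As an illustration of how lca_ranks, column_names and suffix relate to each other, the base-R sketch below pairs the default values from the usage above. It assumes that the ranks reach MMseqs2 as a single comma-separated string and that the suffix is pasted onto each column name; both are assumptions about the internals, not confirmed behaviour.

```r
# Default values copied from the usage section
lca_ranks <- c("superkingdom", "phylum", "class", "order", "family", "genus", "species")
column_names <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
suffix <- "_mmseqs2"

# The two vectors must line up one-to-one
stopifnot(length(lca_ranks) == length(column_names))

# Assumed form in which the ranks are handed to MMseqs2
lca_ranks_arg <- paste(lca_ranks, collapse = ",")

# Assumed naming of the new taxonomy columns
out_cols <- paste0(column_names, suffix)

rank_map <- data.frame(rank = lca_ranks, column = out_cols)
print(rank_map)
```

When supplying custom ranks, checking the two lengths before calling the function avoids a mismatch between MMseqs2 ranks and output columns.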
This function is mainly a wrapper around the work of others. Please cite MMseqs2: Mirdita M, Steinegger M, Breitwieser F, Söding J, Levy Karin E. Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics (2021).
If behavior == "return_matrix": a tibble with
columns taxa_names and one column per rank.
If behavior == "add_to_phyloseq": a new phyloseq object with
amended tax_table.
Adrien Taudière
assign_blastn(), assign_sintax(), assign_vsearch_lca()
## Not run:
ref_fasta <- Biostrings::readDNAStringSet(system.file("extdata",
  "mini_UNITE_fungi.fasta.gz",
  package = "MiscMetabar", mustWork = TRUE
))
# Preferred usage: provide a SINTAX-formatted FASTA file.
# The function builds a temporary MMseqs2 taxonomy database from the
# SINTAX headers and runs mmseqs easy-taxonomy.
res <- assign_mmseqs2(data_fungi_mini, ref_fasta = ref_fasta)
head(res)
# Add taxonomy to phyloseq:
physeq_new <- assign_mmseqs2(
  data_fungi_mini,
  ref_fasta = ref_fasta,
  behavior = "add_to_phyloseq"
)
## End(Not run)