assign_vsearch_lca | R Documentation |
Please cite Vsearch and stampa if you use this function to assign taxonomy.
If top_hits_only is TRUE, the algorithm is that of stampa.
If top_hits_only is FALSE and vote_algorithm is NULL, you need to carefully define the maxaccepts, id and lca_cutoff parameters.
The algorithm is internal to vsearch and relies on its lcaout output.
If top_hits_only is FALSE and vote_algorithm is not NULL, conflicts among the list of taxonomic assignments are resolved using the function resolve_vector_ranks().
The possible values for vote_algorithm are "consensus", "rel_majority", "abs_majority" and "unanimity". See resolve_vector_ranks() for more details.
assign_vsearch_lca(
physeq = NULL,
ref_fasta = NULL,
seq2search = NULL,
behavior = c("return_matrix", "add_to_phyloseq", "return_cmd"),
vsearchpath = "vsearch",
clean_pq = TRUE,
taxa_ranks = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
nproc = 1,
suffix = "_sintax",
id = 0.5,
lca_cutoff = 1,
maxrejects = 32,
top_hits_only = TRUE,
maxaccepts = 0,
keep_temporary_files = FALSE,
verbose = TRUE,
temporary_fasta_file = "temp.fasta",
cmd_args = "",
too_few = "align_start",
vote_algorithm = NULL,
nb_voting = NULL,
strict = FALSE,
nb_agree_threshold = 1,
preference_index = NULL,
collapse_string = "/",
replace_collapsed_rank_by_NA = TRUE,
simplify_taxo = TRUE,
keep_vsearch_score = FALSE
)
physeq |
(required): a phyloseq-class object |
ref_fasta |
(required) A link to a database in vsearch format. The reference database must contain taxonomic information in the header of each sequence, in the form of a string starting with ";tax=" and followed by a comma-separated list of up to nine taxonomic identifiers. Each taxonomic identifier must start with an indication of the rank by one of the letters d (domain), k (kingdom), p (phylum), c (class), o (order), f (family), g (genus), s (species), or t (strain). The letter is followed by a colon (:) and the name of that rank. Commas and semicolons are not allowed in the name of the rank. Non-ASCII characters should be avoided in the names. Example: >X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655 |
seq2search |
A DNAStringSet object of sequences to search for. Replaces the physeq object. |
behavior |
Either "return_matrix" (default), "return_cmd", or "add_to_phyloseq":
|
vsearchpath |
(default: "vsearch") path to vsearch |
clean_pq |
(logical, default TRUE) If set to TRUE, empty samples and empty ASVs are discarded before clustering. |
taxa_ranks |
A character vector with the names of the taxonomic ranks present in ref_fasta |
nproc |
(int, default: 1) Number of CPUs/processors to use |
suffix |
(character, default "_sintax") The suffix appended to the names of the new columns. If set to "", the taxa_ranks names are used without suffix. |
id |
(Float [0:1], default 0.5). Default value is based on stampa.
See Vsearch Manual for parameter |
lca_cutoff |
(Float, default 1). Fraction of matching hits required for the last common ancestor (LCA) output. For example, a value of 0.9 implies that the taxonomy at a rank is still filled if fewer than 10% of the assigned species are incongruent.
Default value is based on stampa.
See Vsearch Manual for parameter Text from vsearch manual: "Adjust the fraction of matching hits required for the last common ancestor (LCA) output with the --lcaout option during searches. The default value is 1.0 which requires all hits to match at each taxonomic rank for that rank to be included. If a lower cutoff value is used, e.g. 0.95, a small fraction of non-matching hits are allowed while that rank will still be reported. The argument to this option must be larger than 0.5, but not larger than 1.0" |
maxrejects |
(int, default: 32)
Maximum number of non-matching target sequences to consider before
stopping the search for a given query.
Default value is based on stampa.
See Vsearch Manual for parameter |
top_hits_only |
(Logical, default TRUE)
Only the top hits with an equally high percentage of identity between the query and
database sequence sets are written to the output. If you set top_hits_only
you may need to set a lower |
maxaccepts |
(int, default: 0)
Default value is based on stampa.
Maximum number of matching target sequences to accept before stopping the search
for a given query.
See Vsearch Manual for parameter |
keep_temporary_files |
(logical, default: FALSE) Do we keep temporary files?
|
verbose |
(logical). If TRUE, print additional information. |
temporary_fasta_file |
Name of the temporary fasta file. Only useful with keep_temporary_files = TRUE. |
cmd_args |
Additional arguments passed on to the vsearch usearch_global command. |
too_few |
(default value "align_start") see |
vote_algorithm |
(default NULL) The voting method, one of "consensus", "rel_majority",
"abs_majority" and "unanimity". See |
nb_voting |
(Int, default NULL). The number of taxa to keep before applying a vote to resolve conflicts. If NULL, all taxa passing the filters (min_id, min_bit_score, min_cover and min_e_value) are selected. |
strict |
(Logical, default FALSE). See |
nb_agree_threshold |
See |
preference_index |
See |
collapse_string |
See |
replace_collapsed_rank_by_NA |
(Logical, default TRUE) See |
simplify_taxo |
(logical default TRUE). Do we apply the
function |
keep_vsearch_score |
(Logical, default FALSE). If TRUE, the mean and standard deviation of the id score are stored in the tax_table. |
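The ";tax=" header format required for ref_fasta can be checked with a few lines of base R before running a search. The sketch below is only an illustration: parse_tax_header() is a hypothetical helper, not a function of MiscMetabar or vsearch, and it assumes the annotation follows the rules listed for ref_fasta (no commas or semicolons inside names).

```r
# Hypothetical helper (not part of MiscMetabar or vsearch): split a
# ";tax=" annotation into a character vector named by rank letter.
parse_tax_header <- function(header) {
  # Keep only what follows ";tax=" and drop any later ";"-separated field
  tax <- sub("^.*;tax=", "", header)
  tax <- sub(";.*$", "", tax)
  # Each field has the form "<rank letter>:<name>"
  fields <- strsplit(tax, ",", fixed = TRUE)[[1]]
  ranks <- sub(":.*$", "", fields)
  values <- sub("^[^:]*:", "", fields)
  stats::setNames(values, ranks)
}

# Header taken from the ref_fasta example
header <- paste0(
  ">X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,",
  "c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,",
  "g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655"
)
tax <- parse_tax_header(header)
print(tax)
```

Running such a check on a few headers of a custom database is a cheap way to spot missing rank letters or stray separators.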
This function is mainly a wrapper of the work of others. Please cite vsearch and stampa.
See param behavior
Adrien Taudière
assign_sintax()
, add_new_taxonomy_pq()
data_fungi_mini_new <- assign_vsearch_lca(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
lca_cutoff = 0.9, behavior = "add_to_phyloseq"
)
data_fungi_mini_new2 <- assign_vsearch_lca(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
id = 0.8, behavior = "add_to_phyloseq", top_hits_only = FALSE
)
data_fungi_mini_new3 <- assign_vsearch_lca(data_fungi_mini,
ref_fasta = system.file("extdata", "mini_UNITE_fungi.fasta.gz", package = "MiscMetabar"),
id = 0.5, behavior = "add_to_phyloseq", top_hits_only = FALSE, vote_algorithm = "rel_majority"
)
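To give an intuition for the lca_cutoff argument, the following base R sketch applies a simplified reading of the rule quoted from the vsearch manual: a rank is reported only if the fraction of hits agreeing on it reaches the cutoff. This is an illustration, not the code vsearch runs internally; lca_sketch() and the toy hits matrix are assumptions made up for this example.

```r
# Illustration only: a simplified reading of the lca_cutoff rule,
# not the algorithm implemented inside vsearch.
lca_sketch <- function(hits, cutoff = 1) {
  # hits: character matrix, one row per hit, one named column per rank
  out <- stats::setNames(rep(NA_character_, ncol(hits)), colnames(hits))
  for (rank in colnames(hits)) {
    counts <- table(hits[, rank])
    best <- names(counts)[which.max(counts)]
    # Stop at the first rank where agreement falls below the cutoff
    if (max(counts) / nrow(hits) < cutoff) break
    out[[rank]] <- best
  }
  out
}

# Toy data: 20 hits agree on the genus, 19 of them agree on the species
hits <- cbind(
  Genus = rep("Amanita", 20),
  Species = c(rep("Amanita_muscaria", 19), "Amanita_pantherina")
)
strict_lca <- lca_sketch(hits, cutoff = 1)
relaxed_lca <- lca_sketch(hits, cutoff = 0.9)
print(strict_lca)
print(relaxed_lca)
```

With cutoff = 1 the single discordant hit leaves the species unassigned, whereas cutoff = 0.9 tolerates it and the species is reported.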