diamond_protein_to_protein | R Documentation |
Run a protein-to-protein DIAMOND2 search of reference sequences against a blast-able database or fasta file.
diamond_protein_to_protein(
  query,
  subject,
  output_path = NULL,
  is_subject_db = FALSE,
  task = "blastp",
  sensitivity_mode = "ultra-sensitive",
  use_arrow_duckdb_connection = FALSE,
  evalue = 0.001,
  out_format = "csv",
  cores = 1,
  max_target_seqs = 500,
  hard_mask = TRUE,
  diamond_exec_path = NULL,
  add_makedb_options = NULL,
  add_diamond_options = NULL
)
query |
path to input file in fasta format. |
subject |
path to subject file in fasta format or blast-able database. |
output_path |
path to the folder in which the DIAMOND2 output table shall be stored.
Default is output_path = NULL. |
is_subject_db |
logical specifying whether or not the |
task |
protein search task option. Options are:
|
sensitivity_mode |
specifies the level of alignment sensitivity. Higher sensitivity levels can detect deeper (more distant) homologs, but at the cost of reduced computational speed.
|
use_arrow_duckdb_connection |
should the DIAMOND2 hit output table be transformed to an in-process (big data disk-processing) arrow connection to DuckDB? This is useful when the DIAMOND2 output table is too large to fit into memory. Default is use_arrow_duckdb_connection = FALSE. |
evalue |
Expectation value (E) threshold for saving hits (default: evalue = 0.001). |
out_format |
a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are:
|
cores |
number of cores for parallel DIAMOND searches. |
max_target_seqs |
maximum number of aligned sequences that shall be retained. Please be aware that |
hard_mask |
should low-complexity regions be hard-masked with TANTAN? Default is hard_mask = TRUE. |
diamond_exec_path |
a path to the DIAMOND executable or |
add_makedb_options |
a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. |
add_diamond_options |
a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. |
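Both add_makedb_options and add_diamond_options expect a single character string. When the extra flags are kept in a list, they first have to be collapsed into such a string. The following is only a minimal sketch: the helper build_option_string() is hypothetical and not part of rdiamond, and the flags shown are the ones used in the examples on this page.

```r
# Hypothetical helper, not part of rdiamond: collapse a named list of
# command line flags into one character string.
# A value of TRUE marks a flag that takes no argument.
build_option_string <- function(opts) {
  parts <- vapply(names(opts), function(flag) {
    value <- opts[[flag]]
    if (isTRUE(value)) {
      paste0("--", flag)
    } else {
      paste0("--", flag, " ", as.character(value))
    }
  }, character(1))
  paste(parts, collapse = " ")
}

# Values are given as strings so that "4.0" is not reformatted to "4"
extra_opts <- build_option_string(list(
  "block-size"   = "4.0",
  "compress"     = "1",
  "no-self-hits" = TRUE
))
extra_opts
```

The resulting string could then be supplied as add_diamond_options = extra_opts.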
Hajk-Georg Drost
## Not run:
# run diamond assuming that the diamond executable is available
# via the system path ('diamond_exec_path = NULL') and using
# sensitivity_mode = "ultra-sensitive"
diamond_example <- diamond_protein_to_protein(
  query = system.file('seqs/qry_aa.fa', package = 'rdiamond'),
  subject = system.file('seqs/sbj_aa.fa', package = 'rdiamond'),
  sensitivity_mode = "ultra-sensitive",
  output_path = tempdir(),
  use_arrow_duckdb_connection = FALSE
)
# look at DIAMOND results
diamond_example
# run diamond assuming that the diamond executable is available
# via the miniconda path ('diamond_exec_path = "/opt/miniconda3/bin/"')
# and using 2 cores as well as sensitivity_mode = "ultra-sensitive"
diamond_example_conda <- diamond_protein_to_protein(
  query = system.file('seqs/qry_aa.fa', package = 'rdiamond'),
  subject = system.file('seqs/sbj_aa.fa', package = 'rdiamond'),
  sensitivity_mode = "ultra-sensitive",
  diamond_exec_path = "/opt/miniconda3/bin/",
  output_path = tempdir(),
  use_arrow_duckdb_connection = FALSE,
  cores = 2
)
# look at DIAMOND results
diamond_example_conda
# run diamond assuming that the diamond executable is available
# via the system path ('diamond_exec_path = NULL') and using
# sensitivity_mode = "ultra-sensitive" and adding command line options:
# "--block-size 4.0 --compress 1 --no-self-hits"
diamond_example_ultra_sensitive_add_diamond_options <- diamond_protein_to_protein(
  query = system.file('seqs/qry_aa.fa', package = 'rdiamond'),
  subject = system.file('seqs/sbj_aa.fa', package = 'rdiamond'),
  sensitivity_mode = "ultra-sensitive",
  max_target_seqs = 500,
  output_path = tempdir(),
  use_arrow_duckdb_connection = FALSE,
  add_diamond_options = "--block-size 4.0 --compress 1 --no-self-hits",
  cores = 1
)
# look at DIAMOND results
diamond_example_ultra_sensitive_add_diamond_options
# run diamond assuming that the diamond executable is available
# via the system path ('diamond_exec_path = NULL') and using
# sensitivity_mode = "ultra-sensitive" and adding makedb command line options:
# "--taxonnames"
diamond_example_ultra_sensitive_add_makedb_options <- diamond_protein_to_protein(
  query = system.file('seqs/qry_aa.fa', package = 'rdiamond'),
  subject = system.file('seqs/sbj_aa.fa', package = 'rdiamond'),
  sensitivity_mode = "ultra-sensitive",
  max_target_seqs = 500,
  output_path = tempdir(),
  use_arrow_duckdb_connection = FALSE,
  add_makedb_options = "--taxonnames",
  cores = 1
)
# look at DIAMOND results
diamond_example_ultra_sensitive_add_makedb_options
## End(Not run)
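The examples above assume that the DIAMOND executable can be found, either via the system path or via diamond_exec_path. A quick check before starting a search may look like the sketch below. The helper diamond_available() is hypothetical and not part of rdiamond, and it assumes that diamond_exec_path names a directory containing a file called "diamond", as the "/opt/miniconda3/bin/" example suggests.

```r
# Hypothetical helper, not part of rdiamond: report whether a DIAMOND
# executable can be located.
# exec_dir = NULL  -> look the executable up on the system path
# exec_dir = "dir" -> ASSUMPTION: the argument names a directory that
#                     contains a file called "diamond"
diamond_available <- function(exec_dir = NULL, exec_name = "diamond") {
  if (is.null(exec_dir)) {
    # Sys.which() returns "" when the program is not on the system path
    return(nzchar(unname(Sys.which(exec_name))))
  }
  file.exists(file.path(exec_dir, exec_name))
}

if (diamond_available()) {
  message("DIAMOND found on the system path; diamond_exec_path = NULL should work.")
} else {
  message("DIAMOND not found on the system path; consider setting diamond_exec_path.")
}
```

The check only confirms that a file with the expected name exists; it does not verify the DIAMOND version or that the file is executable.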