View source: R/dimond_best_hit.R
This function performs a DIAMOND search (best hit) of a given set of protein sequences against a given database.
diamond_best_hits(
  query,
  subject,
  is_subject_db = FALSE,
  format = "fasta",
  sensitivity_mode = "ultra-sensitive",
  out_format = "csv",
  evalue = "1E-5",
  max_target_seqs = 5000,
  cores = 1,
  hard_mask = TRUE,
  diamond_exec_path = NULL,
  add_makedb_options = NULL,
  add_diamond_options = NULL,
  output_path = NULL
)
query |
a character string specifying the path to the protein sequence file of interest (query organism). |
subject |
a character string specifying the path to the protein sequence file of interest (subject organism). |
is_subject_db |
logical specifying whether or not the |
format |
a character string specifying the file format of the sequence file, e.g. |
sensitivity_mode |
a character string specifying the level of alignment sensitivity. Higher sensitivity levels can detect deeper (more distant) homologs, at the cost of reduced computational speed.
|
out_format |
a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are:
|
evalue |
Expectation value (E) threshold for saving hits (default: |
max_target_seqs |
maximum number of aligned sequences that shall be retained. Please be aware that |
cores |
number of cores for parallel DIAMOND searches. |
hard_mask |
shall low complexity regions be hard masked with TANTAN? Default is |
diamond_exec_path |
a path to the DIAMOND executable or |
add_makedb_options |
a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. |
add_diamond_options |
a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. |
output_path |
a path to the location where the DIAMOND best hit output shall be stored. E.g. |
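To make the role of the search arguments more concrete, here is a minimal sketch of how values such as `sensitivity_mode`, `evalue`, `max_target_seqs` and `cores` could be translated into a `diamond blastp` command line. The helper `build_diamond_args()` and the file names are hypothetical and not part of the package; the actual internal call made by `diamond_best_hits()` may differ. The sketch also assumes that the accepted sensitivity modes mirror DIAMOND's own sensitivity flags.

```r
# Hypothetical helper (NOT part of the package): assemble arguments for a
# `diamond blastp` call from values like those accepted by diamond_best_hits().
build_diamond_args <- function(query, db, out,
                               sensitivity_mode = "ultra-sensitive",
                               evalue = "1E-5",
                               max_target_seqs = 5000,
                               cores = 1,
                               add_diamond_options = NULL) {
  # Assumption: modes correspond to DIAMOND's sensitivity flags
  modes <- c("fast", "mid-sensitive", "sensitive", "more-sensitive",
             "very-sensitive", "ultra-sensitive")
  sensitivity_mode <- match.arg(sensitivity_mode, modes)

  args <- c("blastp",
            "--query", query,
            "--db", db,
            "--out", out,
            "--outfmt", "6",                      # tabular output
            paste0("--", sensitivity_mode),
            "--evalue", evalue,
            "--max-target-seqs", as.character(max_target_seqs),
            "--threads", as.character(cores))

  # Append any extra user-supplied options unchanged
  if (!is.null(add_diamond_options)) args <- c(args, add_diamond_options)
  args
}

# Illustrative file names only
args <- build_diamond_args("query.fasta", "subject_db", "hits.tsv", cores = 2)
cat("diamond", args, "\n")
```

The resulting character vector could be passed to `system2()` together with the path to the DIAMOND executable; that step is omitted here because it requires a local DIAMOND installation.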
Given a set of protein sequences (query sequences), a DIAMOND best hit (DBH) search is performed against the subject sequences.
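The idea of a best hit can be illustrated on a small mock hit table. The sketch below is an assumption about how a best hit might be selected (lowest E-value per query, ties broken by highest bit score, after applying the E-value threshold); the criterion actually used inside the package may differ, and the column names are illustrative only.

```r
# Mock search hits; column names and values are illustrative assumptions.
hits <- data.frame(
  query_id   = c("q1", "q1", "q2", "q2", "q3"),
  subject_id = c("s1", "s2", "s3", "s4", "s5"),
  evalue     = c(1e-50, 1e-20, 1e-8, 1e-30, 1e-3),
  bit_score  = c(200, 90, 45, 120, 30),
  stringsAsFactors = FALSE
)

# Keep only hits that pass the E-value threshold (here 1E-5)
hits <- hits[hits$evalue <= 1e-5, ]

# Sort by query, then ascending E-value, then descending bit score
hits <- hits[order(hits$query_id, hits$evalue, -hits$bit_score), ]

# The first row per query is its best hit
best_hits <- hits[!duplicated(hits$query_id), c("query_id", "subject_id")]
rownames(best_hits) <- NULL
best_hits
```

In this mock example `q3` drops out because its only hit does not pass the threshold, so a query is not guaranteed to have a best hit.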
A tibble as returned by the diamond_best_hits function, storing the query_ids in the first column and the subject_ids (best hit homologs) in the second column.
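As a rough illustration of working with such a result, the sketch below uses a mock data frame in place of the returned tibble and accesses the two ID columns by position, since only their order is documented here. The column names and values are made up for the example.

```r
# Mock stand-in for the returned tibble: query IDs first, subject IDs second.
best_hits <- data.frame(
  query_id   = c("q1", "q2", "q3"),
  subject_id = c("s1", "s4", "s4"),
  stringsAsFactors = FALSE
)

# Access the ID columns by position rather than by (assumed) name
query_ids   <- best_hits[[1]]
subject_ids <- best_hits[[2]]

# Named lookup from query ID to its best-hit subject ID
lookup <- setNames(subject_ids, query_ids)
lookup[["q2"]]

# A one-directional best hit search can assign the same subject to several
# queries; list subjects that are the best hit of more than one query.
shared <- names(which(table(subject_ids) > 1))
shared
```

Subjects that appear for several queries are one reason a reciprocal search (see diamond_reciprocal_best_hits) may be preferable when one-to-one pairs are needed.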
Hajk-Georg Drost
diamond_reciprocal_best_hits, diamond_protein_to_protein
## Not run:
# performing homology inference using the diamond best hit (DBH) method using protein sequences
best_hits <- diamond_best_hits(
  query   = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
  subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'))
# look at results
best_hits
# store the DIAMOND output file to the current working directory
best_hits <- diamond_best_hits(
  query       = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
  subject     = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
  output_path = getwd())
# look at results
best_hits
# run diamond_best_hits() with multiple cores
best_hits <- diamond_best_hits(
  query   = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
  subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
  cores   = 2)
# look at results
best_hits
# performing homology inference using the diamond best hit (DBH) method and
# specifying the path to the DIAMOND executable (here miniconda path)
best_hits <- diamond_best_hits(
  query             = system.file('seqs/ortho_thal_aa.fasta', package = 'homologr'),
  subject           = system.file('seqs/ortho_lyra_aa.fasta', package = 'homologr'),
  diamond_exec_path = "/opt/miniconda3/bin/")
# look at results
best_hits
## End(Not run)