View source: R/diamond_protein_to_protein_best_reciprocal_hit.R
diamond_protein_to_protein_best_reciprocal_hits | R Documentation |
This function performs a DIAMOND best reciprocal hit (DRBH) search of a given set of protein sequences (query) against a given set of protein sequences or database (subject).
diamond_protein_to_protein_best_reciprocal_hits(
query,
subject,
is_subject_db = FALSE,
format = "fasta",
sensitivity_mode = "ultra-sensitive",
out_format = "csv",
evalue = "1E-5",
max_target_seqs = 5000,
cores = 1,
hard_mask = TRUE,
diamond_exec_path = NULL,
add_makedb_options = NULL,
add_diamond_options = NULL,
output_path = NULL
)
query |
a character string specifying the path to the protein sequence file of interest (query organism). |
subject |
a character string specifying the path to the protein sequence file of interest (subject organism). |
is_subject_db |
logical specifying whether or not the |
format |
a character string specifying the file format of the sequence file, e.g. |
sensitivity_mode |
specify the level of alignment sensitivity. The higher the sensitivity level, the more deep homologs can be found, but at the cost of reduced computational speed.
|
out_format |
a character string specifying the format of the file in which the DIAMOND results shall be stored. Available options are:
|
evalue |
Expectation value (E) threshold for saving hits (default: |
max_target_seqs |
maximum number of aligned sequences that shall be retained. Please be aware that |
cores |
number of cores for parallel DIAMOND searches. |
hard_mask |
shall low complexity regions be hard masked with TANTAN? Default is |
diamond_exec_path |
a path to the DIAMOND executable or |
add_makedb_options |
a character string specifying additional makedb options that shall be passed on to the diamond makedb command line call, e.g. |
add_diamond_options |
a character string specifying additional diamond options that shall be passed on to the diamond command line call, e.g. |
output_path |
a path to the location where the DIAMOND best hit output shall be stored. E.g. |
Given a set of protein sequences (query sequences), a DIAMOND best reciprocal hit (DRBH) search is performed.
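The idea behind a best reciprocal hit can be illustrated with a small base R sketch. This is not the package's implementation: the two toy tables, their IDs, and the column names `query_id` and `subject_id` are assumptions made for illustration only. A pair counts as reciprocal when the best hit of the query in the subject set is the same pair as the best hit of that subject sequence in the query set.

```r
# Toy best-hit tables (hypothetical IDs, not real DIAMOND output).
# forward: best subject hit for each query sequence
forward <- data.frame(
  query_id   = c("q1", "q2", "q3"),
  subject_id = c("s1", "s2", "s4"),
  stringsAsFactors = FALSE
)

# backward: best hit for each subject sequence when the search
# direction is reversed (subject searched against query)
backward <- data.frame(
  query_id   = c("s1", "s2", "s3"),
  subject_id = c("q1", "q5", "q3"),
  stringsAsFactors = FALSE
)

# Swap the columns of the backward table so that both tables
# read "query -> subject"
backward_swapped <- data.frame(
  query_id   = backward$subject_id,
  subject_id = backward$query_id,
  stringsAsFactors = FALSE
)

# Keep only the pairs that occur in both search directions
reciprocal <- merge(forward, backward_swapped,
                    by = c("query_id", "subject_id"))
print(reciprocal)
```

In this toy data only the pair `q1`/`s1` is found in both directions, so it is the only row kept.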
A tibble as returned by the diamond_protein_to_protein_best_reciprocal_hits function, storing the query_ids in the first column and the subject_ids (reciprocal best hit homologs) in the second column.
Hajk-Georg Drost
diamond_protein_to_protein_best_hits, diamond_protein_to_protein
## Not run:
# performing homology inference using the diamond best reciprocal hit (DRBH) method using protein sequences
best_rec_hits <- diamond_protein_to_protein_best_reciprocal_hits(
query = system.file('seqs/ortho_thal_aa.fasta', package = 'rdiamond'),
subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'rdiamond'))
# look at results
best_rec_hits
# store the DIAMOND output file to the current working directory
best_rec_hits <- diamond_protein_to_protein_best_reciprocal_hits(
query = system.file('seqs/ortho_thal_aa.fasta', package = 'rdiamond'),
subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'rdiamond'),
output_path = getwd())
# look at results
best_rec_hits
# run diamond_protein_to_protein_best_reciprocal_hits() with multiple cores
best_rec_hits <- diamond_protein_to_protein_best_reciprocal_hits(
query = system.file('seqs/ortho_thal_aa.fasta', package = 'rdiamond'),
subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'rdiamond'),
cores = 2)
# look at results
best_rec_hits
# performing homology inference using the diamond best reciprocal hit (DRBH) method and
# specifying the path to the DIAMOND executable (here miniconda path)
best_rec_hits <- diamond_protein_to_protein_best_reciprocal_hits(
query = system.file('seqs/ortho_thal_aa.fasta', package = 'rdiamond'),
subject = system.file('seqs/ortho_lyra_aa.fasta', package = 'rdiamond'),
diamond_exec_path = "/opt/miniconda3/bin/")
# look at results
best_rec_hits
## End(Not run)