cdsdir2orthofinder: cdsdir2orthofinder

View source: R/cdsdir2orthofinder.R

cdsdir2orthofinderR Documentation

cdsdir2orthofinder

Description

This function calculates (conditional-)reciprocal best hit (CRBHit) pairs for all possible comparison including self comparison from a directory of AA fasta files. Sequence searches are performed with last Kiełbasa, SM et al. (2011) [default] or with mmseqs2 Steinegger, M and Soeding, J (2017) or with diamond Buchfink, B et al. (2021).

Usage

cdsdir2orthofinder(
  dir,
  file_ending = "*",
  searchtool = "last",
  lastpath = paste0(find.package("CRBHits"), "/extdata/last-1542/bin/"),
  lastD = 1e+06,
  mmseqs2path = NULL,
  mmseqs2sensitivity = 5.7,
  diamondpath = NULL,
  diamondsensitivity = "--sensitive",
  outpath = "/tmp",
  crbh = TRUE,
  keepSingleDirection = FALSE,
  eval = 0.001,
  qcov = 0,
  tcov = 0,
  pident = 0,
  alnlen = 0,
  rost1999 = FALSE,
  filter = NULL,
  fit.type = "mean",
  fit.varweight = 0.1,
  fit.min = 5,
  threads = 1,
  remove = TRUE
)

Arguments

dir

directory containing AA fasta files [mandatory]

file_ending

define file ending to consider [default: *]

searchtool

specify sequence search algorithm last, mmseqs2 or diamond [default: last]

lastpath

specify the PATH to the last binaries [default: /extdata/last-1542/bin/]

lastD

last option D: query letters per random alignment [default: 1e6]

mmseqs2path

specify the PATH to the mmseqs2 binaries [default: NULL]

mmseqs2sensitivity

specify the sensitivity option of mmseqs2 [default: 5.7]

diamondpath

specify the PATH to the diamond binaries [default: NULL]

diamondsensitivity

specify the sensitivity option of diamond [default: –sensitive]

outpath

specify the output PATH [default: /tmp]

crbh

specify if conditional-reciprocal hit pairs should be retained as secondary hits [default: TRUE]

keepSingleDirection

specify if single direction secondary hit pairs should be retained [default: FALSE]

eval

evalue [default: 1e-3]

qcov

query coverage [default: 0.0]

tcov

target coverage [default: 0.0]

pident

percent identity [default: 0.0]

alnlen

alignment length [default: 0.0]

rost1999

specify if hit pairs should be filter by equation 2 of Rost 1999 [default: FALSE]

filter

specify additional custom filters as list to be applied on hit pairs [default: NULL]

fit.type

specify if mean or median should be used for fitting [default: mean]

fit.varweight

factor for fitting function to consider neighborhood [default: 0.1]

fit.min

specify minimum neighborhood alignment length [default: 5]

threads

number of parallel threads [default: 1]

remove

specify if last result files should be removed [default: TRUE]

Value

List of three (crbh=FALSE)

Author(s)

Kristian K Ullrich

References

Aubry S, Kelly S et al. (2014) Deep Evolutionary Comparison of Gene Expression Identifies Parallel Recruitment of Trans-Factors in Two Independent Origins of C4 Photosynthesis. PLOS Genetics, 10(6) e1004365.

Kiełbasa, SM et al. (2011) Adaptive seeds tame genomic sequence comparison. Genome research, 21(3), 487-493.

Rost B. (1999). Twilight zone of protein sequence alignments. Protein Engineering, 12(2), 85-94.

See Also

aafile2rbh

Examples

## compile last-1542 within CRBHits
CRBHits::make_last()

kullrich/CRBHits documentation built on March 29, 2024, 11:34 a.m.