Most orthologr functions are interface functions. They pass data to common bioinformatics tools, call the corresponding tool internally, and read the tool output back into R as R objects. To use these interface functions and obtain accurate results, users therefore need to install the underlying bioinformatics tools.
The following sections provide step-by-step instructions or guidance for installing all bioinformatics tools for which orthologr implements R interface functions.
Some tools are not trivial to install, so please read the corresponding sections carefully and run the test cases given in each section.
The bioinformatics tools you are going to install are based on the following programming languages:
Please make sure these programming languages are installed and executable on the machines where you are going to run orthologr.
The orthologr package provides interfaces to the pairwise alignment tools BLAST and DIAMOND v2. We recommend using DIAMOND v2, because it saves time and, in its ultra-sensitive mode, is as sensitive as BLAST.
BLAST
BLAST (Basic Local Alignment Search Tool) finds regions of similarity between biological sequences and is also the underlying paradigm of most orthology inference methods.
1) Go to ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ and download the BLAST program for your operating system.
2) Install BLAST:
Windows: follow the Environment Variables section of the BLAST installation manual for Windows and make sure the execution PATH variable is set correctly.
Unix: follow the Configuration section of the BLAST installation manual for Unix and make sure the execution PATH variable includes /usr/local/bin.
For example, on Linux systems open the Terminal application and run (thanks to Alexander Gabel):
```
# download BLAST+ version 2.2.31
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.31/ncbi-blast-2.2.31+-x64-linux.tar.gz
# extract the compiled version of BLAST
tar zxvpf ncbi-blast-2.2.31+-x64-linux.tar.gz
# copy the BLAST executables to /usr/local/bin (requires administrator rights)
sudo cp ncbi-blast-2.2.31+/bin/* /usr/local/bin
```
Alternatively, users can make the BLAST programs callable by adding their folder to the PATH variable. This is useful because updating BLAST then only means pointing PATH to the new folder, instead of deleting all BLAST programs from /usr/local/bin:
```
# open the .bash_profile file in your home directory with the vi text editor
vi ~/.bash_profile
# press 'i' to enter insert mode and add the following line
export PATH=${PATH}:/path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin
# press 'ESC', then type ':wq' and press 'Enter' to save and quit
# log out from your server with
exit
# log in again and type
blastp -version
```
The BLAST version information should now be printed.
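Instead of editing .bash_profile by hand in vi, the same export line can be appended from the shell. The sketch below assumes a Bash login shell that reads ~/.bash_profile and reuses the placeholder folder /path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin from above; the helper name add_to_path_profile is hypothetical and not part of orthologr. It only appends the line if an identical line is not already present, so running it twice does not duplicate the entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append an `export PATH=${PATH}:<dir>` line to a shell
# profile file (default: ~/.bash_profile), unless an identical line exists.
add_to_path_profile() {
  local bin_dir="$1"
  local profile="${2:-$HOME/.bash_profile}"
  # \${PATH} stays literal here, so PATH is expanded when the profile is read
  local line="export PATH=\${PATH}:${bin_dir}"

  touch "$profile"
  # -x: match whole lines only, -F: fixed string (no regex meaning for '.' or '+')
  if grep -qxF "$line" "$profile"; then
    echo "already present in $profile"
  else
    printf '%s\n' "$line" >> "$profile"
    echo "added to $profile"
  fi
}
```

Usage: run add_to_path_profile /path/to/downloaded/blast/folder/ncbi-blast-2.2.31+/bin, then log out and in again (or run source ~/.bash_profile) and check with blastp -version.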
In our experience, installing BLAST works best when the BLAST executables are copied to /usr/local/bin. On Unix systems, you can do this with the following steps. Note, however, that updating BLAST then requires manually deleting all previous BLAST programs from /usr/local/bin.
Open the Terminal application on your system and type (on macOS, this opens the folder in the Finder):
```
open /usr/local/bin
```
Next, copy the blastp, makeblastdb, etc. files (BLAST executables) from your BLAST folder to /usr/local/bin. You will need to enter the system password to allow the copy process.
After installing BLAST, you can open an R session and type the following command to check whether BLAST can be executed from R:
```
# test whether blastp is correctly installed on your machine
system("blastp -version")
```
```
blastp: 2.2.31+
Package: blast 2.2.31, build Oct 27 2014 17:10:51
```
You should see this output (with your installed version) if BLAST was installed correctly.
If you see the following output instead:
```
sh: blastp: command not found
```
return to step 2) and install BLAST so that it can be executed from the default execution PATH.
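The same check can be run from a Unix shell before starting R. The sketch below is a hypothetical helper (check_tool is not part of orthologr) that uses the POSIX command -v builtin to report whether each executable can be found on the current PATH. The list contains only blastp and makeblastdb, the BLAST+ executables named above, and can be extended with other tools from this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an executable is reachable via PATH.
# Returns 0 if found, 1 otherwise.
check_tool() {
  local tool="$1"
  local location
  if location="$(command -v "$tool")"; then
    echo "found: $tool -> $location"
    return 0
  else
    echo "missing: $tool (not on PATH)"
    return 1
  fi
}

# Check the BLAST+ executables mentioned above and count the missing ones.
missing=0
for tool in blastp makeblastdb; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing BLAST+ executable(s) missing from PATH"
```

A missing executable here corresponds to the "command not found" case in R, because system() runs the command through a shell that searches the same PATH.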
The following interface functions to BLAST+ are implemented in orthologr:
blast(): Interface function to BLAST+
blast_best(): Perform a BLAST+ best hit search
blast_rec(): Perform a BLAST+ reciprocal best hit (RBH) search
set_blast(): Prepare the parameters and databases for subsequent BLAST+ searches
blast.nr(): Perform a BLASTp search against NCBI nr
delta.blast(): Perform a DELTA-BLAST search
advanced_blast(): Advanced interface function to BLAST+
advanced_makedb(): Advanced interface function to makeblastdb
DIAMOND2
Like BLAST, DIAMOND2 (Double Index AlignMent Of Next-generation sequencing Data) finds regions of similarity between biological sequences. Unlike BLAST, it is much faster: up to 10,000X faster in the default fast mode and over 80X faster in the ultra-sensitive mode, which is as sensitive as BLAST. DIAMOND2 therefore enables considerably faster orthology inference.
1) Go to the download site in the DIAMOND2 wiki and follow the installation instructions. DIAMOND2 is supported on Linux, macOS and Windows.
2) Check the installation of DIAMOND2 by running the command
```
diamond --version
```
3) After installing DIAMOND2, you can open an R session and type the following command to check whether DIAMOND2 can be executed from R:
```
# test whether diamond is correctly installed on your machine
system("diamond --version")
```
```
diamond version 2.1.8
```
You should see this output (with your installed version) if DIAMOND2 was installed correctly.
If you see the following output instead:
```
sh: diamond: command not found
```
return to step 1) and install DIAMOND2 so that it can be executed from the default execution PATH.
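Because orthologr relies on DIAMOND v2, it can help to check the major version and not only whether diamond is on the PATH. The sketch below assumes that diamond --version prints a single line in the form shown above (diamond version 2.1.8); the function name diamond_major_version is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major version from a line such as
# "diamond version 2.1.8" (the format shown above) and print it.
diamond_major_version() {
  # The third whitespace-separated field is the version (e.g. "2.1.8");
  # the part before the first '.' is the major version.
  awk '{ split($3, v, "[.]"); print v[1] }' <<< "$1"
}

if command -v diamond > /dev/null 2>&1; then
  major="$(diamond_major_version "$(diamond --version)")"
  # a non-numeric value makes the test fail, which falls through to the warning
  if [ "$major" -ge 2 ] 2> /dev/null; then
    echo "DIAMOND v$major found"
  else
    echo "DIAMOND found, but it does not look like v2 or later"
  fi
else
  echo "diamond is not on the PATH"
fi
```

If the version line of your DIAMOND build is formatted differently, adjust the field index in the awk call.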
The following interface functions to DIAMOND2 are implemented in orthologr, analogous to the interface functions to BLAST+:
diamond(): Interface function to DIAMOND2
diamond_best(): Perform a DIAMOND best hit search
diamond_rec(): Perform a DIAMOND reciprocal best hit (RBH) search
set_diamond(): Prepare the parameters and databases for subsequent DIAMOND searches
Furthermore, the following functions use DIAMOND2 by default. BLAST can be used instead by specifying the parameter aligner = "blast":
dNdS(): Compute dNdS values for two organisms
divergence_stratigraphy(): Perform 'Divergence Stratigraphy'
The orthologr package also provides interfaces to the following multiple alignment tools. However, none of them needs to be installed if the corresponding interface functions are not used.
ClustalW2
To install ClustalW2, please go to the ClustalW homepage and download the clustalw2 program matching your operating system.
After downloading and unpacking the clustalw2 program, go to the clustalw-2.1 folder, open the Terminal application, and type (in this example for Mac OS X):
```
# copy clustalw2 to /usr/local/bin (requires administrator rights)
sudo cp clustalw2 /usr/local/bin
```
T-Coffee
To install T-Coffee, please go to the T-Coffee homepage and download the T-Coffee program matching your operating system.
MUSCLE
ClustalO
1) Download the argtable program.
2) Unzip the file.
3) Run within the argtable folder:
```
./configure
make
make check
sudo make install
```
4) Download ClustalO.
5) Unzip the folder and run within the folder:
```
./configure
make
sudo make install
```
MAFFT
In orthologr, the function multi_aln() provides interfaces to all of these multiple alignment tools, as well as a pairwise alignment interface to the Biostrings package that performs a Needleman-Wunsch alignment.
The codon alignment tool PAL2NAL is already included in the orthologr package, so you don't need to download or install it.
The corresponding function codon_aln() takes a protein alignment and the corresponding coding sequences and returns a codon alignment by calling PAL2NAL from within the orthologr package.
dNdS estimation quantifies the selection pressure acting on a protein-coding sequence. It compares two coding sequences through their codon alignment and estimates the ratio of nonsynonymous substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS).
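To make the ratio concrete, the following sketch follows the counting approach of Nei and Gojobori (1986), the NG method among the estimation methods listed below, with the Jukes-Cantor correction. The symbols are introduced here for illustration only and are not orthologr parameters: $N$ and $S$ are the numbers of nonsynonymous and synonymous sites (averaged over both sequences), and $N_d$ and $S_d$ are the observed nonsynonymous and synonymous differences in the codon alignment.

```latex
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}

d_N = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), \qquad
d_S = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right), \qquad
\omega = \frac{d_N}{d_S}
```

Roughly, $\omega < 1$ indicates purifying selection, $\omega \approx 1$ neutral evolution, and $\omega > 1$ positive selection. The other methods differ mainly in how they count sites and correct for multiple substitutions.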
Different models have been proposed to estimate this ratio. The orthologr package includes the most common dNdS estimation methods.
Starting with a codon alignment returned by codon_aln(), the function dNdS() computes the dN, dS, and dNdS values of pairs of proteins.
Based on implementations provided by gestimator, ape, and KaKs_Calculator, the following dNdS estimation methods are available in orthologr:
Li : Li's method (1993) -> provided by the ape package
Comeron : Comeron's method (1995)
NG : Nei, M. and Gojobori, T. (1986)
LWL : Li, W.H., et al. (1985)
MLWL (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)
YN : Yang, Z. and Nielsen, R. (2000)
MYN (Modified YN): Zhang, Z., et al. (2006)
To use the methods implemented in KaKs_Calculator, you need to have KaKs_Calculator installed on your system and executable from your default PATH, e.g. /usr/local/bin/.
Please go to the KaKs_Calculator homepage and download KaKs_Calculator, e.g.:
```
# download KaKs_Calculator
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/kaks-calculator/KaKs_Calculator1.2.tar.gz
# unzip
gzip -d KaKs_Calculator1.2.tar.gz
tar -xf KaKs_Calculator1.2.tar
# install
cd KaKs_Calculator1.2/src
sudo make
sudo cp KaKs_Calculator /usr/local/bin/
```
Now you should be able to run KaKs_Calculator via KaKs_Calculator -h in your shell or via system("KaKs_Calculator -h") in R.
The most recent version, KaKs_Calculator2.0, can be found here.
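Beyond KaKs_Calculator -h, a small end-to-end run can confirm that KaKs_Calculator actually computes values. The sketch below assumes the KaKs_Calculator1.2 build from above, its AXT input format (one name line, the two aligned coding sequences on one line each, then a blank line) and its -i, -o and -m options. The file names and the toy sequences are made up for illustration.

```shell
#!/usr/bin/env bash
# Write a toy pairwise codon alignment in AXT format:
# line 1 = pair name, line 2 = sequence 1, line 3 = sequence 2, then a blank line.
# Invented sequences: 6 codons with 3 synonymous differences
# (GCT->GCC, AAA->AAG, CTG->TTG) and 1 nonsynonymous difference (GGT->AGT).
axt_file="${TMPDIR:-/tmp}/orthologr_kaks_test.axt"
kaks_file="${axt_file%.axt}.kaks"

printf '%s\n%s\n%s\n\n' \
  "toy_pair" \
  "ATGGCTGCAAAAGGTCTG" \
  "ATGGCCGCAAAGAGTTTG" > "$axt_file"

if command -v KaKs_Calculator > /dev/null 2>&1; then
  # -m NG: Nei-Gojobori method (assumed option value, see KaKs_Calculator -h)
  KaKs_Calculator -i "$axt_file" -o "$kaks_file" -m NG
  cat "$kaks_file"
else
  echo "KaKs_Calculator is not on the PATH; wrote test input to $axt_file"
fi
```

With such short sequences the estimates are not biologically meaningful. The point is only that the program reads the input and writes a result file.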