The orthologr
package provides multiple functions to perform pairwise and multiple sequence alignments. The following functions are implemented in orthologr
:
multi_aln()
: Perform Multiple Sequence Alignmentspairwise_aln()
: Perform Pairwise Sequence Alignmentscodon_aln()
: Perform Codon AlignmentsPrior to be able to use all sequence alignment functions implemented in orthologr
you need to install corresponding alignment tools of interest.
The above mentioned functions provide interfaces to the following alignment programs:
multi_aln()
functionClustalW : Advanced multiple alignment tool of nucleic acid and protein sequences
T_Coffee : A collection of tools for processing multiple sequence alignments of nucleic acids and proteins as well as their 3D structure
MUSCLE : Fast and accurate multiple alignment tool of nucleic acid and protein sequences
ClustalO : Fast and scalable multiple alignment tool for nucleic acid and protein sequences that is also capable of performing HMM alignments
MAFFT : A tool for multiple sequence alignment and phylogeny
The easiest way to use the multi_aln()
function is to store the corresponding multiple sequence alignment tools in the default execution PATH
of you system (e.g. /usr/local/bin
on UNIX machines).
You can test whether the corresponding multiple sequence alignment tool can be executed from the default PATH
by running:
MacOS: system("clustalw2 -help")
Linux: system("clustalw -help")
Windows: system("clustalw2.exe -help")
In case everything is installed appropriately, you should see:
CLUSTAL 2.1 Multiple Sequence Alignments DATA (sequences) -INFILE=file.ext :input sequences. -PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment). VERBS (do things) -OPTIONS :list the command line parameters -HELP or -CHECK :outline the command line params. -FULLHELP :output full help content. -ALIGN :do full multiple alignment.
The multi_aln()
function takes a fasta file storing the genes (proteins) that shall
be aligned. The tool
argument specifies the alignment tool that shall be used to perform
a multiple sequence alignment (in this case tool = clustalw
). The get_aln
argument specifies whether or not the alignment shall be printed out to the console. In case get_aln = FALSE
, the corresponding alignment
file is stored in the file.path(tempdir(),_alignment,multi_aln)
directory.
# in case the default execution path of clustalw runs properly on your system orthologr::multi_aln(file = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "clustalw", get_aln = TRUE)
CLUSTAL 2.1 Multiple Sequence Alignments [1] "Multiple Alignment successfully written in /var/folders/tb/ v6lh8_g505bfjytt3lvdmvw00000gn/T//RtmpOXdtWL/_alignment/multi_aln/clustalw.aln." AAMultipleAlignment with 2 rows and 429 columns aln names [1] MEDQVGFGFRPNDEELVGHYLRNKIEGNTSRDVEVAISEVNICSYD...IKIPPSTNTVKQSWIVLENAQWNYLKNMIIGVLLFISVISWIILVG AT1G01010.1 [2] --------MAASEHRCVGCGFR---------------VKSLFIQYS...IFEP-------TIFLTQFGSLMQYLSYLFRTV-------------- 333554|PACid_1603...
It is also possible to pass additional parameters to the ClustalW call:
# running clustalw using additional parameters # details: system("clustalw2 -help") orthologr::multi_aln(file = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "clustalw", get_aln = TRUE, params = "-PWMATRIX=BLOSUM -TYPE=PROTEIN")
CLUSTAL 2.1 Multiple Sequence Alignments Sequence type explicitly set to Protein Sequence format is Pearson Sequence 1: AT1G01010.1 429 aa Sequence 2: 333554|PACid_16033839 245 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 2 Guide tree file created: [/Library/Frameworks/R.framework/Versions/3.1/Resources/library/orthologr/seqs/aa_seqs.dnd] There are 1 groups Start of Multiple Alignment Aligning... Group 1: Delayed Alignment Score 29 CLUSTAL-Alignment file created [/var/folders/tb/v6lh8_g505bfjytt3lvdmvw00000gn/T//RtmpOXdtWL/_alignment/multi_aln/clustalw.aln] [1] "Multiple Alignment successfully written in /var/folders/tb/v6lh8_g505bfjytt3lvdmvw00000gn/T//RtmpOXdtWL/_alignment/multi_aln/clustalw.aln." AAMultipleAlignment with 2 rows and 429 columns aln names [1] MEDQVGFGFRPNDEELVGHYLRNKIEGNTSRDVEVAISEVNICSYD...IKIPPSTNTVKQSWIVLENAQWNYLKNMIIGVLLFISVISWIILVG AT1G01010.1 [2] --------MAASEHRCVGCGFR---------------VKSLFIQYS...IFEP-------TIFLTQFGSLMQYLSYLFRTV-------------- 333554|PACid_1603...
system("t_coffee -version")
In case everything is installed appropriately, you should see:
PROGRAM: T-COFFEE Version_11.00.8cbe486 (2014-08-12 21:55:14 - Revision 8cbe486 - Build 470)
orthologr::multi_aln(file = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "t_coffee", get_aln = TRUE)
system("muscle -help")
In case everything is installed appropriately, you should see:
MUSCLE v3.8.31 by Robert C. Edgar http://www.drive5.com/muscle This software is donated to the public domain. Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97. Basic usage muscle -in <inputfile> -out <outputfile> Common options (for a complete list please see the User Guide): -in <inputfile> Input file in FASTA format (default stdin) -out <outputfile> Output alignment in FASTA format (default stdout) -diags Find diagonals (faster for similar sequences) -maxiters <n> Maximum number of iterations (integer, default 16) -maxhours <h> Maximum time to iterate in hours (default no limit) -html Write output in HTML format (default FASTA) -msf Write output in GCG MSF format (default FASTA) -clw Write output in CLUSTALW format (default FASTA) -clwstrict As -clw, with 'CLUSTAL W (1.81)' header -log[a] <logfile> Log to file (append if -loga, overwrite if -log) -quiet Do not write progress messages to stderr -version Display version information and exit Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2 Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3 Fastest possible (nucleotides): -maxiters 1 -diags
# in case the default execution path of muscle runs properly on your system multi_aln(file = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "muscle", get_aln = TRUE)
system("clustalo --help")
# in case the default execution path of muscle runs properly on your system multi_aln(file = system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "clustalo", get_aln = TRUE)
system("mafft -help")
In case everything is installed appropriately, you should see:
------------------------------------------------------------------------------ MAFFT v7.187 (2014/10/02) http://mafft.cbrc.jp/alignment/software/ MBE 30:772-780 (2013), NAR 30:3059-3066 (2002) ------------------------------------------------------------------------------ High speed: % mafft in > out % mafft --retree 1 in > out (fast) High accuracy (for <~200 sequences x <~2,000 aa/nt): % mafft --maxiterate 1000 --localpair in > out (% linsi in > out is also ok) % mafft --maxiterate 1000 --genafpair in > out (% einsi in > out) % mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out) If unsure which option to use: % mafft --auto in > out --op # : Gap opening penalty, default: 1.53 --ep # : Offset (works like gap extension penalty), default: 0.0 --maxiterate # : Maximum number of iterative refinement, default: 0 --clustalout : Output: clustal format, default: fasta --reorder : Outorder: aligned, default: input order --quiet : Do not report progress --thread # : Number of threads (if unsure, --thread -1)
multi_aln()
functionThe multi_aln()
function is an interface function between R and common multiple sequence alignment tools. When working with this function a new folder named _alignment
is being created and stores the multiple alignment returned by the corresponding alignment tool. The argument get_aln = TRUE
allows to work with the multiple alignment generated by the corresponding alignment tool within the current R session.
This small pairwise alignment example shall illustrate how the multi_aln()
output can be used:
multi_aln(system.file('seqs/aa_seqs.fasta', package = 'orthologr'), tool = "clustalw", get_aln = TRUE)
$nb [1] 2 $nam [1] "AT1G01010.1" "333554|PACid_16033839" $seq $seq[[1]] [1] "medqvgfgfrpndeelvghylrnkiegntsrdvevaisevnicsydpwnlrfqskyksrdamwyffsrrennk gnrqsrttvsgkwkltgesvevkdqwgfcsegfrgkighkrvlvfldgrypdktksdwvihefhydllpehqrtyvic rleykgddadilsayaidptpafvpnmtssagsvvnqsrqrnsgsyntyseydsanhgqqfnensnimqqqplqgsfn plleydfanhggqwlsdyidlqqqvpylapyenesemiwkhvieenfeflvdertsmqqhysdhrpkkpvsgvlpdds sdtetgsmifedtssstdsvgssdepghtriddipslniieplhnykaqeqpkqqskekvissqksecewkmaedsik ippstntvkqswivlenaqwnylknmiigvllfisviswiilvg" $seq[[2]] [1] "--------maasehrcvgcgfr---------------vkslfiqyspgnirlmk------------------- ---------------cgnckevadey----------iecermiifid---------------------lilhrpkvyr hvlynainpetvniqhllwklvfvyllldsyrslllrrtdeess----------------fshssvlisikvligvls anaafifs----------------------------------------faiaakgllnevs---rgreimlgicissy fkifllamlvwefp----------------msvifivdilvltsnsmalkvmtestmtrciavcliahlvrfsvgqif ep-------tifltqfgslmqylsylfrtv--------------" $com [1] NA attr(,"class") [1] "alignment"
Furthermore, multiple alignments are returned as follows:
multi_aln(system.file('seqs/multi_aln_example.fasta', package = 'orthologr'), tool = "clustalw", get_aln = TRUE)
CLUSTAL 2.1 Multiple Sequence Alignments $nb [1] 4 $nam [1] "AT1G01010.1|PACid_19656964" "Thhalv10006531m|PACid_20187082" [3] "Bra032623|PACid_22715924" "311315|PACid_16059488" $seq $seq[[1]] [1] "----------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- -----------------------------------------medqvgfg------------------frp ndeelvghylrnkiegntsr----------dvevaisevnicsydp---------wnlrfqskyksrd-- -------amwyffsrre------------nnkgnrqsrttvsgkwkltgesvevkdqwgfcseg------ --frgkig-------------------------------------------hkrvlvfldgrypdktksd wvihefhydllpehqr-----------------tyvicr--leykgddadilsayaidptpafvpnmtss agsvvnqsrqrnsgsyntyseydsanhgqqfnensn-imqqqplqgsfnplleydfanhg-------gqw l----------------------------sdyidlqqqvpylapye------------------------ -----------------------------------nesemiwkhvieenfeflvdertsmqqhysdhrpk kpvsgvlp----------------ddssdtetgsmifedtssstds------------------------ ---------------------------------vgssdepghtriddipslniieplhnykaqeqpkqqs kekvissqksecewkmaeds---------ikippstntvkqswivlenaqwnylknmiigvllfisvisw iilvg-------------------------------" $seq[[2]] [1] "mvmederrgdikppsywldacediscdliddlvsdfdpssvavaesvdengvnndffggidhild siknggglpnrahingvsetnsqringnsevseaaqliagettvsvkgnvlqkcggkrdevskeegeknr krarvcsyqrersnlsgrgqansregdrfmnrkrtrnwdeaghnkrrdgynyrrdgrdreargywerdkv gsnelvyrsgtweadherdlkkesgrnresdekaeenkskpeehkekvveeqarryqldvleqakaknti afletgagktliailliksihkdltsqnrkmlsvflvpkvplvyqqaevirnqtcfqvghycgemgqdfw darrwqrefeskqvlvmtaqillnilrhsiirmeainllildechhavkkhpyslvmsefyhttpkdkrp aifgmtaspvnlkgvssqvdcaikirnletkldstvctikdrkelekhvpmpseivveydkaatmwslhe kikqmiaaveeaaqassrkskwqfmgardagakdelrqvygvsertesdgaanlihklrainytlaelgq wcaykvaqsfltalqsdervnfqvdvkfqesylsevvsllqcellegaaaekavaelskpengnandeie egelpddhvvsggehvdkvigaavadgkvtpkvqsliklllkyqhtadfraivfvervvaalvlpkvfae lpslgfircasmighnnsqemkssqmqdtiskfrdgqvtllvatsvaeegldirqcnvvmrfdlaktvla yiqsrgrarkpgsdyilmverenvshaaflrnarnseetlrkeaiertdlshlkdssrlisidavpgtvy kveatgamvslnsavglihfycsqlpgdryailrpefsmvkhekpgghteyscrlqlpcnapfeilegpv cssmrlaqqavclagckklhemgaftdmllpdkgsgqdaekadqddegepipgtarhrefypegvadvlk gewilsgkeicessklfhlymysvrcvdsgvskdpfltevsefavlfgneldaevlsmsmdlyvaramit kaslafrgslditesqlssikkfhvrlmsivldvdvepsttpwdpakaylfvpvadnssaepikginwel vekitkttvwdnplqrarpdvylgtnertlggdrreygfgklrhnigfgqkshptygirgavasfdvvra sgllpvrdalekevegdlsqgklmmadgcmvaenllgkivtaahsgkrfyvdsicydmsaetsfprkegy lgpleyntyadyykqkygvdlsckqqplikgrgvsycknllsprfeqsgesetildktyyvflppelcvv hplsgslvrgaqrlpsimrrvesmllavqlknlisypiptskilealtaascqetfcyeraellgdaylk wivsrflflkypqkhegqltrmrqqmvsnlvlyqyalvkglqsyiqadrfapsrwsapgvppvydedtkd ggssffdeeekpegnkdvfedgemedgelegdlssyrvlssktladvvealigvyyveggktaanhlmkw igihveddpeetegsvkpvynvpesvlksidfvgleralkyeftekgllveaithasrpssgvscyqrle fvgdavldhlitrhlfftytslppgrltdlraaavnnenfarvavkhklhlylrhgssalekqirdfvke vltesskpgfnsfglgdckapkvlgdivesiagaifldsgkdttaawkvfqpllqpmvtpetlpmhpvre lqercqqqaegleykasrsgntatvevfidgvqvgaaqnpqkkmaqklaarnalaalkekeaeeskkkqa ngnaagenqddnengnkkngnqtftrqtlndiclrknwpmpsyrcvkeggpahakrftfgvrvntsdrgw tdecigepmpsvkkakdsaailllellnktys----" $seq[[3]] [1] "----------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- -----------------------------------------------mqqppqmfpmapsmpptnitteq iqkyleenkklimaimenqnlgklaecaqyqallqknlmylaaiadaqpppstagatppp---------- ---------------------------------amasqmgaphpg------------------------- ---------------------------------------------mqppsyfmqhp------qasgmaqq appagifp----------------------prgplqfgsphqlqdp------------------------ -----------------------------------qqqhmhqqamqghmgmrpmginnnngmqhqmqqqp etslggsaanvgirggkqdg-------------------------adgqgkddgk--------------- ------------------------------------" $seq[[4]] [1] "----------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- ---------------------------------------------------------------------- --------------------------------------------------------mlslnmrteienlw vfalaskfnifmqehfaslllaiaitwctltivfwstpggpawg-------------------------- --------------------------kyfftrrfsslghnrksknlipgprgfplvgsmslrsshvahqr iasvaemsnakrlmafslgdtkvvvtchpdvakeilnssvfadrpvdetayg---lmfnramgfapngty wrtlrrlssnhlfnpkqikrseeqrrviatqmvnafarnaksafavrdllktaslcnmmglvfg------ -----------------------------------------------------reyelesnnnveseclk glveegydllg-----------------------tlnwtdhlpwlagldfqqirfrcsqlvpkvnlllsr iihehyatgnfldvllslqrseklsdsdivavlwemifrgtdtvavliewvlarialhpk---------- ----------------------------------vqstvhdeldr------------------------- ---------------------------------------------vvgrsrtvdesdlpsltyltamike vlr--lhp-----------------------pgpllswarlsitdt------------------------ ------------------------------------tvdgyhvpagttamvnmwaiardphvwedplefk perfvakdgeaefsvfgsdlr---------------lapfgsgkrvcpgknlglttvsfwvatllhefew lpsveanppdlsevlrlscemacplivnvsprrksv" $com [1] NA attr(,"class") [1] "alignment"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.