This package implements algorithms for pairwise sequence alignment: the Needleman-Wunsch algorithm for global alignment and the Smith-Waterman algorithm for local alignment. It also reviews existing algorithms for multiple sequence alignment, such as ClustalW, ClustalOmega, and Muscle, and compares their performance in terms of running time and accuracy. The package also includes a function to generate DNA/RNA and protein sequences.
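As a rough illustration of what sequence generation can look like, here is a minimal sketch in base R. The function name `simulateSequence` and its arguments are hypothetical and are not taken from the package; residues are drawn uniformly at random, which may differ from what the package actually does.

```r
# Hypothetical helper (not the package's own function): draw a random
# sequence of the requested length from a DNA, RNA, or protein alphabet.
simulateSequence <- function(len, type = c("DNA", "RNA", "protein")) {
  type <- match.arg(type)
  alphabet <- switch(type,
    DNA     = c("A", "C", "G", "T"),
    RNA     = c("A", "C", "G", "U"),
    protein = strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # 20 standard amino acids
  )
  # Assumption: every residue is equally likely and positions are independent.
  paste(sample(alphabet, len, replace = TRUE), collapse = "")
}

set.seed(1)  # make the random draw reproducible
dnaSeq <- simulateSequence(12, "DNA")
protSeq <- simulateSequence(8, "protein")
```

Uniform sampling keeps the sketch short; a realistic simulator would probably use empirical residue frequencies.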
The "W" in ClustalW stands for "weighting": the algorithm assigns different weights to sequences and to parameters at different positions in the alignment.
Basic multiple alignment consists of three main stages:
Highlights:
Two distinguishing features:
Highlights:
This package includes a wrapper that serves as a common user interface to both the Needleman-Wunsch and Smith-Waterman algorithms. The user can choose to run either algorithm or both. A parameter input window pops up, in which the input data can be a FASTA file, GenBank identifiers, or two sequences. The user can also choose the substitution matrix and the gap penalties.
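To make the gap penalties concrete, the sketch below computes the cost of a gap under an affine scheme with an opening and an extension penalty. The function name `affineGapCost` is hypothetical, and the convention used here (the first gap position costs the opening penalty, each further position costs the extension penalty) is an assumption; some tools instead charge the extension penalty for every position, including the first, so check which convention the package follows.

```r
# Hypothetical helper: cost of a gap of length k under an affine penalty.
# Assumed convention: opening penalty for the first position, extension
# penalty for each of the remaining k - 1 positions.
affineGapCost <- function(k, gapOpening, gapExtension) {
  stopifnot(all(k >= 1))
  gapOpening + (k - 1) * gapExtension
}

# Cost of gaps of length 1 to 4 with gapOpening = 8 and gapExtension = 1
gapCosts <- affineGapCost(1:4, gapOpening = 8, gapExtension = 1)
```

With a small extension penalty, one long gap is cheaper than several short ones, which is usually the behaviour wanted for biological sequences.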
The example here shows only how to use the two functions for the Needleman-Wunsch and Smith-Waterman algorithms.
setwd("C:/Users/ygu/Documents/GitHub/seqAlign/R")
seq1 <- "HEAGAWGHEE"
seq2 <- "HAWHEAE"
source("NWalgorithm.R")
source("SWalgorithm.R")
library("Biostrings")
data("PAM120")
data("BLOSUM50")
subMatrix <- PAM120
NW.align <- NWalgorithm(seq1, seq2, subMatrix, gapOpening = 8, gapExtension = 1)
subMatrix <- BLOSUM50
SW.align <- SWalgorithm(seq1, seq2, subMatrix, gapOpening = 10, gapExtension = 0.2)
The two input sequences are
seq1
seq2
The aligned paths are
NW.align$path
SW.align$path
The F matrices used to construct the optimal paths are
NW.align$fMatrix
SW.align$fMatrix
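For readers unfamiliar with F matrices, the following base-R sketch shows how such a dynamic-programming matrix can be filled. It is a simplified illustration and not the package's `NWalgorithm` or `SWalgorithm`: the function name `fillScoreMatrix` is hypothetical, scoring uses a plain match/mismatch value instead of a substitution matrix, and the gap penalty is linear rather than split into opening and extension.

```r
# Hypothetical sketch: fill the F matrix for global (Needleman-Wunsch) or
# local (Smith-Waterman) alignment. Assumptions: match/mismatch scoring and
# a linear gap penalty, unlike the affine penalties in the example above.
fillScoreMatrix <- function(a, b, match = 1, mismatch = -1, gap = 2,
                            local = FALSE) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  Fm <- matrix(0, nrow = n + 1, ncol = m + 1,
               dimnames = list(c("-", x), c("-", y)))
  if (!local) {
    # Global alignment: leading gaps are penalised along the borders.
    Fm[, 1] <- -gap * (0:n)
    Fm[1, ] <- -gap * (0:m)
  }
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      best <- max(Fm[i, j] + s,        # align x[i] with y[j]
                  Fm[i, j + 1] - gap,  # gap in b
                  Fm[i + 1, j] - gap)  # gap in a
      # Local alignment never lets a cell drop below zero.
      Fm[i + 1, j + 1] <- if (local) max(0, best) else best
    }
  }
  Fm
}

nwF <- fillScoreMatrix("HEAGAWGHEE", "HAWHEAE")                # global
swF <- fillScoreMatrix("HEAGAWGHEE", "HAWHEAE", local = TRUE)  # local
globalScore <- nwF[nrow(nwF), ncol(nwF)]  # bottom-right cell
localScore <- max(swF)                    # highest cell anywhere
```

The optimal path is then recovered by tracing back from the bottom-right cell for global alignment, or from the highest-scoring cell for local alignment, following the choice that produced each cell's value.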
The function for multiple sequence alignment uses the 'msa' package. Its input parameters are the sequences and the method to use. The user can also define the working directory and the output file name.
setwd("C:/Users/ygu/Documents/GitHub/seqAlign/R")
source("msaAlign.R")
library('msa')
seq <- readAAStringSet('C:/Users/ygu/Documents/GitHub/seqAlign/data/simSeq.txt')
clustalw.align <- msaAlign(seq, method = "ClustalW")
clustalo.align <- msaAlign(seq, method = "ClustalOmega")
muscle.align <- msaAlign(seq, method = "Muscle")
The generated sequence data are
seq
The alignment results are
print(clustalw.align, show = "complete")
print(clustalo.align, show = "complete")
print(muscle.align, show = "complete")
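Since the three methods are compared on running time, one possible way to time them side by side is sketched below. The helper `timeAligners` is hypothetical and not part of the package; it only assumes that each aligner can be called as a function of the input sequences. Placeholder functions stand in for the real aligners so that the sketch runs without 'msa' installed.

```r
# Hypothetical helper: run each aligner once on the same input and return
# the elapsed wall-clock time in seconds, named after the aligner.
timeAligners <- function(aligners, seqs) {
  vapply(aligners,
         function(f) system.time(f(seqs))[["elapsed"]],
         numeric(1))
}

# Placeholder "aligners" for demonstration only. With the package loaded,
# one could instead pass, for example,
#   list(ClustalW = function(s) msaAlign(s, method = "ClustalW"), ...)
demoAligners <- list(
  fast = function(s) nchar(s),
  slow = function(s) { Sys.sleep(0.05); nchar(s) }
)
timings <- timeAligners(demoAligners, c("HEAGAWGHEE", "HAWHEAE"))
```

A single run is a noisy measurement; repeating each call several times and comparing medians would give a fairer picture.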