Introduction

This package implements algorithms for pairwise sequence alignment: the Needleman-Wunsch algorithm, which produces a global alignment, and the Smith-Waterman algorithm, which produces a local alignment. It also reviews existing algorithms for multiple sequence alignment, such as ClustalW, Clustal Omega, and MUSCLE, and compares their performance in terms of running time and accuracy. The package also includes a function to generate DNA/RNA and protein sequences.
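
The sequence generator is not shown in this document. As a rough illustration only, a random-sequence helper might look like the following; the name simulateSequence, its arguments, and the uniform sampling of residues are assumptions made for this sketch, not the package's actual function.

```r
# Hypothetical helper (not the package's own function): draw a random
# sequence of length n from the DNA, RNA, or protein alphabet.
simulateSequence <- function(n, type = c("DNA", "RNA", "protein")) {
  type <- match.arg(type)
  alphabet <- switch(type,
    DNA     = c("A", "C", "G", "T"),
    RNA     = c("A", "C", "G", "U"),
    protein = strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # 20 standard amino acids
  )
  # Sample residues uniformly with replacement and join them into one string
  paste(sample(alphabet, n, replace = TRUE), collapse = "")
}

set.seed(1)  # fixed seed so the example is reproducible
dnaSeq     <- simulateSequence(12, "DNA")
rnaSeq     <- simulateSequence(12, "RNA")
proteinSeq <- simulateSequence(10, "protein")
```

Uniform sampling keeps the sketch short; realistic simulations would usually draw residues from observed background frequencies instead.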

Method

Needleman-Wunsch Algorithm
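
As a minimal sketch of the idea, the global alignment can be computed by filling a score matrix cell by cell and then tracing back from the bottom-right corner. The version here is deliberately simplified and is not the package's NWalgorithm: it assumes a linear gap penalty and plain match/mismatch scores instead of a substitution matrix with separate gap opening and extension costs, and the name nwSketch and its parameters are hypothetical.

```r
# Simplified Needleman-Wunsch sketch: linear gap penalty, match/mismatch scores.
# nwSketch and its arguments are illustrative, not the package's API.
nwSketch <- function(seq1, seq2, match = 1, mismatch = -1, gap = 2) {
  a <- strsplit(seq1, "")[[1]]
  b <- strsplit(seq2, "")[[1]]
  n <- length(a); m <- length(b)
  # fMat[i + 1, j + 1] holds the best score for aligning a[1..i] with b[1..j]
  fMat <- matrix(0, n + 1, m + 1)
  fMat[, 1] <- -gap * (0:n)   # prefix of seq1 aligned against gaps only
  fMat[1, ] <- -gap * (0:m)   # prefix of seq2 aligned against gaps only
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (a[i] == b[j]) match else mismatch
      fMat[i + 1, j + 1] <- max(fMat[i, j] + s,        # align a[i] with b[j]
                                fMat[i, j + 1] - gap,  # a[i] against a gap
                                fMat[i + 1, j] - gap)  # b[j] against a gap
    }
  }
  # Trace back from the bottom-right corner to recover one optimal alignment
  i <- n; j <- m
  top <- character(0); bottom <- character(0)
  while (i > 0 || j > 0) {
    s <- if (i > 0 && j > 0 && a[i] == b[j]) match else mismatch
    if (i > 0 && j > 0 && fMat[i + 1, j + 1] == fMat[i, j] + s) {
      top <- c(a[i], top); bottom <- c(b[j], bottom); i <- i - 1; j <- j - 1
    } else if (i > 0 && fMat[i + 1, j + 1] == fMat[i, j + 1] - gap) {
      top <- c(a[i], top); bottom <- c("-", bottom); i <- i - 1
    } else {
      top <- c("-", top); bottom <- c(b[j], bottom); j <- j - 1
    }
  }
  list(score = fMat[n + 1, m + 1], fMatrix = fMat,
       aligned = c(paste(top, collapse = ""), paste(bottom, collapse = "")))
}

nwResult <- nwSketch("ACGT", "AGT")
nwResult$aligned   # "ACGT" "A-GT"
nwResult$score     # 1 (three matches minus one gap of cost 2)
```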

Smith-Waterman Algorithm
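
The local variant differs from the global one in two ways: scores are floored at zero, so an alignment may start anywhere, and the traceback begins at the highest-scoring cell and stops at the first zero. The sketch below illustrates this under simplifying assumptions (linear gap penalty, match/mismatch scores); swSketch and its parameters are hypothetical and do not reproduce the package's SWalgorithm.

```r
# Simplified Smith-Waterman sketch: linear gap penalty, match/mismatch scores.
# swSketch and its arguments are illustrative, not the package's API.
swSketch <- function(seq1, seq2, match = 2, mismatch = -1, gap = 2) {
  a <- strsplit(seq1, "")[[1]]
  b <- strsplit(seq2, "")[[1]]
  n <- length(a); m <- length(b)
  # First row and column stay at zero: a local alignment may start anywhere
  fMat <- matrix(0, n + 1, m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (a[i] == b[j]) match else mismatch
      fMat[i + 1, j + 1] <- max(0,                     # restart the alignment
                                fMat[i, j] + s,        # align a[i] with b[j]
                                fMat[i, j + 1] - gap,  # a[i] against a gap
                                fMat[i + 1, j] - gap)  # b[j] against a gap
    }
  }
  # Start at the first highest-scoring cell and walk back until a zero cell
  end <- which(fMat == max(fMat), arr.ind = TRUE)[1, ]
  i <- end[[1]] - 1; j <- end[[2]] - 1
  top <- character(0); bottom <- character(0)
  while (i > 0 && j > 0 && fMat[i + 1, j + 1] > 0) {
    s <- if (a[i] == b[j]) match else mismatch
    if (fMat[i + 1, j + 1] == fMat[i, j] + s) {
      top <- c(a[i], top); bottom <- c(b[j], bottom); i <- i - 1; j <- j - 1
    } else if (fMat[i + 1, j + 1] == fMat[i, j + 1] - gap) {
      top <- c(a[i], top); bottom <- c("-", bottom); i <- i - 1
    } else {
      top <- c("-", top); bottom <- c(b[j], bottom); j <- j - 1
    }
  }
  list(score = max(fMat), fMatrix = fMat,
       aligned = c(paste(top, collapse = ""), paste(bottom, collapse = "")))
}

swResult <- swSketch("TTTACGTTT", "GGACGGG")
swResult$aligned   # "ACG" "ACG"
swResult$score     # 6 (three matches of score 2)
```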

Clustal W program

The "W" in the name stands for "weighting": the program assigns different weights to sequences and to parameters at different positions in the alignment.
The basic multiple alignment method consists of three main stages:

Highlights:

MUSCLE program

Two distinguishing features:

Clustal Omega

Highlights:

Implementation

Implement Pairwise Sequence Alignment

This package includes a wrapper that serves as a common user interface to both the Needleman-Wunsch and Smith-Waterman algorithms. The user can choose to run either algorithm or both. A parameter input window pops up, in which the input data can be given as a FASTA file, as GenBank identifiers, or as two sequences. The user can also choose the substitution matrix and the gap penalties.
The example here only shows how to call the two functions for the Needleman-Wunsch and Smith-Waterman algorithms directly.

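# Set the working directory to the package's R folder (adjust the path to your own checkout)
setwd("C:/Users/ygu/Documents/GitHub/seqAlign/R")
seq1 <- "HEAGAWGHEE"
seq2 <- "HAWHEAE"
source("NWalgorithm.R")
source("SWalgorithm.R")
# Load the substitution matrices from Biostrings
library("Biostrings")
data("PAM120")
data("BLOSUM50")
# Global alignment (Needleman-Wunsch) scored with PAM120
subMatrix <- PAM120
NW.align <- NWalgorithm(seq1, seq2, subMatrix, gapOpening = 8, gapExtension = 1)
# Local alignment (Smith-Waterman) scored with BLOSUM50
subMatrix <- BLOSUM50
SW.align <- SWalgorithm(seq1, seq2, subMatrix, gapOpening = 10, gapExtension = 0.2)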

The two input sequences are

seq1
seq2

The aligned paths are

NW.align$path
SW.align$path

The F matrices used to construct the optimal paths are

NW.align$fMatrix
SW.align$fMatrix

The function for multiple sequence alignment uses the 'msa' package. Its input parameters are the sequences and the method to use. The user can also define the working directory and the output file name.

setwd("C:/Users/ygu/Documents/GitHub/seqAlign/R")
source("msaAlign.R")
library('msa')
seq <- readAAStringSet('C:/Users/ygu/Documents/GitHub/seqAlign/data/simSeq.txt')
clustalw.align <- msaAlign(seq,method="ClustalW")

clustalo.align <- msaAlign(seq,method="ClustalOmega")

muscle.align <- msaAlign(seq,method="Muscle")

The generated sequence data are

seq

The alignment results are

print(clustalw.align,show="complete")
print(clustalo.align,show="complete")
print(muscle.align,show="complete")
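
To compare the methods on running time, each call can be timed on the same input. The helper below is only a sketch: timeMethods is a hypothetical name, and the two stand-in functions replace the real alignment calls so that the example runs without the 'msa' package. In practice each list entry could wrap a call such as msaAlign(seq, method = "ClustalW").

```r
# Hypothetical helper: time each function in a named list on the same input
# and return the elapsed wall-clock seconds per method.
timeMethods <- function(methods, input) {
  vapply(methods, function(f) {
    system.time(f(input))[["elapsed"]]
  }, numeric(1))
}

# Stand-in functions (assumptions for this sketch, not real aligners)
demoMethods <- list(
  fast = function(x) nchar(x),
  slow = function(x) { Sys.sleep(0.05); nchar(x) }
)
timings <- timeMethods(demoMethods, c("HEAGAWGHEE", "HAWHEAE"))
timings   # named numeric vector, one elapsed time per method
```

A single run is noisy, so repeating each call several times and averaging gives a fairer comparison.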


ygu427/seqAlign documentation built on May 4, 2019, 2:33 p.m.