twoSeqSim: Protein Sequence Alignment for Two Protein Sequences


View source: R/par-01-parSeqSim.R

Description

This function performs a pairwise sequence alignment between two protein sequences.

Usage

twoSeqSim(seq1, seq2, type = "local", submat = "BLOSUM62",
  gap.opening = 10, gap.extension = 4)

Arguments

seq1

A character string, containing one protein sequence.

seq2

A character string, containing another protein sequence.

type

Type of alignment, either 'global' (Needleman-Wunsch global alignment) or 'local' (Smith-Waterman local alignment). Defaults to 'local'.

submat

Substitution matrix, one of 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120', or 'PAM250'. Defaults to 'BLOSUM62'.

gap.opening

The cost required to open a gap of any length in the alignment. Defaults to 10.

gap.extension

The cost to extend the length of an existing gap by 1. Defaults to 4.
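
As a rough illustration of how the two gap arguments combine, the sketch below computes the total cost of a single gap under an affine gap model. It assumes the convention used by Biostrings-style alignment, in which the opening cost is charged once and the extension cost is charged for every gapped position; the helper name gap_cost is hypothetical and is not part of protr.

```r
# Hypothetical helper, not part of protr: total cost of one gap of length `len`
# under an assumed affine model (opening cost once + extension cost per position).
gap_cost <- function(len, gap.opening = 10, gap.extension = 4) {
  stopifnot(all(len >= 1))
  gap.opening + len * gap.extension
}

# With the defaults, gaps of length 1, 2 and 3 cost 14, 18 and 22.
gap_cost(1:3)
```

Under this assumption, raising gap.opening discourages starting new gaps, while raising gap.extension discourages long ones.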

Value

A Biostrings object containing the alignment scores and other alignment information.

Author(s)

Nan Xiao <https://nanx.me>

See Also

See parSeqSim for parallelized pairwise protein similarity calculation based on sequence alignment. See twoGOSim for calculating the GO semantic similarity between two groups of GO terms or two Entrez gene IDs.

Examples

## Not run: 
# Be careful when testing this since it involves sequence alignment
# and might produce unpredictable results in some environments
library("protr")       # provides readFASTA() and twoSeqSim()
library("Biostrings")
# Read the two example protein sequences shipped with protr
s1 <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
s2 <- readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
# Align with the defaults: local alignment, BLOSUM62
seqalign <- twoSeqSim(s1, s2)
summary(seqalign)
print(seqalign@score)  # alignment score
## End(Not run)

Example output

Local Single Subject Pairwise Alignments
Number of Alignments:  1

Scores:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    277     277     277     277     277     277 

Number of matches:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     93      93      93      93      93      93 

Top 10 Mismatch Counts:
   SubjectPosition Subject Pattern Count Probability
1               33       F       Q     1           1
2               34       R       Y     1           1
3               35       Q       S     1           1
4               36       N       Q     1           1
5               41       V       F     1           1
6               44       V       K     1           1
7               47       K       L     1           1
8               48       A       F     1           1
9               50       Q       D     1           1
10              51       H       I     1           1
[1] 277

protr documentation built on Nov. 22, 2018, 9:04 a.m.