parSeqSim: Parallelized Protein Sequence Similarity Calculation based...

View source: R/par-01-parSeqSim.R

Description

This function implements a parallelized version of protein sequence similarity calculation based on sequence alignment.

Usage

parSeqSim(protlist, cores = 2, type = "local", submat = "BLOSUM62")

Arguments

protlist

A list of length n containing n protein sequences. Each component of the list is a character string storing one protein sequence. Unknown sequences should be represented as "".
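A minimal sketch of preparing such a list, assuming the sequences start out as a character vector in which unknown entries are NA. The vector raw_seqs and its toy strings are made up for illustration and are not real protein sequences:

```r
# Hypothetical input: toy strings, with NA marking an unknown sequence
raw_seqs <- c("MKTAYIAKQR", NA, "ACDEFGHIK")

# Represent unknown sequences as "", as the argument description asks
raw_seqs[is.na(raw_seqs)] <- ""

# Convert to a list with one character string per component
protlist <- as.list(raw_seqs)

str(protlist)
```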

cores

Integer. The number of CPU cores to use for parallel execution; the default is 2. Use the detectCores() function in the parallel package to see how many cores are available.
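One way to pick a value for cores is sketched below. Leaving one core free is only a common convention assumed here, not something this function requires, and the variable names are illustrative:

```r
library(parallel)

# Number of cores reported for this machine; can be NA on some platforms
n_avail <- detectCores()

# Leave one core free for the rest of the system, but never go below 1
n_use <- if (is.na(n_avail)) 1L else max(1L, n_avail - 1L)

n_use
```

The resulting n_use could then be passed as the cores argument.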

type

Type of alignment, either 'global' or 'local'; the default is 'local'. 'global' performs Needleman-Wunsch global alignment; 'local' performs Smith-Waterman local alignment.

submat

Substitution matrix; the default is 'BLOSUM62'. Can be one of 'BLOSUM45', 'BLOSUM50', 'BLOSUM62', 'BLOSUM80', 'BLOSUM100', 'PAM30', 'PAM40', 'PAM70', 'PAM120', or 'PAM250'.
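When type and submat come from user input or a configuration file, it can help to check them against the documented values before starting a long parallel run. The sketch below uses base R's match.arg(); check_align_args() is a hypothetical helper and not part of protr:

```r
# Allowed values, copied from the argument descriptions above
valid_types <- c("local", "global")
valid_submats <- c("BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM100",
                   "PAM30", "PAM40", "PAM70", "PAM120", "PAM250")

# Hypothetical helper (not part of protr): returns the validated arguments
check_align_args <- function(type = "local", submat = "BLOSUM62") {
  # match.arg() stops with an error if the value matches none of the choices
  list(type = match.arg(type, valid_types),
       submat = match.arg(submat, valid_submats))
}

args_ok <- check_align_args("global", "PAM250")
str(args_ok)
```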

Value

An n x n similarity matrix.
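To make the result easier to read, the rows and columns can be labeled with sequence identifiers. The sketch below assumes that rows and columns follow the order of protlist and that the returned matrix carries no dimnames; neither is stated on this page. A placeholder matrix stands in for the real output, so the values are not similarity scores:

```r
# Identifiers taken from the FASTA file names used in the Examples section
ids <- c("P00750", "P08218", "P10323", "P20160", "Q9NZP8")

# Placeholder with the same shape as the result for five sequences;
# the values are NOT real similarity scores
psimmat <- diag(length(ids))

# Label rows and columns (assumes the matrix follows the order of protlist)
dimnames(psimmat) <- list(ids, ids)

# Look up the entry for one pair of sequences by name
psimmat["P00750", "P08218"]
```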

Author(s)

Nan Xiao <https://nanx.me>

See Also

See twoSeqSim for sequence alignment between two protein sequences. See parGOSim for protein similarity calculation based on Gene Ontology (GO) semantic similarity.

Examples

## Not run: 
# Be careful when testing this since it involves parallelisation
# and might produce unpredictable results in some environments

library("protr")
library("Biostrings")
library("foreach")
library("doParallel")

s1 = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
s2 = readFASTA(system.file("protseq/P08218.fasta", package = "protr"))[[1]]
s3 = readFASTA(system.file("protseq/P10323.fasta", package = "protr"))[[1]]
s4 = readFASTA(system.file("protseq/P20160.fasta", package = "protr"))[[1]]
s5 = readFASTA(system.file("protseq/Q9NZP8.fasta", package = "protr"))[[1]]
plist = list(s1, s2, s3, s4, s5)
psimmat = parSeqSim(plist, cores = 2, type = "local", submat = "BLOSUM62")
print(psimmat)
## End(Not run)

protr documentation built on Sept. 29, 2017, 9:02 a.m.