Matrices of dissimilarity scores between amino acid sequences

Description

Computes a matrix of distances, based on dissimilarity scores, between the sequences of two multiple sequence alignments.

Usage

mat.dis(align1, align2, sub.mat.id = "PAM250", sqrt=FALSE)

Arguments

align1

a list of character vectors representing a first multiple sequence alignment.

align2

a list of character vectors representing a second multiple sequence alignment.

sub.mat.id

a string of characters indicating the amino acid substitution matrix used for calculation of the dissimilarity score. This should be one of "PAM40", "PAM80", "PAM120", "PAM160", "PAM250", "BLOSUM30", "BLOSUM45", "BLOSUM62", "BLOSUM80", "GONNET", "JTT", "JTT_TM" and "PHAT". The supported substitution matrices are in sub.mat. Default is PAM250.

sqrt

a logical value indicating whether the distance should be equal to the square root of the dissimilarity score (TRUE) or to the score itself (FALSE). Default is FALSE.

Details

The dissimilarity score between a sequence i from align1 and a sequence j from align2 is calculated with an amino acid substitution matrix from sub.mat.

If align1 and align2 are identical, mat.dis computes the symmetric matrix of distances between each pair of sequences in the alignment.
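The shape of the result can be illustrated with a minimal base R sketch. It does not call mat.dis and does not reproduce its scoring: the helper names toy_dis and toy_mat_dis, the toy sequences, and the stand-in score (the fraction of mismatching columns) are all assumptions made for illustration. It only shows how a pairwise matrix has one row per sequence of the first alignment and one column per sequence of the second, and why the matrix is symmetric when both alignments are the same.

```r
# Toy alignments: lists of character vectors, one residue (or gap) per element.
# All names and sequences here are made up for illustration.
align1 <- list(seqA = c("M", "K", "T", "A"),
               seqB = c("M", "R", "T", "A"),
               seqC = c("M", "K", "S", "G"))
align2 <- list(seqX = c("M", "K", "T", "G"),
               seqY = c("L", "R", "T", "A"))

# Stand-in score: fraction of mismatching columns. This is NOT the
# substitution-matrix-based score that mat.dis computes.
toy_dis <- function(s1, s2) mean(s1 != s2)

# Hypothetical helper: one row per sequence of a1, one column per sequence of a2.
toy_mat_dis <- function(a1, a2) {
  m <- outer(seq_along(a1), seq_along(a2),
             Vectorize(function(i, j) toy_dis(a1[[i]], a2[[j]])))
  dimnames(m) <- list(names(a1), names(a2))
  m
}

m12 <- toy_mat_dis(align1, align2)  # rows from align1, columns from align2
m11 <- toy_mat_dis(align1, align1)  # same alignment on both sides

dim(m12)          # number of sequences in align1 and align2
isSymmetric(m11)  # symmetric because both inputs are the same alignment
```

With real data, the same two checks (dim and isSymmetric) can be applied directly to the matrix returned by mat.dis.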

Before using mat.dis, users must check that the sequences are correctly aligned, both within align1 and align2 and between align1 and align2.
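One basic sanity check that can be automated is that every sequence in both alignments spans the same number of columns. The helper below, same_alignment_length, is a hypothetical name and is not part of bios2mds. Equal length is a necessary condition only: it does not prove that the columns are homologous, so it complements a manual inspection of the alignments rather than replacing it.

```r
# Hypothetical helper (not part of bios2mds): TRUE if every sequence in both
# alignments has the same number of columns (residues plus gaps).
same_alignment_length <- function(a1, a2) {
  lens <- c(lengths(a1), lengths(a2))
  length(unique(lens)) == 1L
}

# Made-up toy alignments, stored as lists of character vectors.
ok  <- list(s1 = c("M", "K", "-"), s2 = c("M", "R", "T"))
bad <- list(s3 = c("M", "K"))

res_ok  <- same_alignment_length(ok, ok)   # all sequences have 3 columns
res_bad <- same_alignment_length(ok, bad)  # s3 has only 2 columns
```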

Value

A named numeric matrix providing the dissimilarity-based distances between each pair of sequences from align1 and align2, based on the substitution matrix sub.mat.id. The number of rows and columns is identical to the number of sequences in align1 and align2, respectively.

Author(s)

Julien Pele and Jean-Michel Becu

Examples

# calculating dissimilarity distances between a sample of GPCR sequences from
# H. sapiens and D. melanogaster, based on the PAM250 matrix:
aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds"))
mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5])
mat.dis1

# calculating dissimilarity distances between a sample of GPCR sequences from
# H. sapiens and D. melanogaster, based on the BLOSUM45 matrix:
aln_human <- import.fasta(system.file("msa/human_gpcr.fa", package = "bios2mds"))
aln_drome <- import.fasta(system.file("msa/drome_gpcr.fa", package = "bios2mds"))
mat.dis1 <- mat.dis(aln_human[1:5], aln_drome[1:5], sub.mat.id = "BLOSUM45")
mat.dis1