EDSSMat: EDSSMat Disorder-based Substitution Matrices.

EDSSMat    R Documentation

EDSSMat Disorder-based Substitution Matrices.

Description

The EDSSMat series of matrices was developed and described by Trivedi and Nagarajaram (2019).
In short: these are substitution scoring matrices for aligning proteins or protein regions that are intrinsically disordered. The alignment blocks used to compute the matrix values were composed of predicted intrinsically disordered regions. Compared with more frequently used substitution matrices (such as BLOSUM and PAM), EDSSMat produced significantly smaller E-values when aligning disordered regions. Additionally, EDSSMat62 was shown to identify both close and distant homologs of a specific IDP, while other matrices could identify only some close homologs. See the source article for additional information and for comparisons to other matrices.

Please cite the source article when using any EDSSMat matrix.

Usage

EDSSMat50

EDSSMat60

EDSSMat62

EDSSMat70

EDSSMat75

EDSSMat80

EDSSMat90

Format

All matrices are symmetric. 24 residues are represented:

  • Each of the 20 standard amino acids

  • Four additional symbols:

    • B: Asparagine or Aspartic Acid (Asx)

    • Z: Glutamine or Glutamic Acid (Glx)

    • X: Unspecified or unknown amino acid

    • *: Stop

An object of class matrix (inherits from array) with 24 rows and 24 columns.
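
The properties listed above can be checked directly in R. The following is a minimal sketch, assuming the idpr package is installed and that the matrices are available by name through data(); it also assumes the row and column names are the one-letter residue symbols, which this page does not state explicitly.

```r
# Assumption: idpr is installed and ships EDSSMat62 as a loadable dataset.
data("EDSSMat62", package = "idpr")

# The Format section describes a 24 x 24 symmetric matrix.
dim(EDSSMat62)
all(EDSSMat62 == t(EDSSMat62))

# Residue symbols, if the matrix carries dimnames (assumption).
rownames(EDSSMat62)

# Score for an alanine/serine substitution, looked up by symbol
# (only works if the dimnames are the one-letter codes).
EDSSMat62["A", "S"]
```

Any of the other matrices named in the Usage section can be substituted for EDSSMat62 in this sketch.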

Matrices

There are 7 reported EDSSMat matrices. They differ in the percent identity threshold used to cluster protein sequences: EDSSMat50 clustered proteins with 50% identity or higher, EDSSMat62 clustered proteins with 62% identity or higher, and so on.
See the Usage section for the available matrices.

Optimal Gap Parameters

These values were described in the source article and reported in its Supplemental Table S5. They are the recommended gap parameters for any alignment that uses the respective EDSSMat matrix. The values were determined for three categories of proteins: Less Disorder (LD), defined as [0-20%] disorder; Moderate Disorder (MD), defined as (20-40%] disorder; and High Disorder (HD), defined as (40-100%] disorder.
Please see the source article for additional information.

Matrix Name   Gap Open (LD)   Gap Extension (LD)   Gap Open (MD)   Gap Extension (MD)   Gap Open (HD)   Gap Extension (HD)
EDSSMat60     -7              -1                   -6              -2                   -14             -3
EDSSMat62     -8              -1                   -5              -2                   -19             -2
EDSSMat70     -7              -1                   -5              -2                   -19             -2
EDSSMat75     -8              -1                   -5              -2                   -19             -2
EDSSMat80     -7              -1                   -5              -2                   -15             -3
EDSSMat90     -7              -1                   -5              -2                   -19             -2
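
Choosing gap parameters from the table means first assigning a protein to LD, MD, or HD and then reading off the matching pair of values. The base R sketch below illustrates that lookup. The names edssmat_gaps, disorder_category, and lookup_gaps are hypothetical helpers written for this example and are not part of idpr; the numbers are copied from the table above.

```r
# Gap parameters copied from the table above (Supplemental Table S5 of the
# source article). One row per matrix; open/extension values per category.
edssmat_gaps <- data.frame(
  matrix  = c("EDSSMat60", "EDSSMat62", "EDSSMat70",
              "EDSSMat75", "EDSSMat80", "EDSSMat90"),
  open_LD = c(-7, -8, -7, -8, -7, -7),
  ext_LD  = c(-1, -1, -1, -1, -1, -1),
  open_MD = c(-6, -5, -5, -5, -5, -5),
  ext_MD  = c(-2, -2, -2, -2, -2, -2),
  open_HD = c(-14, -19, -19, -19, -15, -19),
  ext_HD  = c(-3, -2, -2, -2, -3, -2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a percent-disorder value to "LD", "MD", or "HD"
# using the intervals [0-20%], (20-40%], and (40-100%].
disorder_category <- function(percent_disorder) {
  stopifnot(is.numeric(percent_disorder),
            length(percent_disorder) == 1,
            percent_disorder >= 0, percent_disorder <= 100)
  as.character(cut(percent_disorder,
                   breaks = c(0, 20, 40, 100),
                   labels = c("LD", "MD", "HD"),
                   include.lowest = TRUE, right = TRUE))
}

# Hypothetical helper: return the tabulated gap parameters for one matrix
# and one percent-disorder value.
lookup_gaps <- function(matrix_name, percent_disorder) {
  row <- edssmat_gaps[edssmat_gaps$matrix == matrix_name, ]
  if (nrow(row) != 1) {
    stop("No gap parameters listed for matrix: ", matrix_name)
  }
  category <- disorder_category(percent_disorder)
  list(category      = category,
       gap_open      = row[[paste0("open_", category)]],
       gap_extension = row[[paste0("ext_", category)]])
}

# A protein with 55% predicted disorder falls in HD.
lookup_gaps("EDSSMat62", 55)
```

The returned values keep the sign convention of the table. Alignment functions differ in whether they expect gap costs as negative scores or positive penalties, so check the convention of the aligner before passing them on.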

Source

Trivedi, R., Nagarajaram, H.A. Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 9, 16380 (2019). https://doi.org/10.1038/s41598-019-52532-8

See Also

Disordered Matrices Vignette within the idpr package

Other IDP-based Substitution Matrices: DUNMat, DisorderMat


wmm27/idpr documentation built on Jan. 12, 2023, 8:45 a.m.