Generating Various Numerical Representation Schemes for Protein Sequences


The protr package is a comprehensive toolkit for generating numerical representation schemes of protein sequences. These descriptors are widely used in bioinformatics and chemogenomics research. Commonly used descriptors include amino acid composition, autocorrelation, CTD (composition, transition, distribution), conjoint triad, quasi-sequence order, pseudo amino acid composition, and profile-based descriptors derived from the Position-Specific Scoring Matrix (PSSM). Descriptors for proteochemometric (PCM) modeling include scales-based descriptors derived by principal components analysis, factor analysis, multidimensional scaling, amino acid properties (AAindex), 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.), and BLOSUM/PAM matrix-derived descriptors. The protr package also provides parallelized similarity computation based on pairwise protein sequence alignment and Gene Ontology (GO) semantic similarity measures.
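To make "numerical representation" concrete, here is a minimal base-R sketch of the simplest descriptor listed above, amino acid composition: the fraction of each of the 20 standard amino acids in a sequence. The helper name `aac_sketch` and the toy sequence are hypothetical and used only for illustration; this is not the package's own implementation.

```r
# Hypothetical helper (not part of protr): amino acid composition in base R.
aac_sketch <- function(seq) {
  # The 20 standard amino acids, one-letter codes
  aa <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  # Split the sequence into single residues
  residues <- strsplit(toupper(seq), "")[[1]]
  if (length(residues) == 0 || !all(residues %in% aa)) {
    stop("sequence must be non-empty and use only the 20 standard amino acids")
  }
  # Count each residue type (zero counts included) and convert to fractions
  counts <- table(factor(residues, levels = aa))
  setNames(as.numeric(counts) / length(residues), aa)
}

x <- "MKTAYIAKQR"     # made-up toy sequence
aac <- aac_sketch(x)  # named numeric vector of length 20
print(round(aac, 2))
```

With protr installed, the corresponding package function is `extractAAC()`, and `protcheck()` can be used beforehand to confirm that a sequence contains only the 20 standard amino acid types.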


Package: protr
Type: Package
License: BSD_3_clause


The package vignette can be opened with vignette('protr').

The web server for this package, ProtrWeb, is located at:

Bug reports and feature requests should be sent to


Xiao, N., Cao, D.-S., Zhu, M.-F., and Xu, Q.-S. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857–1859.
