| entropy | R Documentation |
Calculate the sequence entropy score for every position in an alignment.
entropy(alignment)
alignment |
sequence alignment returned from
|
Shannon's information theoretic entropy (Shannon, 1948) is an often-used measure of residue diversity and hence residue conservation.
Returns a list with five components:
H |
standard entropy score for a 22-letter alphabet. |
H.10 |
entropy score for a 10-letter alphabet (see below). |
H.norm |
normalized entropy score (for 22-letter alphabet), so that conserved (low entropy) columns (or positions) score 1, and diverse (high entropy) columns score 0. |
H.10.norm |
normalized entropy score (for 10-letter alphabet), so that conserved (low entropy) columns score 1 and diverse (high entropy) columns score 0. |
freq |
residue frequency matrix containing percent occurrence values for each residue type. |
In addition to the standard entropy score (based on a 22-letter
alphabet of the 20 standard amino-acids, plus a gap character ‘-’
and a mask character ‘X’), an entropy score, H.10, based on
a 10-letter alphabet is also returned.
For H.10, residues from the 22-letter alphabet are classified
into one of 10 types, loosely following the convention of Mirny and
Shakhnovich (1999):
Hydrophobic/Aliphatic [V,I,L,M],
Aromatic [F,W,Y],
Ser/Thr [S,T],
Polar [N,Q],
Positive [H,K,R],
Negative [D,E],
Tiny [A,G],
Proline [P],
Cysteine [C], and
Gaps [-,X].
The residue code ‘X’ is useful for handling non-standard aminoacids.
Barry Grant
Grant, B.J. et al. (2006) Bioinformatics 22, 2695–2696.
Shannon (1948) The System Technical J. 27, 379–422.
Mirny and Shakhnovich (1999) J. Mol. Biol. 291, 177–196.
consensus, read.fasta
# Read HIV protease alignment
aln <- read.fasta(system.file("examples/hivp_xray.fa",package="bio3d"))
# Entropy and consensus
h <- entropy(aln)
con <- consensus(aln)
names(h$H)=con$seq
print(h$H)
# Entropy for sub-alignment (positions 1 to 20)
h.sub <- entropy(aln$ali[,1:20])
# Plot entropy and residue frequencies (excluding positions >=60 percent gaps)
H <- h$H.norm
H[ apply(h$freq[21:22,],2,sum)>=0.6 ] = 0
col <- mono.colors(32)
aa <- rev(rownames(h$freq))
oldpar <- par(no.readonly=TRUE)
layout(matrix(c(1,2),2,1,byrow = TRUE), widths = 7,
heights = c(2, 8), respect = FALSE)
# Plot 1: entropy
par(mar = c(0, 4, 2, 2))
barplot(H, border="white", ylab = "Entropy",
space=0, xlim=c(3.7, 97.3),yaxt="n" )
axis(side=2, at=c(0.2,0.4, 0.6, 0.8))
axis(side=3, at=(seq(0,length(con$seq),by=5)-0.5),
labels=seq(0,length(con$seq),by=5))
box()
# Plot2: residue frequencies
par(mar = c(5, 4, 0, 2))
image(x=1:ncol(con$freq),
y=1:nrow(con$freq),
z=as.matrix(rev(as.data.frame(t(con$freq)))),
col=col, yaxt="n", xaxt="n",
xlab="Alignment Position", ylab="Residue Type")
axis(side=1, at=seq(0,length(con$seq),by=5))
axis(side=2, at=c(1:22), labels=aa)
axis(side=3, at=c(1:length(con$seq)), labels =con$seq)
axis(side=4, at=c(1:22), labels=aa)
grid(length(con$seq), length(aa))
box()
for(i in 1:length(con$seq)) {
text(i, which(aa==con$seq[i]),con$seq[i],col="white")
}
abline(h=c(3.5, 4.5, 5.5, 3.5, 7.5, 9.5,
12.5, 14.5, 16.5, 19.5), col="gray")
par(oldpar)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.