df2bits | R Documentation |
This function calculates the information content, expressed in bits, using the Shannon entropy. See the details for the full explanation and formulas. However, there is currently no support for LaTeX syntax for subscript text and fractions. To display them properly, one can copy-paste the details section into Overleaf.
df2bits(
data,
ID_col,
alphabet,
small_n_correction = FALSE,
long_format = FALSE,
ignore_case = FALSE
)
data |
A data.frame with a minimum of 2 columns. One named |
ID_col |
The name of the column in |
alphabet |
A character vector containing the alphabet letters present in |
small_n_correction |
Apply a small correction to the Shannon Entropy. See details. Default |
long_format |
Logical. If |
ignore_case |
Logical. If |
Given an alphabet of W letters, every letter l belongs to that alphabet;
for example, for the DNA alphabet l belongs to {A, C, G, T} and W = 4.
With a multiple sequence alignment of N sequences of length I, we denote
the information content, expressed in bits, of letter l at position i
as bits_{l,i} and define it with the following formula:
bits_{l,i} = R(l,i) \times ( \log_2(W) - (H_i + \epsilon) )
where H_i, the Shannon entropy representing the uncertainty of
position i, is defined as:
H_i = -\sum_{l = 1}^{W} p_{l,i} \times \log_2 p_{l,i}
where p_{l,i} is the relative frequency (a.k.a. probability) of letter l
at position i; \epsilon is the approximation for small-sample corrections,
i.e. a correction for an alignment of N sequences, defined as:
\epsilon = \frac{1}{\log_e 2} \times \frac{W-1}{2N}
and R(l,i) is the position probability matrix containing the p_{l,i}
for the N sequences.
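The formulas above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: info_bits() is a hypothetical helper, it takes a plain character vector of equal-length sequences rather than the data.frame that df2bits() expects, and it treats 0 * log2(0) as 0.

```r
# Hypothetical helper (not part of the package): information content in bits
# for every letter (rows) and alignment position (columns).
info_bits <- function(seqs, alphabet, small_n_correction = FALSE) {
  N <- length(seqs)       # number of sequences in the alignment
  W <- length(alphabet)   # alphabet size
  # One row per sequence, one column per position.
  chars <- do.call(rbind, strsplit(seqs, ""))
  # Position probability matrix: relative frequency of each letter per position.
  ppm <- sapply(seq_len(ncol(chars)), function(i) {
    as.numeric(table(factor(chars[, i], levels = alphabet))) / N
  })
  rownames(ppm) <- alphabet
  # Shannon entropy H_i per position, with 0 * log2(0) taken as 0.
  H <- apply(ppm, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))
  # Small-sample correction epsilon (0 when the correction is switched off).
  eps <- if (small_n_correction) (1 / log(2)) * (W - 1) / (2 * N) else 0
  # Multiply each column of the PPM by log2(W) - (H_i + epsilon).
  sweep(ppm, 2, log2(W) - (H + eps), `*`)
}

seqs <- c("ACGT", "ACGA", "ACCA", "AGCA")
bits <- info_bits(seqs, alphabet = c("A", "C", "G", "T"))
bits["A", 1]  # fully conserved position: 1 * (log2(4) - 0) = 2
bits["G", 3]  # two letters at 0.5 each: 0.5 * (2 - 1) = 0.5
```

With the correction enabled and few sequences, log2(W) - (H_i + epsilon) can become negative at variable positions; the sketch leaves such values untouched.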
A data.frame or a tidy long format data.frame
When the DNA sequences mix upper and lower case, and the alphabet
therefore contains both 'ATGC' and 'atgc', one can force the maximum
information content to log2(4) instead of log2(8)
by setting ignore_case = TRUE.
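The two maxima mentioned in this note follow directly from the alphabet size. The snippet below only illustrates that arithmetic; folding case with toupper() is an assumption made for the example and need not match how df2bits() handles ignore_case internally.

```r
# Alphabet containing both upper- and lower-case DNA letters.
alphabet <- c("A", "C", "G", "T", "a", "c", "g", "t")

# Case-sensitive: 8 distinct letters, so the maximum is log2(8) = 3 bits.
max_case_sensitive <- log2(length(alphabet))

# Case folded (assumed equivalent of ignore_case = TRUE): 4 distinct letters,
# so the maximum is log2(4) = 2 bits.
max_case_ignored <- log2(length(unique(toupper(alphabet))))
```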
df2bits(data, ID_col = 'Species',
        alphabet = c('a', 'c', 'g', 't'),
        small_n_correction = FALSE,
        long_format = TRUE)