df2bits: Calculate the information content expressed in bits of...
In Ni-Ar/niar: Nice Interactive Analysis with R

df2bits

R Documentation

Calculate the information content expressed in bits of sequences stored in a data.frame

Description

This function calculates the information content, expressed in bits, using the Shannon entropy. See the details for the full explanation and formulas. LaTeX syntax for subscripts and fractions is currently not supported here; to display the formulas properly, one could copy-paste the details section into Overleaf.

Usage

df2bits(
  data,
  ID_col,
  alphabet,
  small_n_correction = FALSE,
  long_format = FALSE,
  ignore_case = FALSE
)

Arguments

`data`	A data.frame with at least 2 columns: one named `Sequence`, and one with any name you prefer, which is specified with `ID_col`.
`ID_col`	The name of the column in `data` to be used as the identifier of the `Sequence` column.
`alphabet`	A character vector containing the alphabet letters present in `Sequence`. Guessed by default.
`small_n_correction`	Apply the small-sample correction to the Shannon entropy. See details. Default `FALSE`.
`long_format`	Logical. If `TRUE` reshape the bits into a tidy long data.frame format. Default `FALSE`.
`ignore_case`	Logical. If `TRUE` the length of the alphabet is calculated ignoring the case of its letters, meaning that the maximum bits height will be calculated on the case-insensitive length of the alphabet. See the note for more explanation. Default `FALSE`.

Details

Given an alphabet of W letters, where l denotes any letter of the alphabet, the DNA alphabet can be represented as l \in \{A, C, G, T\} with W = 4. For a multiple sequence alignment of N sequences of length I, the information content, expressed in bits, of letter l at position i is denoted bits_{l,i} and defined as

bits_{l,i} = R(l,i) \times ( \log_2(W) - (H_i + \epsilon) )

where H_i, the Shannon entropy representing the uncertainty at position i, is defined as

H_i = -\sum_{l = 1}^{W} p_{l,i} \times \log_2 p_{l,i}

where p_{l,i} is the relative frequency (a.k.a. probability) of letter l at position i; \epsilon is the small-sample correction for an alignment of N sequences, defined as

\epsilon = \frac{1}{\log_e 2} \times \frac{W-1}{2N}

and R(l,i) is the position probability matrix of the sequences, containing p_{l,i} for the N sequences.
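
As an illustration of the formulas above, here is a minimal base R sketch. It is not the code used by `df2bits`; the helper name `bits_sketch` is hypothetical, the sequences are assumed to be aligned (all of the same length), and \epsilon is assumed to be 0 when the correction is switched off.

```r
# Hypothetical helper, not part of niar: information content in bits
# for every letter l (rows) at every alignment position i (columns).
bits_sketch <- function(seqs, alphabet, small_n_correction = FALSE) {
  W <- length(alphabet)  # alphabet size
  N <- length(seqs)      # number of aligned sequences
  # One row per sequence, one column per position (assumes equal lengths).
  chars <- do.call(rbind, strsplit(seqs, ""))
  # Position probability matrix R(l, i): relative frequency of each letter.
  ppm <- matrix(0, nrow = W, ncol = ncol(chars), dimnames = list(alphabet, NULL))
  for (l in alphabet) ppm[l, ] <- colMeans(chars == l)
  # Shannon entropy H_i per position, treating 0 * log2(0) as 0.
  H <- -colSums(ifelse(ppm > 0, ppm * log2(ppm), 0))
  # Small-sample correction epsilon (assumed to be 0 when disabled).
  eps <- if (small_n_correction) (1 / log(2)) * (W - 1) / (2 * N) else 0
  # bits_{l,i} = R(l,i) * (log2(W) - (H_i + eps)), applied column by column.
  sweep(ppm, 2, log2(W) - (H + eps), `*`)
}

bits <- bits_sketch(c("acgt", "acga", "aggt"), c("a", "c", "g", "t"))
```

In this toy alignment the first position is fully conserved, so its entropy is 0 and the letter `a` takes the whole log2(4) height at that position.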

Value

A data.frame, or a tidy long-format data.frame if `long_format = TRUE`.

Note

When the DNA sequences contain both upper and lower case letters, with an alphabet that has both 'ATGC' and 'atgc', one can force the maximum information content to log2(4) instead of log2(8) by setting ignore_case = TRUE.
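
The arithmetic behind this note can be checked with a few lines of base R. This is only a sketch of the idea, not the package's internal code; the variable names are made up for illustration.

```r
# Mixed-case DNA alphabet, as in the note.
alphabet <- c("A", "T", "G", "C", "a", "t", "g", "c")

# Case-sensitive: 8 distinct letters, so the maximum height is log2(8) = 3 bits.
max_bits_case_sensitive <- log2(length(alphabet))

# Case-insensitive: fold the case first, leaving 4 distinct letters,
# so the maximum height is log2(4) = 2 bits.
max_bits_ignore_case <- log2(length(unique(toupper(alphabet))))
```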

Examples

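# Toy input (illustrative): one ID column plus the required `Sequence` column.
data <- data.frame(Species = c('human', 'mouse', 'fly'),
                   Sequence = c('acgt', 'acga', 'aggt'))
df2bits(data, ID_col = 'Species',
        alphabet = c('a', 'c', 'g', 't'),
        small_n_correction = FALSE,
        long_format = TRUE)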

Ni-Ar/niar documentation built on Feb. 3, 2025, 9:25 a.m.
