# mip: Mutual Information product (MIP) function in Bios2cor: From Biological Sequences and Simulations to Correlation Analysis

## Description

Calculates a corrected mutual information score (MIP) by subtracting the average product from the probability of joint occurrence of events.

## Usage

    mip(align, gap_ratio = 0.2)

## Arguments

| Argument | Description |
|---|---|
| `align` | An object of class 'align' created by the `import.msf` or the `import.fasta` function from a sequence alignment. |
| `gap_ratio` | Numeric value between 0 and 1 indicating the maximal gap ratio at a given position in the MSA for this position to be taken into account. Default is 0.2: positions with more than 20 percent of gaps are not taken into account in the analysis. When `gap_ratio` is 1 or close to 1, only positions with at least one amino acid are taken into account (positions with only gaps are excluded). |
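The effect of `gap_ratio` can be illustrated with a minimal base R sketch. It assumes the alignment is held as a plain character matrix (rows are sequences, columns are positions), which is a stand-in for illustration and not the internal structure of the 'align' class. The helper `keep_positions` is hypothetical and not part of Bios2cor.

```r
# Toy alignment: rows are sequences, columns are positions, "-" is a gap.
# This matrix layout is an assumption made for illustration only.
msa <- rbind(
  c("A", "C", "-", "-", "G"),
  c("A", "D", "-", "-", "G"),
  c("T", "C", "E", "-", "-"),
  c("T", "D", "-", "-", "G"),
  c("A", "C", "E", "-", "-")
)

# Fraction of gap characters in each column: 0, 0, 0.6, 1, 0.4
gap_fraction <- colMeans(msa == "-")

# Hypothetical helper: indices of the positions whose gap fraction
# does not exceed gap_ratio.
keep_positions <- function(gap_fraction, gap_ratio = 0.2) {
  which(gap_fraction <= gap_ratio)
}

kept_default <- keep_positions(gap_fraction)        # default threshold of 0.2
kept_lenient <- keep_positions(gap_fraction, 0.99)  # drops only the all-gap column
```

With the default threshold only the two gap-free columns are retained, while a threshold close to 1 excludes only the column made entirely of gaps.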

## Details

The MIp score at position [i,j] is computed with the following formula:

$$MIp(i,j) = MI(i,j) - \frac{MI(i,\bar{j})\, MI(\bar{i},j)}{\langle MI \rangle}$$

with:

• $MI(i,j) = \sum_{x,y} p_{x,y}(i,j) \ln \frac{p_{x,y}(i,j)}{p_{x}(i)\, p_{y}(j)}$

• $MI(i,\bar{j}) = \frac{1}{n-1} \sum_{j \neq i} MI(i,j)$

• $MI(\bar{i},j) = \frac{1}{n-1} \sum_{i \neq j} MI(i,j)$

• $\langle MI \rangle = \frac{2}{n(n-1)} \sum_{i<j} MI(i,j)$

where $p_{x,y}(i,j)$ is the frequency of the amino acid pair (x,y) at positions i and j, $p_{x}(i)$ and $p_{y}(j)$ are the frequencies of amino acids x and y at positions i and j, and n is the number of positions.
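The formula can be traced on a small example with the base R sketch below. It is not the Bios2cor implementation: the helpers `mutual_information` and `mip_sketch` are hypothetical, the alignment is assumed to be a plain character matrix (rows are sequences, columns are positions), gaps are not treated specially, and the overall mean is taken over the n(n-1)/2 distinct pairs of positions.

```r
# Hypothetical helper: mutual information (natural log) between two columns,
# each given as a character vector with one residue per sequence.
mutual_information <- function(col_i, col_j) {
  p_xy <- table(col_i, col_j) / length(col_i)  # joint frequencies p_{x,y}(i,j)
  p_x <- rowSums(p_xy)                         # marginal frequencies p_x(i)
  p_y <- colSums(p_xy)                         # marginal frequencies p_y(j)
  expected <- outer(p_x, p_y)                  # products p_x(i) * p_y(j)
  nz <- p_xy > 0                               # empty cells contribute 0
  sum(p_xy[nz] * log(p_xy[nz] / expected[nz]))
}

# Hypothetical helper: MIp matrix for a character matrix with at least
# two columns and a non-zero overall mean MI.
mip_sketch <- function(msa) {
  n <- ncol(msa)
  mi <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      mi[i, j] <- mi[j, i] <- mutual_information(msa[, i], msa[, j])
    }
  }
  # Mean MI of each position with the n - 1 others (the diagonal holds 0).
  # MI is symmetric, so the same vector serves for both averaged terms.
  mi_mean_pos <- rowSums(mi) / (n - 1)
  # Overall mean over the distinct pairs of positions
  mi_mean_all <- 2 * sum(mi[upper.tri(mi)]) / (n * (n - 1))
  mip <- mi - outer(mi_mean_pos, mi_mean_pos) / mi_mean_all
  diag(mip) <- NA  # a position is not paired with itself
  mip
}

# Columns 1 and 2 covary perfectly; column 3 is independent of both.
toy <- cbind(
  c("A", "A", "T", "T"),
  c("C", "C", "G", "G"),
  c("K", "R", "K", "R")
)
mi_12 <- mutual_information(toy[, 1], toy[, 2])
mip_toy <- mip_sketch(toy)
```

In this toy case MI(1,2) equals ln 2, the two averaged terms for positions 1 and 2 are both (ln 2)/2, and the overall mean is (ln 2)/3, so MIp(1,2) = ln 2 - 3(ln 2)/4 = (ln 2)/4. The pairs involving column 3 keep a score of 0.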

N.B. this formula has been widely applied in the field of sequence correlation/covariation but favors pairs with high entropy.

## Value

A list of two elements, each a numeric matrix: one contains the MIP scores and the other the Z-scores for each pair of elements.
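This page does not state how the Z-scores are derived from the scores. One common convention, assumed here purely for illustration, standardizes each score by the mean and standard deviation taken over all distinct pairs. The helper `zscore_sketch` below is hypothetical and may differ from what `mip` does internally.

```r
# Hypothetical helper: standardize the scores of a symmetric matrix using
# the mean and standard deviation over the distinct pairs (upper triangle).
zscore_sketch <- function(scores) {
  pairs <- scores[upper.tri(scores)]  # one value per unordered pair
  (scores - mean(pairs)) / sd(pairs)
}

# Toy symmetric score matrix with an undefined diagonal
scores <- matrix(c(NA, 1,  2,
                   1,  NA, 3,
                   2,  3,  NA), nrow = 3, byrow = TRUE)
z <- zscore_sketch(scores)
```

Here the three pair scores 1, 2 and 3 have mean 2 and standard deviation 1, so they map to Z-scores of -1, 0 and 1.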

## Author(s)

## Examples

    library(Bios2cor)

    # Importing MSA file
    align <- import.fasta(system.file("msa/toy2_align.fa", package = "Bios2cor"))

    # Creating correlation object with MIP method for positions with gap ratio < 0.2 (Default)
    mip <- mip(align)