# Imputation of data sets containing peptide intensities with a multiple imputation strategy.

### Description

This function imputes data sets containing peptide intensities using a multiple imputation strategy.

### Usage


### Arguments

`tab` |
A data matrix containing only numeric and missing values. Each column of this matrix is assumed to correspond to an experimental sample, and each row to an identified peptide. |

`conditions` |
A vector of factors indicating the biological condition to which each column (experimental sample) belongs. |

`repbio` |
A vector of factors indicating the biological replicate to which each column belongs. Default is NULL (no experimental design is considered). |

`reptech` |
A vector of factors indicating the technical replicate to which each column belongs. Default is NULL (no experimental design is considered). |

`nb.iter` |
The number of iterations used for the multiple imputation method (see |

`methodi` |
The method used for imputing data. If |

`nknn` |
The number of nearest neighbours used in the SLSA algorithm (see |

`selec` |
A parameter to select a part of the dataset to find nearest neighbours between rows. This can be useful for big data sets (see |

`siz` |
A parameter to select a part of the dataset to perform imputations with the SLSA algorithm or the MLE algorithm. This can be useful for big data sets (see |

`weight` |
The way of weighting in the algorithm (see |

`ind.comp` |
If |

`progress.bar` |
If |

`x.min` |
The lower bound of the interval used for estimating the cumulative distribution functions of the mixing model in each column (see |

`x.max` |
The upper bound of the interval used for estimating the cumulative distribution functions of the mixing model in each column (see |

`x.step.mod` |
The number of points in the intervals used for estimating the cumulative distribution functions of the mixing model in each column (see |

`x.step.pi` |
The number of points in the intervals used for estimating the proportion of MCAR values in each column (see |

`nb.rei` |
The number of initializations of the minimization algorithm used to estimate the proportion of MCAR values (see Details) (see |

`method` |
A numeric value indicating the method to use for estimating the proportion of MCAR values (see |

`gridsize` |
A numeric value indicating the number of possible choices used for estimating the proportion of MCAR values with the method of Patra and Sen (2015) (see |

`q` |
A quantile value (see |

`q.min` |
A quantile value of the observed values used to define the maximal value that can be generated. Default is 0 (the maximal value is the minimum of observed values minus |

`q.norm` |
A quantile value of a normal distribution used to define the minimal value that can be generated. Default is 3 (the minimal value is the maximal value minus qn*median(sd(observed values)), where sd is the standard deviation of a row in a condition) (see |

`eps` |
A value used to define the maximal value that can be generated. Default is 2 (see |
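The relation given for `q.norm` (minimal value = maximal value minus qn*median(sd(observed values))) can be illustrated with a little arithmetic. The following is a minimal base R sketch under stated assumptions, not the package's code: `lower_bound()` is a hypothetical helper, `qn` is read as the value of `q.norm`, and the maximal value is simply passed in as a number (`upper`).

```r
# Hypothetical helper (not part of the package): lower bound of the generated
# values, computed as "maximal value minus q.norm * median(sd(observed values))".
lower_bound <- function(tab, conditions, upper, q.norm = 3) {
  conditions <- factor(conditions)
  # Standard deviation of the observed values of each row within each condition.
  sds <- sapply(levels(conditions), function(lv) {
    apply(tab[, conditions == lv, drop = FALSE], 1, sd, na.rm = TRUE)
  })
  # Rows with fewer than two observed values in a condition give NA and are ignored.
  upper - q.norm * median(sds, na.rm = TRUE)
}

# Toy matrix: 2 peptides (rows), 2 conditions with 2 samples each.
toy <- matrix(c(20, 22, 24, NA,
                21, 23, 25, 27),
              nrow = 2, byrow = TRUE)
cond <- factor(c("A", "A", "B", "B"))
lower_bound(toy, cond, upper = 19)
```

In this toy matrix every usable row/condition pair has a standard deviation of sqrt(2), so the lower bound is 19 - 3*sqrt(2). A larger `q.norm` widens the interval from which values can be generated.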

### Details

First, a mixture model of MCAR and MNAR values is estimated in each column of `tab`. This model is used to estimate probabilities that each missing value is MCAR. Then, these probabilities are used to perform a multiple imputation strategy (see `mi.mix`). Rows with no value in a condition are imputed using the `impute.pa` function.
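The general idea of using MCAR probabilities in a multiple imputation can be sketched in base R. This is a toy illustration only: `toy_multiple_imputation()` and its arguments are hypothetical, the two imputation rules (row mean for MCAR, smallest observed value for MNAR) are simple stand-ins rather than the algorithms used by `mi.mix` or `impute.pa`, and averaging the imputed data sets is an assumption about how the iterations could be combined.

```r
# Toy sketch only: all names are hypothetical and the imputation rules are
# stand-ins, not the package's algorithms.
toy_multiple_imputation <- function(tab, p.mcar, nb.iter = 3, seed = 1) {
  set.seed(seed)
  miss <- is.na(tab)
  row.means <- rowMeans(tab, na.rm = TRUE)  # stand-in imputation for MCAR values
  low.value <- min(tab, na.rm = TRUE)       # stand-in imputation for MNAR values
  draws <- array(NA_real_, dim = c(dim(tab), nb.iter))
  for (k in seq_len(nb.iter)) {
    imp <- tab
    # Draw the type of each missing value from its estimated MCAR probability.
    is.mcar <- matrix(runif(length(tab)) < p.mcar, nrow = nrow(tab))
    imp[miss & is.mcar] <- row.means[row(tab)[miss & is.mcar]]
    imp[miss & !is.mcar] <- low.value
    draws[, , k] <- imp
  }
  # Combine the nb.iter imputed data sets by averaging them cell by cell.
  apply(draws, c(1, 2), mean)
}

# Toy data: 2 peptides, 3 samples, each missing value assumed MCAR with probability 0.5.
toy <- matrix(c(20, NA, 24,
                21, 23, NA),
              nrow = 2, byrow = TRUE)
p <- matrix(0.5, nrow = 2, ncol = 3)
res <- toy_multiple_imputation(toy, p)
```

Observed values are left unchanged, and each missing value ends up between the MNAR stand-in and the MCAR stand-in, weighted by how often it was drawn as MCAR across the iterations.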

### Value

The input matrix `tab` with imputed values instead of missing values.

### Author(s)

Quentin Giai Gianetto <quentin2g@yahoo.fr>

### Examples

```
library(imp4p)  # package providing sim.data() and impute.mi()

# Simulating data
res.sim <- sim.data(nb.pept = 2000, nb.miss = 600, pi.mcar = 0.2, para = 10, nb.cond = 2,
                    nb.repbio = 3, nb.sample = 5, m.c = 25, sd.c = 2, sd.rb = 0.5, sd.r = 0.2)

# Imputation of the data set, given the conditions to which the samples belong
result <- impute.mi(tab = res.sim$dat.obs, conditions = res.sim$conditions)

# Imputation of the data set, given the conditions to which the samples belong
# and also their biological replicates
result <- impute.mi(tab = res.sim$dat.obs, conditions = res.sim$conditions,
                    repbio = res.sim$repbio)

# For large data sets, the imputation can be accelerated with the selec and siz
# parameters (see impute.slsa and mi.mix), but this may result in a less accurate
# imputation. Note that selec has to be greater than siz.
# Here, nb.iter is fixed to 3.
result1 <- impute.mi(tab = res.sim$dat.obs, conditions = res.sim$conditions,
                     progress.bar = TRUE, selec = 400, siz = 300, nb.iter = 3)
```