Imputation of data sets containing peptide intensities with a multiple imputation strategy.

Description

This function imputes data sets containing peptide intensities using a multiple imputation strategy.

Usage

impute.mi(tab, conditions, repbio=NULL, reptech=NULL, nb.iter=3, nknn=15, selec=600, 
siz=500, weight=1, ind.comp=1, progress.bar=TRUE, x.min=20, x.max=30, x.step.mod=300, 
x.step.pi=300, nb.rei=100, method=4, gridsize=300, q=0.95, q.min=0, q.norm=3, 
eps=2, methodi="slsa");

Arguments

tab

A data matrix containing only numeric and missing values. Each column of this matrix is assumed to correspond to an experimental sample, and each row to an identified peptide.

conditions

A vector of factors indicating the biological condition to which each column (experimental sample) belongs.

repbio

A vector of factors indicating the biological replicate to which each column belongs. Default is NULL (no experimental design is considered).

reptech

A vector of factors indicating the technical replicate to which each column belongs. Default is NULL (no experimental design is considered).

nb.iter

The number of iterations used for the multiple imputation method (see mi.mix).

methodi

The method used for imputing data. If methodi="mle", the MLE algorithm is used (function impute.wrapper.MLE of the R package imputeLCMD); otherwise, the SLSA algorithm is used (default) (see mi.mix).

nknn

The number of nearest neighbours used in the SLSA algorithm (see impute.slsa).

selec

A parameter to select a part of the dataset to find nearest neighbours between rows. This can be useful for big data sets (see impute.slsa).

siz

A parameter to select a part of the dataset to perform imputations with the SLSA algorithm or the MLE algorithm. This can be useful for big data sets (see mi.mix).

weight

The way of weighting in the algorithm (see impute.slsa).

ind.comp

If ind.comp=1, only nearest neighbours without missing values are selected to fit linear models; otherwise, the selected neighbours can contain missing values (see impute.slsa).

progress.bar

If TRUE, a progress bar is displayed.

x.min

The lower bound of the interval used for estimating the cumulative distribution functions of the mixing model in each column (see estim.mix).

x.max

The upper bound of the interval used for estimating the cumulative distribution functions of the mixing model in each column (see estim.mix).

x.step.mod

The number of points in the intervals used for estimating the cumulative distribution functions of the mixing model in each column (see estim.mix).

x.step.pi

The number of points in the intervals used for estimating the proportion of MCAR values in each column (see estim.mix).

nb.rei

The number of initializations of the minimization algorithm used to estimate the proportion of MCAR values (see estim.mix).

method

A numeric value indicating the method to use for estimating the proportion of MCAR values (see estim.mix).

gridsize

A numeric value indicating the number of possible choices used for estimating the proportion of MCAR values with the method of Patra and Sen (2015) (see estim.mix).

q

A quantile value (see impute.igcda).

q.min

A quantile of the observed values used to define the maximal value that can be generated. Default is 0 (the maximal value is the minimum of the observed values minus eps) (see impute.pa).

q.norm

A quantile of a normal distribution used to define the minimal value that can be generated. Default is 3 (the minimal value is the maximal value minus q.norm*median(sd(observed values)), where sd is the standard deviation of a row in a condition) (see impute.pa).

eps

A value used to define the maximal value that can be generated. Default is 2 (see impute.pa).
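The layout expected for tab and conditions, and the bounds described for q.min, q.norm and eps, can be sketched in base R as follows. This is only an illustration of the argument descriptions above, not the code of impute.pa: the toy matrix, the helper bounds and the choice to compute the bounds separately for each condition are assumptions made for the example.

```r
# Hypothetical toy data: 5 peptides (rows) x 6 samples (columns).
tab <- matrix(c(24.1, 24.5, 23.9, 26.2, 26.0, 26.5,
                26.0,   NA, 26.4, 27.9, 28.3, 28.1,
                22.3, 22.0, 22.8, 24.0,   NA, 24.4,
                25.2, 25.9,   NA, 27.5, 27.1, 27.8,
                27.1, 27.0, 27.3, 29.0, 29.4, 29.1),
              nrow = 5, byrow = TRUE)

# Assumed design: columns 1-3 are condition 1, columns 4-6 are condition 2.
conditions <- factor(rep(c(1, 2), each = 3))
stopifnot(length(conditions) == ncol(tab))

q.min  <- 0
q.norm <- 3
eps    <- 2

# Illustrative helper (not part of the package): bounds for one condition.
bounds <- function(x) {
  # Maximal value: q.min-quantile of the observed values, minus eps.
  upper  <- unname(quantile(x, probs = q.min, na.rm = TRUE)) - eps
  # Standard deviation of each row within the condition, then the median.
  row.sd <- apply(x, 1, sd, na.rm = TRUE)
  # Minimal value: maximal value minus q.norm times the median row sd.
  lower  <- upper - q.norm * median(row.sd, na.rm = TRUE)
  c(lower = lower, upper = upper)
}

# One column of bounds per condition (assumption: bounds are per condition).
bounds.by.cond <- sapply(levels(conditions), function(k) {
  bounds(tab[, conditions == k, drop = FALSE])
})
bounds.by.cond
```

With q.min=0 the upper bound sits eps below the smallest observed intensity of the condition, so generated values stay below everything that was observed. How values are drawn between the two bounds is not described here (see impute.pa).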

Details

First, a mixture model of MCAR and MNAR values is estimated in each column of tab. This model is used to estimate probabilities that each missing value is MCAR. Then, these probabilities are used to perform a multiple imputation strategy (see mi.mix). Rows with no value in a condition are imputed using the impute.pa function.
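The general idea of using per-value MCAR probabilities in a multiple imputation can be illustrated with a deliberately simplified base R sketch. It is not the algorithm of mi.mix: the probabilities, the two vectors of candidate values, the random draw of an MCAR/MNAR label at each iteration and the averaging over iterations are all assumptions chosen to keep the example small.

```r
set.seed(42)

# Hypothetical inputs for three missing values (all names are illustrative).
p.mcar   <- c(0.9, 0.5, 0.1)     # assumed probabilities of being MCAR
val.mcar <- c(25.0, 24.0, 26.0)  # candidate values from an MCAR-style method
val.mnar <- c(19.0, 18.5, 20.0)  # candidate values from an MNAR-style method
nb.iter  <- 3

# One column of imputed values per iteration: each missing value is labelled
# MCAR or MNAR at random, according to its probability of being MCAR.
draws <- sapply(seq_len(nb.iter), function(i) {
  is.mcar <- rbinom(length(p.mcar), size = 1, prob = p.mcar) == 1
  ifelse(is.mcar, val.mcar, val.mnar)
})

# Combine the iterations by averaging (an assumption of this sketch).
imputed <- rowMeans(draws)
imputed
```

Values that are very likely MCAR end up close to the MCAR-style candidate, values that are very likely MNAR end up close to the MNAR-style candidate, and uncertain values fall in between.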

Value

The input matrix tab with imputed values instead of missing values.

Author(s)

Quentin Giai Gianetto <quentin2g@yahoo.fr>

Examples

library(imp4p)

# Simulating data
res.sim <- sim.data(nb.pept=2000, nb.miss=600, pi.mcar=0.2, para=10, nb.cond=2, nb.repbio=3,
                    nb.sample=5, m.c=25, sd.c=2, sd.rb=0.5, sd.r=0.2)

# Imputation of the data set, specifying the condition to which each sample belongs.
result <- impute.mi(tab=res.sim$dat.obs, conditions=res.sim$conditions)

# Imputation of the data set, specifying the condition to which each sample belongs
# and also its biological replicate.
result <- impute.mi(tab=res.sim$dat.obs, conditions=res.sim$conditions, repbio=res.sim$repbio)

# For large data sets, the imputation can be accelerated with the selec and siz
# parameters (see impute.slsa and mi.mix), but this may result in a less accurate
# imputation. Note that selec has to be greater than siz.
#
# Here, nb.iter is fixed to 3.
result1 <- impute.mi(tab=res.sim$dat.obs, conditions=res.sim$conditions, progress.bar=TRUE,
                     selec=400, siz=300, nb.iter=3)
