impute_mix: Imputation using a decision rule under an assumption of a...
In imp4p: Imputation for Proteomics


This function imputes data sets with an MCAR-devoted algorithm and an MNAR-devoted algorithm, using the probabilities that missing values are MCAR. If such a probability is greater than a chosen threshold, the MCAR-devoted algorithm is used; otherwise, the MNAR-devoted algorithm is used. For details, see Giai Gianetto, Q. et al. (2020) (doi: 10.1101/2020.05.29.122770).

impute.mix(tab, prob.MCAR, threshold, conditions, repbio = NULL, reptech = NULL,
           methodMCAR = "mle", nknn = 15, weight = 1, selec = "all", ind.comp = 1,
           progress.bar = TRUE, q = 0.95, ncp.max = 5, maxiter = 10, ntree = 100,
           variablewise = FALSE, decreasing = FALSE, verbose = FALSE,
           mtry = floor(sqrt(ncol(tab))), replace = TRUE, classwt = NULL,
           cutoff = NULL, strata = NULL, sampsize = NULL, nodesize = NULL,
           maxnodes = NULL, xtrue = NA, parallelize = c('no', 'variables', 'forests'),
           methodMNAR = "igcda", q.min = 0.025, q.norm = 3, eps = 0,
           distribution = "unif", param1 = 3, param2 = 1, R.q.min = 1)

`tab`	A data matrix containing numeric and missing values. Each column of this matrix is assumed to correspond to an experimental sample, and each row to an identified peptide.
`prob.MCAR`	A matrix of probabilities that each missing value is MCAR. For instance, such a matrix can be obtained from the function `prob.mcar.tab` of this package.
`threshold`	A value such that if the probability that a missing value is MCAR is greater than it, an MCAR-devoted algorithm is used; otherwise, an MNAR-devoted algorithm is used.
`conditions`	A vector of factors indicating the biological condition to which each column (experimental sample) belongs.
`repbio`	A vector of factors indicating the biological replicate to which each column belongs. Default is NULL (no experimental design is considered).
`reptech`	A vector of factors indicating the technical replicate to which each column belongs. Default is NULL (no experimental design is considered).
`methodMCAR`	The method used for imputing MCAR data. If `methodMCAR="mle"` (default), then the `impute.mle` function is used (imputation using an EM algorithm). If `methodMCAR="pca"`, then the `impute.PCA` function is used (imputation using Principal Component Analysis). If `methodMCAR="rf"`, then the `impute.RF` function is used (imputation using Random Forest). Else, the `impute.slsa` function is used (imputation using Least Squares on nearest neighbours).
`methodMNAR`	The method used for imputing MNAR data. If `methodMNAR="igcda"` (default), then the `impute.igcda` function is used. Else, the `impute.pa` function is used.
`nknn`	The number of nearest neighbours used in the SLSA algorithm (see `impute.slsa`).
`weight`	The way of weighting in the algorithm (see `impute.slsa`).
`selec`	A parameter to select a part of the dataset to find nearest neighbours between rows. This can be useful for big data sets (see `impute.slsa`).
`ind.comp`	If `ind.comp=1`, only nearest neighbours without missing values are selected to fit linear models (see `impute.slsa`). Else, they can contain missing values.
`progress.bar`	If `TRUE`, a progress bar is displayed.
`q`	A quantile value (see `impute.igcda`).
`ncp.max`	parameter of the `impute.PCA` function.
`maxiter`	parameter of the `impute.RF` function.
`ntree`	parameter of the `impute.RF` function.
`variablewise`	parameter of the `impute.RF` function.
`decreasing`	parameter of the `impute.RF` function.
`verbose`	parameter of the `impute.RF` function.
`mtry`	parameter of the `impute.RF` function.
`replace`	parameter of the `impute.RF` function.
`classwt`	parameter of the `impute.RF` function.
`cutoff`	parameter of the `impute.RF` function.
`strata`	parameter of the `impute.RF` function.
`sampsize`	parameter of the `impute.RF` function.
`nodesize`	parameter of the `impute.RF` function.
`maxnodes`	parameter of the `impute.RF` function.
`xtrue`	parameter of the `impute.RF` function.
`parallelize`	parameter of the `impute.RF` function.
`q.min`	parameter of the `impute.pa` function.
`q.norm`	parameter of the `impute.pa` function.
`eps`	parameter of the `impute.pa` function.
`distribution`	parameter of the `impute.pa` function.
`param1`	parameter of the `impute.pa` function.
`param2`	parameter of the `impute.pa` function.
`R.q.min`	parameter of the `impute.pa` function.
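
The way `methodMCAR` and `methodMNAR` select an imputation function, as described in the arguments above, can be sketched in base R. This is only an illustration of the documented rule, not the package's internal code: the helper names `pick.mcar.function` and `pick.mnar.function` are hypothetical, and they return function names as strings so that the sketch runs without imp4p installed.

```r
# Hypothetical helpers mirroring the documented selection rule.
# They return the *name* of the imp4p function that would be used.
pick.mcar.function <- function(methodMCAR = "mle") {
  switch(methodMCAR,
         mle = "impute.mle",   # EM algorithm
         pca = "impute.PCA",   # Principal Component Analysis
         rf  = "impute.RF",    # Random Forest
         "impute.slsa")        # any other value: least squares on nearest neighbours
}

pick.mnar.function <- function(methodMNAR = "igcda") {
  if (methodMNAR == "igcda") "impute.igcda" else "impute.pa"
}

pick.mcar.function("rf")     # "impute.RF"
pick.mcar.function("slsa")   # "impute.slsa"
pick.mcar.function("MLE")    # "impute.slsa" (no match, so the fallback applies)
pick.mnar.function("pa")     # "impute.pa"
```

One consequence of the documented "else" branches is that a misspelled method name would not raise an error under this rule; it would select the fallback method instead.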

The missing values for which prob.MCAR is greater than the chosen threshold are imputed with one of the MCAR-devoted imputation methods (impute.mle, impute.RF, impute.PCA or impute.slsa). The other missing values are considered MNAR and imputed with the MNAR-devoted method selected by methodMNAR (impute.igcda by default, otherwise impute.pa). More details and explanations can be found in Giai Gianetto et al. (2020).
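
The decision rule itself can be illustrated on a toy matrix with base R only. This is a minimal sketch under stated assumptions, not the implementation used by `impute.mix`: it assumes `prob.MCAR` has the same dimensions as `tab`, and it replaces the real imputation methods with two deliberately crude stand-ins (the row mean for values treated as MCAR, the smallest observed value for values treated as MNAR).

```r
# Toy data: 4 peptides (rows) x 3 samples (columns), with three missing values.
tab <- matrix(c(20.1, 21.3,   NA,
                18.4,   NA, 18.9,
                  NA, 15.2, 15.0,
                22.7, 22.5, 22.9),
              nrow = 4, byrow = TRUE)

# Assumption: prob.MCAR has the same shape as tab; only entries at NA positions matter.
prob.MCAR <- matrix(0, nrow = 4, ncol = 3)
prob.MCAR[1, 3] <- 0.9
prob.MCAR[2, 2] <- 0.6
prob.MCAR[3, 1] <- 0.1
threshold <- 0.5

# Decision rule: strictly above the threshold -> MCAR method, otherwise MNAR method.
is.missing <- is.na(tab)
use.mcar <- is.missing & prob.MCAR > threshold
use.mnar <- is.missing & !use.mcar

# Stand-in imputers (NOT the imp4p algorithms).
row.means <- rowMeans(tab, na.rm = TRUE)   # crude "MCAR" value: mean of the row
low.value <- min(tab, na.rm = TRUE)        # crude "MNAR" value: lowest observed intensity

tab.imp <- tab
tab.imp[use.mcar] <- row.means[row(tab)[use.mcar]]
tab.imp[use.mnar] <- low.value

tab.imp
```

Here the missing values at positions (1, 3) and (2, 2) are handled by the MCAR stand-in, and the one at (3, 1) by the MNAR stand-in. A probability exactly equal to the threshold would go to the MNAR branch, since the rule is a strict comparison.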

The input matrix tab with imputed values instead of missing values.
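
A quick sanity check on a returned matrix might look like the sketch below. The helper `check.imputed` is hypothetical (it is not part of imp4p), and it assumes, following the description of the returned value, that the output has the same dimensions as `tab`, contains no remaining missing values, and leaves the observed values unchanged.

```r
# Hypothetical helper: TRUE if tab.imp looks like a completed version of tab.
check.imputed <- function(tab, tab.imp) {
  observed <- !is.na(tab)
  identical(dim(tab), dim(tab.imp)) &&                        # same shape
    !anyNA(tab.imp) &&                                        # nothing left missing
    isTRUE(all.equal(tab[observed], tab.imp[observed]))       # observed values untouched
}

# Toy illustration, with the missing cell filled by hand.
tab <- matrix(c(1, NA, 3, 4), nrow = 2)
tab.imp <- tab
tab.imp[is.na(tab)] <- 2

check.imputed(tab, tab.imp)   # TRUE
check.imputed(tab, tab)       # FALSE: a missing value remains
```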

Quentin Giai Gianetto <quentin2g@yahoo.fr>

Giai Gianetto, Q., Wieczorek, S., Couté, Y., Burger, T. (2020). A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data. bioRxiv 2020.05.29.122770; doi: 10.1101/2020.05.29.122770

#Simulating data
res.sim=sim.data(nb.pept=2000,nb.miss=600);

#Fast imputation of missing values with the impute.rand algorithm
dat.rand=impute.rand(tab=res.sim$dat.obs,conditions=res.sim$condition);

#Estimation of the mixture model
res=estim.mix(tab=res.sim$dat.obs, tab.imp=dat.rand, conditions=res.sim$condition);

#Computing probabilities to be MCAR
born=estim.bound(tab=res.sim$dat.obs,conditions=res.sim$condition);
proba=prob.mcar.tab(born$tab.upper,res);

#Imputation under the assumption of MCAR and MNAR values
tabi=impute.mix(tab=res.sim$dat.obs, prob.MCAR=proba, threshold=0.5,
                conditions=res.sim$conditions, repbio=res.sim$repbio,
                methodMCAR="slsa", methodMNAR="igcda", nknn=15, weight=1,
                selec="all", ind.comp=1, progress.bar=TRUE);