# imputeDataMFA: Impute Missing Rows and Estimate MFA Axes In GonzalezIgnacio/missRows: Handling Missing Individuals in Multi-Omics Data Integration

## Description

Imputes the missing rows of the data tables using an alternating least squares algorithm adapted from PCA. This function is called internally by MIMFA and is not usually called directly by the user.

## Usage

```r
imputeDataMFA(datasets, U, missRows, comp, maxIter=500, tol=1e-10)
```

## Arguments

- datasets: a list containing the data tables with missing rows. Each table should be arranged as samples x variables, with the same sample order in all data tables.
- U: the compromise configuration, a matrix with the individuals' coordinates as returned by the STATIS function.
- missRows: a list of character vectors giving the names of the missing individuals (rows) in each table.
- comp: the number of components kept for imputation.
- maxIter: integer, the maximum number of iterations of the iterative algorithm.
- tol: positive value, the threshold used to assess convergence.
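The input layout can be pictured with a small NumPy sketch. This is Python, not the package's R interface. It assumes that every table carries one row per individual in a shared order and that the individuals listed in a missRows-like dictionary are to be treated as missing. Under those assumptions, the tables stack column-wise into the merged matrix K, with NaN blocks for the missing rows. The helper merge_tables, the dictionary layout and the table names are all hypothetical.

```python
import numpy as np

def merge_tables(datasets, row_names, miss_rows):
    """Hypothetical helper: build the merged matrix K (individuals x all variables).

    datasets  : dict, table name -> 2-D array (samples x variables), rows aligned
    row_names : list of individual names in the common row order
    miss_rows : dict, table name -> list of individuals missing from that table
    The rows listed in miss_rows are set to NaN in that table's column block.
    """
    blocks = []
    for name, table in datasets.items():
        block = np.array(table, dtype=float)  # copy, so the input is not modified
        for ind in miss_rows.get(name, []):
            block[row_names.index(ind), :] = np.nan
        blocks.append(block)
    return np.hstack(blocks)  # column-wise concatenation of all tables

rows = ["s1", "s2", "s3", "s4"]
datasets = {
    "mRNA": np.arange(12.0).reshape(4, 3),
    "protein": np.arange(8.0).reshape(4, 2),
}
miss = {"mRNA": [], "protein": ["s3"]}
K = merge_tables(datasets, rows, miss)
print(K.shape)            # (4, 5)
print(np.isnan(K).sum())  # 2
```

Individual s3 is missing from the protein table, so its two protein columns are NaN in K, while its mRNA values are kept.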

## Details

Since the core of MFA is a PCA of the merged data tables K, the algorithm used to estimate the MFA axes and impute the missing values is inspired by the alternating least squares algorithm used in PCA. It consists of finding the matrices F and U that minimize the following criterion:

$$\|K - M - FU'\|^2 = \sum_{i}\sum_{k}\Big(K_{ik} - M_{ik} - \sum_{d=1}^{D} F_{id} U_{kd}\Big)^2,$$

where M is a matrix each of whose rows is the vector of variable (column) means, and D is the number of dimensions kept in the PCA. The solution is obtained by alternating two multiple regressions until convergence, one estimating the axes (loadings $\hat{U}$) and one estimating the components (scores $\hat{F}$):

$$\hat{U}' = (\hat{F}'\hat{F})^{-1}\hat{F}'(K - \hat{M}),$$

$$\hat{F} = (K - \hat{M})\hat{U}(\hat{U}'\hat{U})^{-1}.$$
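The two alternating regressions can be illustrated on a complete matrix (no missing values) with a short NumPy sketch. This is Python, not the package's R code. The function name als_pca, the random initial scores and the fixed iteration count are assumptions. At convergence, the fitted product $\hat{F}\hat{U}'$ should match the best rank-D approximation of $K - \hat{M}$ given by the truncated SVD, and the sketch uses this as a check.

```python
import numpy as np

def als_pca(K, D, n_iter=500, seed=0):
    """Hypothetical sketch: rank-D PCA of a complete matrix K by alternating least squares."""
    n = K.shape[0]
    M = np.tile(K.mean(axis=0), (n, 1))  # M-hat: each row is the vector of column means
    X = K - M                            # centred data K - M-hat
    F = np.random.default_rng(seed).normal(size=(n, D))  # random initial scores
    for _ in range(n_iter):
        # Regression for the axes: U' = (F'F)^{-1} F'(K - M)
        U = np.linalg.solve(F.T @ F, F.T @ X).T
        # Regression for the scores: F = (K - M) U (U'U)^{-1}
        F = np.linalg.solve(U.T @ U, U.T @ X.T).T
    return F, U, M

rng = np.random.default_rng(1)
# Rank-2 signal plus a little noise, so the first two dimensions dominate
K = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(20, 5))
F, U, M = als_pca(K, D=2)

# Compare F U' with the best rank-2 approximation of K - M from the SVD
Us, s, Vt = np.linalg.svd(K - M, full_matrices=False)
best = (Us[:, :2] * s[:2]) @ Vt[:2]
print(np.allclose(F @ U.T, best, atol=1e-6))
```

F and U are only determined up to an invertible D x D transformation, which is why the check compares the product $\hat{F}\hat{U}'$ rather than the factors themselves.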

The imputeDataMFA algorithm first fills the missing values in K with initial values (the column means of the observed entries) and then computes $\hat{M}$. In the second step, the scores $\hat{F} = (K - \hat{M})U(U'U)^{-1}$ are calculated on the completed data set using D components of U, and the missing values are estimated as $\hat{K} = \hat{M} + \hat{F}U'$. The new imputed data set K is obtained by replacing the missing values of the original K matrix with the corresponding elements of $\hat{K}$, while keeping the observed values unaltered. These two steps, estimating the parameters and imputing the missing values, are iterated until convergence. The number D of components used in the algorithm can be estimated by setting the estim.ncp argument to TRUE in the function MIMFA.
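The iteration described above can be sketched in NumPy as follows. This is Python, not the package's R code. The sketch takes U as given and uses its first D columns. It starts from column-mean imputation, and at each pass it recomputes $\hat{M}$, $\hat{F}$ and $\hat{K}$, overwriting only the missing entries. Several details are assumptions: the function name impute_fixed_U, the NaN encoding of missing entries, and the stopping rule, which halts once the squared change in the imputed entries falls below tol.

```python
import numpy as np

def impute_fixed_U(K, U, D, max_iter=500, tol=1e-10):
    """Hypothetical sketch: impute the NaN entries of K given axes U (p x >= D)."""
    miss = np.isnan(K)
    U = U[:, :D]                               # keep D components of U
    col_means = np.nanmean(K, axis=0)          # means over the observed entries
    Kc = np.where(miss, col_means, K)          # initial imputation
    for _ in range(max_iter):
        M = np.tile(Kc.mean(axis=0), (Kc.shape[0], 1))  # M-hat
        F = (Kc - M) @ U @ np.linalg.inv(U.T @ U)       # F-hat = (K - M)U(U'U)^{-1}
        K_hat = M + F @ U.T                             # K-hat = M-hat + F-hat U'
        K_new = np.where(miss, K_hat, K)       # observed values stay unaltered
        delta = np.sum((K_new[miss] - Kc[miss]) ** 2)   # assumed convergence measure
        Kc = K_new
        if delta < tol:
            break
    return Kc

# Exact rank-2 data around column means, with orthonormal axes U0
U0 = np.array([[1, 1, 1, 1, 1],
               [1, -1, 0, 1, -1]], dtype=float).T
U0 /= np.linalg.norm(U0, axis=0)
rng = np.random.default_rng(2)
F0 = rng.normal(size=(6, 2))
K_true = F0 @ U0.T + np.array([5.0, 1.0, -2.0, 0.5, 3.0])
K_obs = K_true.copy()
K_obs[2, 3:] = np.nan                          # individual 3 missing from the 2nd table
K_imp = impute_fixed_U(K_obs, U0, D=2, tol=1e-20)
print(np.allclose(K_imp, K_true))              # True
```

A tighter tol than the sketch's default is passed here so that the recovered row matches the true values to floating-point tolerance. Because the data are exactly rank 2 around their column means and U0 spans those axes, the true values are a fixed point of the iteration.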

## Value

A list whose components contain the imputed rows for each data table.

## Author(s)

Ignacio González

GonzalezIgnacio/missRows documentation built on Jan. 16, 2020, 4:11 a.m.