impute_LRGC: Low rank Gaussian copula for incomplete mixed data

Description

Fit a low rank Gaussian copula model to mixed (continuous and ordinal) data and impute the missing entries using the fitted model.

Usage

impute_LRGC(
  X,
  rank,
  nlevels = 20,
  trunc_method = "Iterative",
  n_sample = 5000,
  n_update = 1,
  maxit = 50,
  eps = 0.01,
  verbose = FALSE,
  runiter = 0,
  ...
)
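A minimal call on simulated mixed data might look like the sketch below. It assumes the package is installed under the name mixedgcImp (taken from the page footer) and that impute_LRGC is exported from it; the toy data, the choice rank = 1 (the toy data share a single latent factor) and all variable names are illustrative assumptions. The fit only runs when the package is available.

```r
set.seed(1)
n <- 200
f <- rnorm(n)                                   # one shared latent factor (toy data)
X <- cbind(
  f + rnorm(n, sd = 0.5),                                  # continuous
  exp(f + rnorm(n, sd = 0.5)),                             # continuous, skewed
  as.integer(cut(f + rnorm(n), c(-Inf, -0.5, 0.5, Inf))),  # ordinal, 3 levels
  as.integer(cut(-f + rnorm(n), c(-Inf, 0, Inf)))          # ordinal, 2 levels
)
X[sample(length(X), 80)] <- NA                  # about 10% of entries set missing
X <- X[rowSums(!is.na(X)) > 0, , drop = FALSE]  # X must not contain empty rows

if (requireNamespace("mixedgcImp", quietly = TRUE)) {
  fit <- mixedgcImp::impute_LRGC(X, rank = 1)
  print(anyNA(fit$Ximp))             # TRUE would mean some entries were left unimputed
  print(dim(fit$W))                  # latent subspace matrix, p x rank
  print(all(diff(fit$loglik) >= 0))  # a monotonically increasing loglik suggests a good fit
}
```

The two continuous columns have far more than nlevels = 20 distinct values, so they should be treated as continuous; the two integer-coded columns should be treated as ordinal.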

Arguments

X

A matrix or data.frame with missing values (NA). Each observed entry of X should be either a numeric value or a numerically coded ordinal level. Make sure X contains no empty rows (rows with every entry missing) and no character-valued levels.

rank

The rank k, i.e. the number of latent factors.

nlevels

A column with more unique values than nlevels is classified as continuous; otherwise it is treated as ordinal.

trunc_method

Method for evaluating truncated normal moments: 'Iterative' or 'Sampling'.

n_sample

Number of Monte Carlo (MC) samples; only used when trunc_method is 'Sampling'.

n_update

The number of updates; only used when trunc_method is 'Iterative'.

maxit

Maximum number of iterations.

eps

Convergence threshold.

verbose

Whether to print progress information.

runiter

If set to a positive integer, the algorithm runs exactly that number of iterations.

...

Additional arguments for development use.
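The trunc_method argument concerns moments of a truncated normal distribution, which arise because an ordinal observation only tells us that the latent Gaussian value lies between two cutoffs. The sketch below, for a standard normal truncated to (a, b), compares the closed-form mean and variance with a Monte Carlo estimate of the kind the 'Sampling' option suggests. The helper names are hypothetical, and this is not the package's internal code; in particular, the 'Iterative' method is not reproduced here.

```r
# Hypothetical helper: exact mean and variance of N(0, 1) truncated to (a, b).
truncnorm_moments_exact <- function(a, b) {
  Z  <- pnorm(b) - pnorm(a)                      # probability mass of the interval
  m1 <- (dnorm(a) - dnorm(b)) / Z                # E[z | a < z < b]
  # a * dnorm(a) is NaN when a = -Inf; its limit is 0, so substitute 0 there
  ad <- ifelse(is.infinite(a), 0, a * dnorm(a))
  bd <- ifelse(is.infinite(b), 0, b * dnorm(b))
  m2 <- 1 + (ad - bd) / Z                        # E[z^2 | a < z < b]
  c(mean = m1, var = m2 - m1^2)
}

# Hypothetical helper: Monte Carlo estimate using inverse-CDF sampling.
truncnorm_moments_mc <- function(a, b, n_sample = 5000) {
  u <- runif(n_sample, pnorm(a), pnorm(b))       # uniform over the CDF range of (a, b)
  z <- qnorm(u)                                  # draws from the truncated normal
  c(mean = mean(z), var = var(z))
}

set.seed(1)
a <- 0.5; b <- Inf   # e.g. the top level of an ordinal column with cutoff 0.5
exact <- truncnorm_moments_exact(a, b)
mc    <- truncnorm_moments_mc(a, b, n_sample = 5000)
print(rbind(exact = exact, mc = mc))
```

The Monte Carlo error shrinks roughly like 1/sqrt(n_sample), which is the trade-off the n_sample argument controls.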

Details

Impute the missing entries of mixed continuous and ordinal data by fitting a low rank Gaussian copula (LRGC) model to the data. LRGC is a subclass of the Gaussian copula model: it requires the copula correlation matrix to have a low rank plus diagonal decomposition, Σ = WW^\top + σ^2 \mathrm{I}_p, where W \in \mathbb{R}^{p\times k} and k < p.
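To make the model concrete, the sketch below simulates data from an LRGC model in base R. Because Σ is a correlation matrix, its diagonal must equal 1, so each row w_j of W is rescaled to satisfy ||w_j||^2 + σ^2 = 1. Latent rows are drawn as z = Wf + σe, whose covariance is WW^\top + σ^2 \mathrm{I}_p. Continuous columns are then obtained through a monotone transform and ordinal columns by cutting at fixed cutoffs; finally the nlevels rule classifies the columns. The dimensions, σ^2, the transform and the cutoffs are all illustrative assumptions.

```r
set.seed(42)
n <- 500; p <- 6; k <- 2
sigma2 <- 0.3                                   # assumed noise variance sigma^2

# Random W, rows rescaled so that diag(W W^T + sigma2 I) = 1
W <- matrix(rnorm(p * k), p, k)
W <- W / sqrt(rowSums(W^2)) * sqrt(1 - sigma2)
Sigma <- W %*% t(W) + sigma2 * diag(p)          # low rank plus diagonal correlation

# Latent Gaussian rows z = W f + sigma * e, so Cov(z) = Sigma
f <- matrix(rnorm(n * k), n, k)
e <- matrix(rnorm(n * p), n, p)
Z <- f %*% t(W) + sqrt(sigma2) * e

# Observed mixed data: columns 1-3 continuous, columns 4-6 ordinal (3 levels)
X <- Z
X[, 1:3] <- exp(Z[, 1:3])                       # any strictly increasing transform
cutoffs <- c(-Inf, -0.5, 0.5, Inf)              # assumed ordinal cutoffs
for (j in 4:6) X[, j] <- as.integer(cut(Z[, j], breaks = cutoffs))
X[matrix(runif(n * p) < 0.1, n, p)] <- NA       # about 10% missing completely at random

# nlevels rule: more unique observed values than nlevels means continuous
nlev <- 20                                      # mirrors the default nlevels = 20
n_unique <- apply(X, 2, function(col) length(unique(col[!is.na(col)])))
is_continuous <- n_unique > nlev
print(n_unique)
print(is_continuous)
print(max(abs(cor(Z) - Sigma)))                 # sampling error only
```

A matrix like X (after checking that no row is entirely NA) is the kind of input impute_LRGC expects, with rank set to the number of latent factors k.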

Value

A list containing:

Ximp

Imputed data matrix

W

Fitted latent low rank subspace matrix

sigma

Fitted noise variance

loglik

The log-likelihood values achieved over the iterations. These values approximate the true objective function being maximized, which is hard to evaluate exactly. A monotonically increasing loglik sequence indicates a good fit.

Zimp

The imputed latent Z matrix. For observed ordinal entries, each entry is the corresponding estimated conditional mean. Useful for constructing confidence intervals.

C

The conditional variance corresponding to the observed Z matrix. Useful for quantifying imputation uncertainty.

cutoffs

The estimated cutoffs for ordinal dimensions. Useful for quantifying imputation uncertainty.
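Zimp and C are described as a conditional mean and a conditional variance of the latent Z. For a row whose observed latent entries are known exactly (as for continuous columns), these reduce to the standard Gaussian conditioning formulas E[z_M | z_O] = Σ_{MO} Σ_{OO}^{-1} z_O and Var[z_M | z_O] = Σ_{MM} - Σ_{MO} Σ_{OO}^{-1} Σ_{OM}. The sketch below applies them to one latent row and forms a 95% interval on the latent scale. The helper condition_gaussian, the value of Σ and the example row are hypothetical, and ordinal truncation is ignored, so this illustrates the idea rather than the package's computation.

```r
# Hypothetical helper: distribution of missing latent entries given observed ones,
# for a row z ~ N(0, Sigma) with NA marking missing entries.
condition_gaussian <- function(z, Sigma) {
  obs <- !is.na(z)
  mis <- !obs
  S_oo <- Sigma[obs, obs, drop = FALSE]
  S_mo <- Sigma[mis, obs, drop = FALSE]
  A <- S_mo %*% solve(S_oo)                                        # regression coefficients
  mean_mis <- drop(A %*% z[obs])                                   # conditional mean
  var_mis  <- diag(Sigma[mis, mis, drop = FALSE] - A %*% t(S_mo))  # conditional variances
  list(mean = mean_mis, var = var_mis)
}

# Illustrative low rank plus diagonal correlation matrix (k = 1)
set.seed(7)
p <- 4; k <- 1; sigma2 <- 0.2
W <- matrix(rnorm(p * k), p, k)
W <- W / sqrt(rowSums(W^2)) * sqrt(1 - sigma2)
Sigma <- W %*% t(W) + sigma2 * diag(p)

z <- c(0.8, NA, -0.3, NA)                        # one latent row, two entries missing
res <- condition_gaussian(z, Sigma)
lower <- res$mean - qnorm(0.975) * sqrt(res$var) # 95% interval on the latent scale
upper <- res$mean + qnorm(0.975) * sqrt(res$var)
print(cbind(mean = res$mean, var = res$var, lower = lower, upper = upper))
```

solve() is used here for clarity. With Σ = WW^\top + σ^2 \mathrm{I}_p, the inverse of Σ_{OO} could instead be obtained through the Woodbury identity at a cost that grows with k rather than with the number of observed entries. Mapping the latent endpoints back through a column's estimated marginal (for example, an empirical quantile function evaluated at pnorm(lower)) would give an interval on the data scale; this last step is an assumption about usage, not documented behaviour of the package.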

Author(s)

Yuxuan Zhao, yz2295@cornell.edu and Madeleine Udell, udell@cornell.edu

References

Zhao, Y., & Udell, M. (2020). Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula. NeurIPS 2020.


udellgroup/mixedgcImp documentation built on Jan. 25, 2023, 7:55 p.m.