impute_GC: Gaussian copula for incomplete mixed data

View source: R/impute_mixedgc.R

impute_GC R Documentation

Gaussian copula for incomplete mixed data

Description

Fits a Gaussian copula model to mixed (continuous and ordinal) data and imputes the missing entries using the fitted model.

Usage

impute_GC(
  X,
  nlevels = 20,
  trunc_method = "Iterative",
  n_sample = 5000,
  n_update = 1,
  maxit = 50,
  eps = 0.01,
  verbose = FALSE,
  runiter = 0,
  n_MI = 0,
  corr = NULL,
  ...
)

Arguments

X

A matrix or data.frame with missing values (NA). Each observed entry of X must be either a numeric value or a numeric ordinal level. X must not contain empty rows (rows in which every entry is missing) or character-valued levels.
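The two input requirements on X can be checked before fitting. The helper below, check_gc_input, is hypothetical and not part of the package; it treats a row as empty when every entry is NA, which is our reading of "empty row".

```r
# Hypothetical helper (not part of the package): checks the stated
# requirements on X before calling impute_GC().
check_gc_input <- function(X) {
  X <- as.data.frame(X)
  # Every column must be numeric (no character or factor levels).
  numeric_cols <- vapply(X, is.numeric, logical(1))
  # A row is treated as "empty" when all of its entries are missing.
  empty_rows <- unname(which(rowSums(!is.na(X)) == 0))
  list(ok = all(numeric_cols) && length(empty_rows) == 0,
       non_numeric_cols = names(X)[!numeric_cols],
       empty_rows = empty_rows)
}

X <- data.frame(a = c(1.2, NA, 3.1), b = c(1, NA, 2), c = c("lo", NA, "hi"))
res <- check_gc_input(X)
res$ok                # FALSE
res$non_numeric_cols  # "c"
res$empty_rows        # 2
```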

nlevels

A column with more unique values than nlevels is classified as continuous; otherwise it is treated as ordinal.
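A minimal base-R sketch of the nlevels rule. classify_columns is a hypothetical helper; whether the package counts NA as a distinct value is an assumption (this sketch ignores missing entries). A column with exactly nlevels unique values is ordinal, matching "larger number than nlevels".

```r
# Sketch of the nlevels rule: count distinct observed values per column;
# more than nlevels -> continuous, otherwise ordinal. NA is ignored here
# (an assumption; the package may count values differently).
classify_columns <- function(X, nlevels = 20) {
  X <- as.matrix(X)
  n_unique <- apply(X, 2, function(col) length(unique(col[!is.na(col)])))
  ifelse(n_unique > nlevels, "continuous", "ordinal")
}

set.seed(1)
X <- cbind(cont = rnorm(100), ord = sample(1:5, 100, replace = TRUE))
X[sample(length(X), 20)] <- NA
classify_columns(X)  # cont: "continuous", ord: "ordinal"
```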

trunc_method

Method for evaluating truncated normal moments: 'Iterative' or 'Sampling'.

n_sample

Number of Monte Carlo samples; only used when trunc_method is 'Sampling'.
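To make the truncated normal moments concrete, the sketch below computes the mean and variance of a standard normal truncated to an interval [a, b] in two ways: from the closed-form expressions, and from n_sample Monte Carlo draws obtained by inverse-CDF sampling. Both function names are hypothetical; this illustrates the quantity being estimated, not the package's internal routines, and it does not describe how the 'Iterative' method computes these moments.

```r
# Mean and variance of a standard normal truncated to [a, b]:
# closed form versus a Monte Carlo estimate from n_sample draws.
trunc_moments_exact <- function(a, b) {
  Z  <- pnorm(b) - pnorm(a)
  # The terms x * dnorm(x) vanish at infinite bounds; avoid Inf * 0 = NaN.
  xa <- if (is.finite(a)) a * dnorm(a) else 0
  xb <- if (is.finite(b)) b * dnorm(b) else 0
  m  <- (dnorm(a) - dnorm(b)) / Z
  c(mean = m, var = 1 + (xa - xb) / Z - m^2)
}

trunc_moments_mc <- function(a, b, n_sample = 5000) {
  # Inverse-CDF sampling: uniform on [pnorm(a), pnorm(b)], mapped by qnorm.
  z <- qnorm(runif(n_sample, pnorm(a), pnorm(b)))
  c(mean = mean(z), var = var(z))
}

set.seed(42)
exact <- trunc_moments_exact(0.5, 1.5)
mc    <- trunc_moments_mc(0.5, 1.5, n_sample = 5000)
rbind(exact, mc)
```

With n_sample = 5000 the Monte Carlo estimates for this interval typically agree with the closed form to about two decimal places; a larger n_sample trades run time for accuracy.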

n_update

Number of updates; only used when trunc_method is 'Iterative'.

maxit

Maximum number of iterations.

eps

Convergence threshold.

verbose

Logical; whether to print progress information.

runiter

If set to a positive integer, the algorithm runs exactly that many iterations.
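The interplay of maxit, eps and runiter can be sketched as a generic iteration loop. run_iterations is hypothetical, and the convergence measure used here (largest absolute change of the state between iterations) is an assumption; the package's own criterion may differ.

```r
# Hypothetical stopping rule (not taken from the package source): with
# runiter > 0, run exactly runiter iterations; otherwise stop once the
# change between iterations falls below eps, or after maxit iterations.
run_iterations <- function(update, state, maxit = 50, eps = 0.01, runiter = 0) {
  n_iter <- if (runiter > 0) runiter else maxit
  for (iter in seq_len(n_iter)) {
    new_state <- update(state)
    change <- max(abs(new_state - state))  # assumed convergence measure
    state <- new_state
    if (runiter == 0 && change < eps) break
  }
  list(state = state, iterations = iter)
}

# Toy update that halves the distance to 1 at each step.
halve <- function(x) x + (1 - x) / 2
run_iterations(halve, state = 0)$iterations                # 7
run_iterations(halve, state = 0, runiter = 10)$iterations  # 10
```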

n_MI

Number of multiple imputations: the number of random samples to draw from the conditional distribution of the missing entries given the observed ones.
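Under a Gaussian copula with latent z ~ N(0, corr), the missing latent coordinates of a row, given its observed latent coordinates z_obs, are multivariate normal with mean S_mo S_oo^-1 z_obs and covariance S_mm - S_mo S_oo^-1 S_om, where the S blocks are sub-matrices of corr. The sketch below draws n_MI samples from that distribution for a single row. It assumes the observed latent values are known exactly, which holds for continuous columns after the marginal transform; ordinal columns only constrain z to an interval, which this sketch does not handle. draw_conditional is a hypothetical helper.

```r
# Sketch of multiple imputation in the latent Gaussian space for one row z
# (NA marks missing coordinates; at least one observed and one missing).
# Returns an n_MI x (number missing) matrix; each row is one draw.
draw_conditional <- function(z, corr, n_MI = 5) {
  mis <- is.na(z); obs <- !mis
  S_oo <- corr[obs, obs, drop = FALSE]
  S_mo <- corr[mis, obs, drop = FALSE]
  W <- S_mo %*% solve(S_oo)                              # regression coefficients
  mu <- drop(W %*% z[obs])                               # conditional mean
  Sigma <- corr[mis, mis, drop = FALSE] - W %*% t(S_mo)  # conditional covariance
  L <- chol(Sigma)                                       # Sigma = t(L) %*% L
  E <- matrix(rnorm(n_MI * sum(mis)), n_MI, sum(mis))    # iid N(0, 1) noise
  sweep(E %*% L, 2, mu, "+")                             # rows ~ N(mu, Sigma)
}

corr <- matrix(c(1, 0.6, 0.3,
                 0.6, 1, 0.5,
                 0.3, 0.5, 1), 3)
set.seed(7)
draws <- draw_conditional(c(0.8, NA, -0.2), corr, n_MI = 3)
draws  # 3 draws of the missing second coordinate
```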

corr

If not NULL, the missing values are imputed using corr as the copula correlation matrix.
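With a fixed correlation matrix, single imputation in latent space reduces to replacing each missing coordinate by its conditional mean given the observed coordinates in the same row. The sketch assumes a latent matrix Z whose rows follow N(0, corr); impute_latent_mean is a hypothetical helper, and mapping the result back to the data scale is a separate step not shown here.

```r
# Sketch: impute every missing latent entry of Z by its conditional mean
# given the observed entries in the same row, for a fixed correlation
# matrix corr. Assumes each row of Z follows N(0, corr).
impute_latent_mean <- function(Z, corr) {
  for (i in seq_len(nrow(Z))) {
    mis <- is.na(Z[i, ])
    if (!any(mis)) next
    obs <- !mis
    if (!any(obs)) {            # fully missing row: unconditional mean is 0
      Z[i, mis] <- 0
      next
    }
    # E[z_mis | z_obs] = corr[mis, obs] %*% solve(corr[obs, obs], z_obs)
    Z[i, mis] <- corr[mis, obs, drop = FALSE] %*%
      solve(corr[obs, obs, drop = FALSE], Z[i, obs])
  }
  Z
}

corr <- matrix(c(1, 0.6, 0.3,
                 0.6, 1, 0.5,
                 0.3, 0.5, 1), 3)
Z <- rbind(c(0.8, NA, -0.2),
           c(NA,  NA,  1.0),
           c(0.1, 0.2, 0.3))
Zhat <- impute_latent_mean(Z, corr)
Zhat
```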

...

Additional arguments for development use.

Details

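Imputes the missing entries of mixed continuous and ordinal data by fitting a Gaussian copula model. Under this model, each observed variable x_j is a monotone transformation of a latent standard normal variable z_j, and the latent vector z is multivariate normal with correlation matrix corr (the copula correlation); an ordinal variable corresponds to the interval between consecutive thresholds in which its latent value falls. The model is fit with an approximate EM algorithm (Zhao & Udell, 2020), and each missing entry is imputed by predicting its latent value from the observed entries of the same row and mapping that prediction back to the data scale.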
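For a continuous column, the monotone transformation between data and latent scale can be estimated from the empirical distribution of the observed values. This sketch uses normal scores qnorm(rank / (n + 1)) for the forward map and empirical quantiles for the inverse map; both helpers are hypothetical and the package's exact estimator may differ.

```r
# Sketch of the marginal step of a Gaussian copula for one continuous
# column: observed values -> latent normal scores via the empirical CDF,
# and latent values -> data scale via empirical quantiles.
to_latent <- function(x) {
  obs <- !is.na(x)
  z <- rep(NA_real_, length(x))
  # rank / (n + 1) keeps the CDF estimate strictly inside (0, 1)
  z[obs] <- qnorm(rank(x[obs]) / (sum(obs) + 1))
  z
}

to_data <- function(z, x_obs) {
  # Map latent values back through the inverse empirical CDF.
  unname(quantile(x_obs, probs = pnorm(z), type = 1))
}

x <- c(2.3, NA, 0.7, 5.1, 1.9, NA, 3.3)
z <- to_latent(x)
z
to_data(z[!is.na(z)], x[!is.na(x)])  # recovers the observed values
```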

Value

A list containing:

Ximp

Imputed data matrix, with the same dimensions as X.

corr

Fitted copula correlation matrix.

loglik

The sequence of log-likelihood values recorded across iterations. Each value approximates the true objective function being maximized, which is hard to evaluate exactly. A monotonically increasing loglik sequence indicates that the fit is progressing well.
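A quick way to check this property is to inspect the successive differences of the trace, for example is_monotone_increasing(fit$loglik), assuming fit$loglik is a numeric vector with one entry per iteration. The helper and the toy traces below are illustrative, not package output.

```r
# Check whether a loglik trace is (numerically) non-decreasing, allowing
# a small tolerance for rounding or approximation noise.
is_monotone_increasing <- function(loglik, tol = 1e-8) {
  all(diff(loglik) >= -tol)
}

# Toy traces (illustrative values only).
is_monotone_increasing(c(-1520.4, -1498.2, -1490.7, -1490.1))  # TRUE
is_monotone_increasing(c(-1520.4, -1498.2, -1501.3))           # FALSE
```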

Author(s)

Yuxuan Zhao, yz2295@cornell.edu and Madeleine Udell, udell@cornell.edu

References

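Zhao, Y., & Udell, M. (2020). Missing Value Imputation for Mixed Data via Gaussian Copula. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2020).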

Examples

# Simulate Data
library(MASS)
# Generate 15-dim mixed data (5 continuous, 5 ordinal, 5 binary columns) and mask 20% of the entries
var_types = list('cont'=1:5, 'ord'=6:10, 'bin'=11:15)
X = generate_mixed_from_gc(var_types = var_types, n = 500)
Xmask = mask_MCAR(X, mask_fraction = 0.2)
# Fit the Gaussian copula and impute the missing entries
fit = impute_GC(Xmask, verbose = TRUE)

# Compute the scaled imputation error
cal_mae_scaled(xhat = fit$Ximp, xobs = Xmask, xtrue = X)


udellgroup/mixedgcImp documentation built on Jan. 25, 2023, 7:55 p.m.