mdgc_fit: Estimate the Model Parameters


Description

Estimates the covariance matrix and the non-zero mean terms. Suitable values of the lr and batch_size parameters are likely data dependent. Convergence should be monitored, e.g. by setting verbose = TRUE with method = "svrg".

See the README at https://github.com/boennecd/mdgc for examples.
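As a brief sketch of the monitoring advice above (assuming that ptr, start_vals, and obj have been created as in the Examples section below, and that the tuning values shown are placeholders):

# verbose = TRUE prints progress for each iteration of the SVRG method
fit <- mdgc_fit(ptr, vcov = start_vals, mea = obj$means, method = "svrg",
                lr = 1e-3, batch_size = 50L, verbose = TRUE)

# if the chosen method stores them, fun_vals holds the mdgc_log_ml value
# after each iteration and can be plotted to check convergence
plot(fit$fun_vals, type = "l")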

Usage

mdgc_fit(
  ptr,
  vcov,
  mea,
  lr = 0.001,
  rel_eps = 0.001,
  maxit = 25L,
  batch_size = NULL,
  method = c("svrg", "adam", "aug_Lagran"),
  seed = 1L,
  epsilon = 1e-08,
  beta_1 = 0.9,
  beta_2 = 0.999,
  n_threads = 1L,
  do_reorder = TRUE,
  abs_eps = -1,
  maxpts = 10000L,
  minvls = 100L,
  verbose = FALSE,
  decay = 0.98,
  conv_crit = 1e-06,
  use_aprx = FALSE,
  mu = 1,
  lambda = NULL
)

Arguments

ptr

returned object from get_mdgc_log_ml.

vcov, mea

starting values for the covariance matrix and the non-zero mean entries.

lr

learning rate.

rel_eps

relative error for each marginal likelihood factor.

maxit

maximum number of iterations.

batch_size

number of observations in each batch.

method

estimation method to use. Can be "svrg", "adam", or "aug_Lagran".

seed

fixed seed to use. Use NULL if the seed should not be fixed.

epsilon, beta_1, beta_2

ADAM parameters.

n_threads

number of threads to use.

do_reorder

logical for whether to use a heuristic variable reordering. TRUE is likely the best option.

abs_eps

absolute convergence threshold for each marginal likelihood factor.

maxpts

maximum number of samples to draw for each marginal likelihood term.

minvls

minimum number of samples to draw for each marginal likelihood term.

verbose

logical for whether to print output during the estimation.

decay

the learning rate used by SVRG is given by lr * decay^iteration_number; the sketch after this argument list illustrates the resulting schedule.

conv_crit

relative convergence threshold.

use_aprx

logical for whether to use an approximation of pnorm and qnorm. This may yield a noticeable reduction in the computation time.

mu

starting value for the penalty in the augmented Lagrangian method.

lambda

starting values for the Lagrange multiplier estimates. NULL yields a default.
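The small sketch below (with illustrative values for lr, decay, and the iteration count; it uses no part of the package API) shows the geometric step-size schedule that the decay argument implies for the SVRG method:

# learning rate at iteration i of SVRG is lr * decay^i
lr    <- 0.001
decay <- 0.98
iters <- 0:24                    # e.g. maxit = 25L iterations
step_sizes <- lr * decay^iters
round(head(step_sizes, 5), 6)    # the step size shrinks geometrically
# [1] 0.001000 0.000980 0.000960 0.000941 0.000922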

Value

A list with the following elements:

result

list with two elements: vcov is the estimated covariance matrix and mea is the estimated non-zero mean terms.

estimates

If present, the estimated parameters after each iteration.

fun_vals

If present, the output of mdgc_log_ml after each iteration.

mu, lambda

If present, the final values of mu and lambda from the augmented Lagrangian method.

Which of these elements are present depends on the chosen method.
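
As a quick check of which optional elements a particular fit contains (assuming a fitted object fit like the one returned in the Examples section below):

names(fit)               # which of the elements above are present
str(fit$result)          # the estimated covariance matrix (vcov) and means (mea)
tail(fit$fun_vals, 3)    # last log marginal likelihood values, if fun_vals is present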

References

Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv:1412.6980.

Johnson, R., & Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems.

See Also

mdgc_log_ml, mdgc_start_value, mdgc_impute.

Examples


# there is a bug on CRAN's check on Solaris which I have failed to reproduce.
# See https://github.com/r-hub/solarischeck/issues/8#issuecomment-796735501.
# Thus, this example is not run on Solaris
is_solaris <- tolower(Sys.info()[["sysname"]]) == "sunos"

if(!is_solaris){
  # randomly mask data
  set.seed(11)
  masked_data <- iris
  masked_data[matrix(runif(prod(dim(iris))) < .10, NROW(iris))] <- NA

  # use the functions in the package
  library(mdgc)
  obj <- get_mdgc(masked_data)
  ptr <- get_mdgc_log_ml(obj)
  start_vals <- mdgc_start_value(obj)

  fit <- mdgc_fit(ptr, start_vals, obj$means, rel_eps = 1e-2, maxpts = 10000L,
                  minvls = 1000L, use_aprx = TRUE, batch_size = 100L, lr = .001,
                  maxit = 100L, n_threads = 2L)
  print(fit$result$vcov)
  print(fit$result$mea)
}
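
# Hedged sketch (not part of the package's original example): the same model
# can be refitted with another estimation method by changing the method
# argument, e.g. ADAM with epsilon, beta_1, and beta_2 left at their defaults.
# The tuning values below are illustrative.
if(!is_solaris){
  fit_adam <- mdgc_fit(ptr, start_vals, obj$means, method = "adam",
                       rel_eps = 1e-2, maxpts = 10000L, minvls = 1000L,
                       use_aprx = TRUE, batch_size = 100L, lr = .001,
                       maxit = 25L, n_threads = 2L)
  print(fit_adam$result$vcov)
}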


