mdgc_fit: Estimate the Model Parameters


Description

Estimates the covariance matrix and the non-zero mean terms. Suitable values of the lr and batch_size parameters are likely data dependent. Convergence should be monitored, e.g. by setting verbose = TRUE with method = "svrg".

See the README at https://github.com/boennecd/mdgc for examples.
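As a brief sketch of the monitoring advice above (assuming that ptr, start_vals, and obj have been created as in the Examples section below, and that the tuning values shown are placeholders):

# verbose = TRUE prints progress for each iteration of the SVRG method
fit <- mdgc_fit(ptr, vcov = start_vals, mea = obj$means, method = "svrg",
                lr = 1e-3, batch_size = 50L, verbose = TRUE)

# if the chosen method stores them, fun_vals holds the mdgc_log_ml value
# after each iteration and can be plotted to check convergence
plot(fit$fun_vals, type = "l")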

Usage

mdgc_fit(
  ptr,
  vcov,
  mea,
  lr = 0.001,
  rel_eps = 0.001,
  maxit = 25L,
  batch_size = NULL,
  method = c("svrg", "adam", "aug_Lagran"),
  seed = 1L,
  epsilon = 1e-08,
  beta_1 = 0.9,
  beta_2 = 0.999,
  n_threads = 1L,
  do_reorder = TRUE,
  abs_eps = -1,
  maxpts = 10000L,
  minvls = 100L,
  verbose = FALSE,
  decay = 0.98,
  conv_crit = 1e-06,
  use_aprx = FALSE,
  mu = 1,
  lambda = NULL
)

Arguments

ptr

returned object from get_mdgc_log_ml.

vcov, mea

starting values for the covariance matrix and the non-zero mean entries.

lr

learning rate.

rel_eps

relative error for each marginal likelihood factor.

maxit

maximum number of iterations.

batch_size

number of observations in each batch.

method

estimation method to use. Can be "svrg", "adam", or "aug_Lagran".

seed

fixed seed to use. Use NULL if the seed should not be fixed.

epsilon, beta_1, beta_2

ADAM parameters.

n_threads

number of threads to use.

do_reorder

logical for whether to use a heuristic variable reordering. TRUE is likely the best option.

abs_eps

absolute convergence threshold for each marginal likelihood factor.

maxpts

maximum number of samples to draw for each marginal likelihood term.

minvls

minimum number of samples to draw for each marginal likelihood term.

verbose

logical for whether to print output during the estimation.

decay

the learning rate used by SVRG is given by lr * decay^iteration_number; the sketch after this argument list illustrates the resulting schedule.

conv_crit

relative convergence threshold.

use_aprx

logical for whether to use an approximation of pnorm and qnorm. This may yield a noticeable reduction in the computation time.

mu

starting value for the penalty in the augmented Lagrangian method.

lambda

starting values for the Lagrange multiplier estimates. NULL yields a default.
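The small sketch below (with illustrative values for lr, decay, and the iteration count; it uses no part of the package API) shows the geometric step-size schedule that the decay argument implies for the SVRG method:

# learning rate at iteration i of SVRG is lr * decay^i
lr    <- 0.001
decay <- 0.98
iters <- 0:24                    # e.g. maxit = 25L iterations
step_sizes <- lr * decay^iters
round(head(step_sizes, 5), 6)    # the step size shrinks geometrically
# [1] 0.001000 0.000980 0.000960 0.000941 0.000922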

Value

A list with the following elements:

result

list with two elements: vcov is the estimated covariance matrix and mea is the estimated non-zero mean terms.

estimates

If present, the estimated parameters after each iteration.

fun_vals

If present, the output of mdgc_log_ml after each iteration.

mu, lambda

If present, the final values of mu and lambda from the augmented Lagrangian method.

Which of these elements are present depends on the chosen method.
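
As a quick check of which optional elements a particular fit contains (assuming a fitted object fit like the one returned in the Examples section below):

names(fit)               # which of the elements above are present
str(fit$result)          # the estimated covariance matrix (vcov) and means (mea)
tail(fit$fun_vals, 3)    # last log marginal likelihood values, if fun_vals is present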

References

Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv:1412.6980.

Johnson, R., & Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Advances in neural information processing systems.

See Also

mdgc_log_ml, mdgc_start_value, mdgc_impute.

Examples


# there is a bug on CRAN's check on Solaris which I have failed to reproduce.
# See https://github.com/r-hub/solarischeck/issues/8#issuecomment-796735501.
# Thus, this example is not run on Solaris
is_solaris <- tolower(Sys.info()[["sysname"]]) == "sunos"

if(!is_solaris){
  # randomly mask data
  set.seed(11)
  masked_data <- iris
  masked_data[matrix(runif(prod(dim(iris))) < .10, NROW(iris))] <- NA

  # use the functions in the package
  library(mdgc)
  obj <- get_mdgc(masked_data)
  ptr <- get_mdgc_log_ml(obj)
  start_vals <- mdgc_start_value(obj)

  fit <- mdgc_fit(ptr, start_vals, obj$means, rel_eps = 1e-2, maxpts = 10000L,
                  minvls = 1000L, use_aprx = TRUE, batch_size = 100L, lr = .001,
                  maxit = 100L, n_threads = 2L)
  print(fit$result$vcov)
  print(fit$result$mea)
}
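
# Hedged sketch (not part of the package's original example): the same model
# can be refitted with another estimation method by changing the method
# argument, e.g. ADAM with epsilon, beta_1, and beta_2 left at their defaults.
# The tuning values below are illustrative.
if(!is_solaris){
  fit_adam <- mdgc_fit(ptr, start_vals, obj$means, method = "adam",
                       rel_eps = 1e-2, maxpts = 10000L, minvls = 1000L,
                       use_aprx = TRUE, batch_size = 100L, lr = .001,
                       maxit = 25L, n_threads = 2L)
  print(fit_adam$result$vcov)
}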


