pedmod_sqn: Optimize the Log Marginal Likelihood Using a Stochastic...

View source: R/opt.R

pedmod_sqnR Documentation

Optimize the Log Marginal Likelihood Using a Stochastic Quasi-Newton Method

Description

Optimizes eval_pedigree_ll and eval_pedigree_grad using a stochastic quasi-Newton method.

Usage

pedmod_sqn(
  ptr,
  par,
  maxvls,
  abs_eps,
  rel_eps,
  step_factor,
  n_it,
  n_grad_steps,
  indices = NULL,
  minvls = -1L,
  n_grad = 50L,
  n_hess = 500L,
  do_reorder = TRUE,
  use_aprx = FALSE,
  n_threads = 1L,
  cluster_weights = NULL,
  fix = NULL,
  standardized = FALSE,
  minvls_hess = minvls,
  maxvls_hess = maxvls,
  abs_eps_hess = abs_eps,
  rel_eps_hess = rel_eps,
  verbose = FALSE,
  method = 0L,
  check_every = 2L * n_grad_steps,
  use_tilting = FALSE,
  vls_scales = NULL
)

Arguments

ptr

object from pedigree_ll_terms.

par

starting values.

maxvls

maximum number of samples in the approximation for each marginal likelihood term.

abs_eps

absolute convergence threshold for eval_pedigree_ll and eval_pedigree_grad.

rel_eps

rel_eps convergence threshold for eval_pedigree_ll and eval_pedigree_grad.

step_factor

factor used for the step size. The step size is step_factor divided by the iteration number.

n_it

number of stochastic gradient steps to make.

n_grad_steps

number of stochastic gradient steps to make between each Hessian approximation update.

indices

zero-based vector with indices of which log marginal likelihood terms to include. Use NULL if all indices should be used.

minvls

minimum number of samples for each marginal likelihood term. Negative values provides a default which depends on the dimension of the integration.

n_grad

number of log marginal likelihood terms to include in the stochastic gradient step.

n_hess

number of log marginal likelihood terms to include in the gradients used for the Hessian approximation update. This is set to the entire sample (or indices) if this is greater than or equal to half the number of log marginal likelihood terms.

do_reorder

TRUE if a heuristic variable reordering should be used. TRUE is likely the best value.

use_aprx

TRUE if a less precise approximation of pnorm and qnorm should be used. This may reduce the computation time while not affecting the result much.

n_threads

number of threads to use.

cluster_weights

numeric vector with weights for each cluster. Use NULL if all clusters have weight one.

fix

integer vector with indices of par to fix. This is useful for computing profile likelihoods. NULL yields all parameters.

standardized

logical for whether to use the standardized or direct parameterization. See standardized_to_direct and the vignette at vignette("pedmod", package = "pedmod").

minvls_hess

minvls argument to use when updating the Hessian approximation.

maxvls_hess

maxvls argument to use when updating the Hessian approximation.

abs_eps_hess

abs_eps argument to use when updating the Hessian approximation.

rel_eps_hess

rel_eps argument to use when updating the Hessian approximation.

verbose

logical for whether to print output during the estimation.

method

integer with the method to use. Zero yields randomized Korobov lattice rules while one yields scrambled Sobol sequences.

check_every

integer for the number of gradient steps between checking that the likelihood did increase. If not, the iterations are reset and the step-size is halved.

use_tilting

TRUE if the minimax tilting method suggested by Botev (2017) should be used. See doi: 10.1111/rssb.12162.

vls_scales

can be a numeric vector with a positive scalar for each cluster. Then vls_scales[i] * minvls and vls_scales[i] * maxvls is used for cluster i rather than minvls and maxvls. Set vls_scales = NULL if the latter should be used.

Details

The function uses a stochastic quasi-Newton method like suggested by Byrd et al. (2016) with a few differences: Differences in gradients are used rather than Hessian-vector products, BFGS rather than L-BFGS is used because the problem is typically low dimensional, and damped BFGS updates are used (see e.g. chapter 18 of Nocedal and Wright, 2006).

Separate arguments for the gradient approximation in the Hessian update are provided as one may want a more precise approximation for these gradients. step_factor likely depends on the other parameters and the data set and should be altered.

Value

A list with the following elements:

par

estimated parameters.

omegas

parameter estimates after each iteration.

H

Hessian approximation in the quasi-Newton method. It should not be treated as the Hessian.

References

Byrd, R. H., Hansen, S. L., Nocedal, J., & Singer, Y. (2016). A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2), 1008-1031.

Nocedal, J., & Wright, S. (2006). Numerical optimization. Springer Science & Business Media.

See Also

pedmod_opt and pedmod_start.

Examples


# we simulate outcomes with an additive genetic effect. The kinship matrix is
# the same for all families and given by
K <- matrix(c(
  0.5  , 0    , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0    , 0.5  , 0.25 , 0   , 0.25 , 0   , 0.125 , 0.125 , 0.125 , 0.125 ,
  0.25 , 0.25 , 0.5  , 0   , 0.25 , 0   , 0.25  , 0.25  , 0.125 , 0.125 ,
  0    , 0    , 0    , 0.5 , 0    , 0   , 0.25  , 0.25  , 0     , 0     ,
  0.25 , 0.25 , 0.25 , 0   , 0.5  , 0   , 0.125 , 0.125 , 0.25  , 0.25  ,
  0    , 0    , 0    , 0   , 0    , 0.5 , 0     , 0     , 0.25  , 0.25  ,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.5   , 0.25  , 0.0625, 0.0625,
  0.125, 0.125, 0.25 , 0.25, 0.125, 0   , 0.25  , 0.5   , 0.0625, 0.0625,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.5   , 0.25  ,
  0.125, 0.125, 0.125, 0   , 0.25 , 0.25, 0.0625, 0.0625, 0.25  , 0.5
), 10)

# simulates a data set.
#
# Args:
#   n_fams: number of families.
#   beta: the fixed effect coefficients.
#   sig_sq: the scale parameter.
sim_dat <- function(n_fams, beta = c(-1, 1, 2), sig_sq = 3){
  # setup before the simulations
  Cmat <- 2 * K
  n_obs <- NROW(K)
  Sig <- diag(n_obs) + sig_sq * Cmat
  Sig_chol <- chol(Sig)

  # simulate the data
  out <- replicate(
    n_fams, {
      # simulate covariates
      X <- cbind(`(Intercept)` = 1, Continuous = rnorm(n_obs),
                 Binary = runif(n_obs) > .5)

      # assign the linear predictor + noise
      eta <- drop(X %*% beta) + drop(rnorm(n_obs) %*% Sig_chol)

      # return the list in the format needed for the package
      list(y = as.numeric(eta > 0), X = X, scale_mats = list(Cmat))
    }, simplify = FALSE)

  # add attributes with the true values and return
  attributes(out) <- list(beta = beta, sig_sq = sig_sq)
  out
}

# simulate the data
set.seed(1)
dat <- sim_dat(100L)

# fit the model
ptr <- pedigree_ll_terms(dat, max_threads = 1L)
start <- pedmod_start(ptr = ptr, data = dat, n_threads = 1L)
fit <- pedmod_sqn(ptr = ptr, par = start$par, n_threads = 1L, use_aprx = TRUE,
                  maxvls = 5000L, minvls = 1000L, abs_eps = 0, rel_eps = 1e-3,
                  n_grad_steps = 20L, step_factor = 1, n_grad = 10L,
                  n_hess = 50L, check_every = 50L, n_it = 1000L)
fit$par # maximum likelihood estimate
# the maximum likelihood
eval_pedigree_ll(ptr = ptr, fit$par, maxvls = 5000L, abs_eps = 0,
                 rel_eps = 1e-3, minvls = 1000L)


pedmod documentation built on Sept. 11, 2022, 5:05 p.m.

Related to pedmod_sqn in pedmod...