SQN: SQN guided optimizer

View source: R/optimizers_guided.R

Description

Optimizes an empirical (convex) loss function over batches of sample data.

Usage

SQN(x0, grad_fun, hess_vec_fun = NULL, pred_fun = NULL,
  initial_step = 0.001, step_fun = function(iter) 1/sqrt((iter/10) +
  1), callback_iter = NULL, args_cb = NULL, verbose = TRUE,
  mem_size = 10, bfgs_upd_freq = 20, min_curvature = 1e-04,
  y_reg = NULL, use_grad_diff = FALSE, check_nan = TRUE,
  nthreads = -1)

Arguments

x0

Initial values for the variables to optimize.

grad_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the gradient when evaluated on that data.
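
As a minimal sketch of a function with this signature, assuming a weighted least-squares loss (not something the package provides) and assuming that 'w' may arrive as 'NULL' when no weights are supplied; 'lsq_grad' is a hypothetical name:

```r
# Hypothetical gradient of a weighted least-squares loss:
#   f(x) = sum(w * (X %*% x - y)^2) / (2 * sum(w))
# Arguments follow the positional order described above.
lsq_grad <- function(x_curr, X, y, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, NROW(X))   # assumption: NULL means equal weights
  resid <- as.numeric(X %*% x_curr) - y
  as.numeric(crossprod(X, w * resid)) / sum(w)
}

X <- cbind(c(1, 0, 1), c(0, 1, 1))
y <- c(1, 2, 3)
lsq_grad(c(1, 2), X, y)   # zero gradient: x = (1, 2) fits this data exactly
```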

hess_vec_fun

Function taking as unnamed arguments 'x_curr' (variable values), 'vec' (numeric vector), 'X' (covariates), 'y' (target variable), and 'w' (weights), plus additional arguments ('...'), and producing the expected value of the Hessian (with variable values at 'x_curr') when evaluated on that data, multiplied by the vector 'vec'. Not required when using 'use_grad_diff' = 'TRUE'.
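
One way to sanity-check a Hessian-vector function before passing it here is to compare it against a central finite difference of the gradient, since H(x) v is approximately (g(x + eps * v) - g(x - eps * v)) / (2 * eps). The sketch below uses an unregularized logistic loss; the names 'logit_grad' and 'logit_hess_vec' and the step 'eps' are illustrative assumptions, not part of the package.

```r
# Illustrative logistic-loss gradient and Hessian-vector product
logit_grad <- function(x_curr, X, y, w = NULL, ...) {
  pred <- 1 / (1 + exp(-as.numeric(X %*% x_curr)))
  colMeans(X * (pred - y))
}
logit_hess_vec <- function(x_curr, vec, X, y, w = NULL, ...) {
  pred <- 1 / (1 + exp(-as.numeric(X %*% x_curr)))
  d    <- pred * (1 - pred)
  as.numeric(crossprod(X, d * as.numeric(X %*% vec))) / NROW(X)
}

set.seed(123)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- as.numeric(runif(20) < 0.5)
x <- c(0.5, -0.2, 0.1)
v <- c(1, 2, -1)

# Central finite difference of the gradient along direction v
eps     <- 1e-5
fd      <- (logit_grad(x + eps * v, X, y) - logit_grad(x - eps * v, X, y)) / (2 * eps)
exact   <- logit_hess_vec(x, v, X, y)
max_err <- max(abs(fd - exact))   # should be close to zero
```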

pred_fun

Function taking the data as its first unnamed argument, the variable values as its second unnamed argument, and optional extra arguments ('...'). Will be called when using 'predict' on the object returned by this function.

initial_step

Initial step size.

step_fun

Function accepting the iteration number as an unnamed parameter, which will output the number by which 'initial_step' will be multiplied at each iteration to get the step size for that iteration.
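
To see what the default schedule does, the snippet below evaluates the default 'step_fun' from the Usage section at a few iteration numbers and multiplies the result by the default 'initial_step'. Which iteration number the optimizer passes first (0 or 1) is not stated here, so the values in 'iters' are only illustrative.

```r
step_fun     <- function(iter) 1 / sqrt((iter / 10) + 1)
initial_step <- 0.001

iters      <- c(0, 30, 990)
multiplier <- step_fun(iters)             # 1.0, 0.5, 0.1
step_sizes <- initial_step * multiplier   # 1e-3, 5e-4, 1e-4
```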

callback_iter

Callback function which will be called at the end of each iteration. Will pass three unnamed arguments: the current variable values, the current iteration number, and 'args_cb'. Pass 'NULL' if there is no need to call a callback function.

args_cb

Extra argument to pass to the callback function.
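
A minimal sketch of a callback with the three-argument shape described for 'callback_iter', assuming 'args_cb' is used to pass an environment in which to record history. 'log_cb' and 'history' are hypothetical names, and the direct call at the end only simulates what the optimizer would do.

```r
# Environment used as mutable storage across callback invocations
history <- new.env()
history$iters <- integer(0)
history$norms <- numeric(0)

# Receives: current variable values, iteration number, args_cb
log_cb <- function(x_curr, iter, args_cb) {
  args_cb$iters <- c(args_cb$iters, as.integer(iter))
  args_cb$norms <- c(args_cb$norms, sqrt(sum(x_curr^2)))
  invisible(NULL)
}

# Simulated invocation at the end of iteration 1
log_cb(c(3, 4), 1, history)
history$norms   # 5
```

It would then be supplied as 'callback_iter = log_cb, args_cb = history'.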

verbose

Whether to print information about iteration statuses when something goes wrong.

mem_size

Number of correction pairs to store for approximation of Hessian-vector products.

bfgs_upd_freq

Number of iterations (batches) after which to generate a BFGS correction pair.

min_curvature

Minimum value of (s * y) / (s * s) in order to accept a correction pair. Pass 'NULL' for no minimum.
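
The acceptance test described above can be sketched as follows, reading 's * y' and 's * s' as dot products. 'accept_pair' is a hypothetical helper for illustration, not the package's internal code, and whether the comparison is strict or not is an assumption.

```r
accept_pair <- function(s, y, min_curvature = 1e-4) {
  if (is.null(min_curvature)) return(TRUE)   # no minimum requested
  curvature <- sum(s * y) / sum(s * s)
  curvature >= min_curvature
}

s <- c(0.1, -0.2)
accept_pair(s, y = 2 * s)       # curvature = 2     -> TRUE
accept_pair(s, y = 1e-6 * s)    # curvature = 1e-6  -> FALSE
accept_pair(s, y = -s, NULL)    # check disabled    -> TRUE
```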

y_reg

Regularizer for the 'y' vector (adds y_reg * s to it). Pass 'NULL' for no regularization.

use_grad_diff

Whether to create the correction pairs using differences between gradients instead of Hessian-vector products. These gradients are calculated on a larger batch than the regular ones (given by batch_size * bfgs_upd_freq).

check_nan

Whether to check for variables becoming NaN after each iteration, and revert the step if they do (this will also reset the BFGS memory).

nthreads

Number of parallel threads to use. If set to -1, will determine the number of available threads and use all of them. Note however that not all the computations can be parallelized, and the BLAS backend might use a different number of threads.

Value

An 'SQN' object with the user-supplied functions, which can be fit to batches of data through function 'partial_fit', and can produce predictions on new data through function 'predict'.

See Also

partial_fit, predict.stochQN_guided, SQN_free

Examples

### Example logistic regression with randomly-generated data
library(stochQN)

### Will sample data y ~ Bernoulli(sigm(Ax))
true_coefs <- c(1.12, 5.34, -6.123)

generate_data_batch <- function(true_coefs, n = 100) {
  X <- matrix(rnorm(length(true_coefs) * n), nrow=n, ncol=length(true_coefs))
  y <- 1 / (1 + exp(-as.numeric(X %*% true_coefs)))
  y <- as.numeric(y >= runif(n))
  return(list(X = X, y = y))
}

### Logistic regression likelihood/loss
eval_fun <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred    <- 1 / (1 + exp(-as.numeric(X %*% coefs)))
  logloss <- mean(-(y * log(pred) + (1 - y) * log(1 - pred)))
  reg     <- lambda * as.numeric(coefs %*% coefs)
  return(logloss + reg)
}

eval_grad <- function(coefs, X, y, weights=NULL, lambda=1e-5) {
  pred <- 1 / (1 + exp(-(X %*% coefs)))
  grad <- colMeans(X * as.numeric(pred - y))
  grad <- grad + 2 * lambda * as.numeric(coefs)
  return(as.numeric(grad))
}

eval_Hess_vec <- function(coefs, vec, X, y, weights=NULL, lambda=1e-5) {
  pred <- 1 / (1 + exp(-as.numeric(X %*% coefs)))
  diag <- pred * (1 - pred)
  Hp   <- t(X) %*% (diag * as.numeric(X %*% vec))
  Hp   <- Hp / NROW(X) + 2 * lambda * vec
  return(as.numeric(Hp))
}

pred_fun <- function(X, coefs, ...) {
  return(1 / (1 + exp(-as.numeric(X %*% coefs))))
}


### Initialize optimizer from arbitrary values
x0 <- c(1, 1, 1)
optimizer <- SQN(x0, grad_fun=eval_grad, pred_fun=pred_fun,
  hess_vec_fun=eval_Hess_vec, initial_step=1e-0)
val_data <- generate_data_batch(true_coefs, n=100)

### Fit to 250 batches of data, 100 observations each
set.seed(1)
for (i in 1:250) {
  new_batch <- generate_data_batch(true_coefs, n=100)
  partial_fit(optimizer, new_batch$X, new_batch$y, lambda=1e-5)
  x_curr <- get_curr_x(optimizer)
  i_curr <- get_iteration_number(optimizer)
  if ((i_curr %% 10)  == 0) {
    cat(sprintf("Iteration %3d - E[f(x)]: %f - values of x: [%f, %f, %f]\n",
      i_curr, eval_fun(x_curr, val_data$X, val_data$y, lambda=1e-5),
      x_curr[1], x_curr[2], x_curr[3]))
  }
}

### Predict for new data
new_batch <- generate_data_batch(true_coefs, n=10)
yhat <- predict(optimizer, new_batch$X)

stochQN documentation built on Sept. 26, 2021, 9:07 a.m.
