psqn  R Documentation 
Optimization method for specially structured partially separable functions. The psqn_aug_Lagrang function supports nonlinear equality constraints using an augmented Lagrangian method.
psqn(
  par, fn, n_ele_func, rel_eps = 1e-08, max_it = 100L, n_threads = 1L,
  c1 = 1e-04, c2 = 0.9, use_bfgs = TRUE, trace = 0L, cg_tol = 0.5,
  strong_wolfe = TRUE, env = NULL, max_cg = 0L, pre_method = 1L,
  mask = as.integer(c()), gr_tol = -1
)

psqn_aug_Lagrang(
  par, fn, n_ele_func, consts, n_constraints,
  multipliers = as.numeric(c()), penalty_start = 1L, rel_eps = 1e-08,
  max_it = 100L, max_it_outer = 100L, violations_norm_thresh = 1e-06,
  n_threads = 1L, c1 = 1e-04, c2 = 0.9, tau = 1.5, use_bfgs = TRUE,
  trace = 0L, cg_tol = 0.5, strong_wolfe = TRUE, env = NULL, max_cg = 0L,
  pre_method = 1L, mask = as.integer(c()), gr_tol = -1
)
par 
Initial values for the parameters. It is a concatenated vector of the global parameters and all the private parameters. 
fn 
Function to compute the element functions and their derivatives. Each call computes an element function. See the examples section. 
n_ele_func 
Number of element functions. 
rel_eps 
Relative convergence threshold. 
max_it 
Maximum number of iterations. 
n_threads 
Number of threads to use. 
c1, c2 
Thresholds for the Wolfe conditions. 
use_bfgs 
Logical for whether to use BFGS updates or SR1 updates. 
trace 
Integer where larger values give more information during the optimization. 
cg_tol 
Threshold for the conjugate gradient method. 
strong_wolfe 
Logical for whether to use the strong Wolfe condition. 
env 
Environment to evaluate fn in. NULL yields the global environment. 
max_cg 
Maximum number of conjugate gradient iterations in each iteration. Use zero if there should not be a limit. 
pre_method 
Preconditioning method in the conjugate gradient method. Zero yields no preconditioning, one yields diagonal preconditioning, two yields the incomplete Cholesky factorization from Eigen, and three yields a block diagonal preconditioning. One and three are fast options with three seeming to work well for some poorly conditioned problems. 
mask 
Zero-based indices of parameters to mask (i.e. fix); see the short example after this list. 
gr_tol 
Convergence tolerance for the Euclidean norm of the gradient. A negative value yields no check. 
consts 
Function to compute the constraints which must be equal to zero. See the Examples section. 
n_constraints 
The number of constraints. 
multipliers 
Starting values for the multipliers in the augmented Lagrangian method. There needs to be the same number of multipliers as the number of constraints. An empty vector, the default, implies that all multipliers start at zero. 
penalty_start 
Starting value for the penalty parameter in the augmented Lagrangian method. 
max_it_outer 
Maximum number of augmented Lagrangian steps. 
violations_norm_thresh 
Threshold for the norm of the constraint violations. 
tau 
Multiplier applied to the penalty parameter after each outer iteration. 
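As a short illustration of the mask argument, the sketch below fixes the global parameter at its starting value by passing its zero-based index. It assumes the r_func, p, q, and n_clusters objects defined in the Examples section below:

# fix the global parameter (zero-based index 0) during the optimization
res_mask <- psqn(par = rep(0, p + q * n_clusters), fn = r_func,
                 n_ele_func = n_clusters, mask = 0L)
res_mask$par[1] # the masked parameter stays at its starting value, zero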
The function follows the method described by Nocedal and Wright (2006), mainly what is described in Section 7.4. Details are provided in the psqn vignette. See vignette("psqn", package = "psqn").
The partially separable functions we consider are special in that the function to be minimized is a sum of so-called element functions, each of which depends only on a few shared (global) parameters and some private parameters particular to that element function. A generic method for other partially separable functions is available through the psqn_generic function.
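To make the element function interface concrete, here is a minimal sketch with a purely illustrative quadratic objective; the dimension-query convention (a length-zero par) and the "grad" attribute are the ones used in the Examples section below:

library(psqn)

# hypothetical element functions: the i'th one is
# (x_global - i)^2 / 2 + x_private_i^2 / 2 with one global parameter
# shared by all element functions and one private parameter each
ele_func <- function(i, par, comp_grad){
  if(length(par) == 0)
    # report the number of global and private parameters
    return(c(global_dim = 1L, private_dim = 1L))

  out <- (par[1] - i)^2 / 2 + par[2]^2 / 2
  if(comp_grad)
    attr(out, "grad") <- c(par[1] - i, par[2])
  out
}

res <- psqn(par = numeric(5), fn = ele_func, n_ele_func = 4L)
res$par # the global parameter is ~ mean(1:4); the private ones are ~ 0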
The optimization function is also available in C++ as a header-only library. Using C++ may reduce the computation time substantially. See the vignette in the package for examples.
You have to define the PSQN_USE_EIGEN macro variable in C++ if you want to use the incomplete Cholesky factorization from Eigen. You will also have to include Eigen or RcppEigen. This is not needed when you use the R functions documented here. The incomplete Cholesky factorization comes with some additional overhead because of the allocation of the factorization, the formation of the factorization, and the assignment of the sparse version of the Hessian approximation. However, it may substantially reduce the required number of conjugate gradient iterations.
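One way to gauge this trade-off from R is to compare the conjugate gradient counts with diagonal preconditioning and with the incomplete Cholesky factorization. This is a small sketch assuming the r_func and the constants (p, q, n_clusters) defined in the Examples section below:

par0 <- rep(0, p + q * n_clusters)
fit_diag <- psqn(par = par0, fn = r_func, n_ele_func = n_clusters,
                 pre_method = 1L)
fit_chol <- psqn(par = par0, fn = r_func, n_ele_func = n_clusters,
                 pre_method = 2L)
# the last element of counts is the number of conjugate gradient iterations
fit_diag$counts
fit_chol$counts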
psqn: An object with the following elements:
par 
the estimated global and private parameters. 
value 
the function value at the estimated parameters. 
info 
information code. 0 implies convergence. -1 implies that the maximum number of iterations is reached. -2 implies that the conjugate gradient method failed. -3 implies that the line search failed. -4 implies that the user interrupted the optimization. 
counts 
An integer vector with the number of function evaluations, gradient evaluations, and the number of conjugate gradient iterations. 
convergence 
TRUE if the method converged (info equal to zero). 
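For instance, assuming the res object fitted in the Examples section below, the elements can be inspected as follows:

res$value       # the objective value at the estimate
res$info        # 0 at convergence
res$convergence # TRUE at convergence
res$counts      # function evaluations, gradient evaluations, CG iterations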
psqn_aug_Lagrang: Like psqn with a few exceptions:
multipliers 
final multipliers from the augmented Lagrangian method. 
counts 
has an additional element counting the outer (augmented Lagrangian) iterations. 
penalty 
the final penalty parameter from the augmented Lagrangian method. 
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.
Lin, C. and Moré, J. J. (1999). Incomplete Cholesky factorizations with limited memory. SIAM Journal on Scientific Computing.
# example with inner problem in a Taylor approximation for a GLMM as in the
# vignette

# assign model parameters, number of random effects, and fixed effects
q <- 2 # number of private parameters per cluster
p <- 1 # number of global parameters
beta <- sqrt((1:p) / sum(1:p))
Sigma <- diag(q)

# simulate a data set
set.seed(66608927)
n_clusters <- 20L # number of clusters
sim_dat <- replicate(n_clusters, {
  n_members <- sample.int(8L, 1L) + 2L
  X <- matrix(runif(p * n_members, -sqrt(6 / 2), sqrt(6 / 2)), p)
  u <- drop(rnorm(q) %*% chol(Sigma))
  Z <- matrix(runif(q * n_members, -sqrt(6 / 2 / q), sqrt(6 / 2 / q)), q)
  eta <- drop(beta %*% X + u %*% Z)
  y <- as.numeric((1 + exp(-eta))^(-1) > runif(n_members))

  list(X = X, Z = Z, y = y, u = u, Sigma_inv = solve(Sigma))
}, simplify = FALSE)

# evaluates the negative log integrand.
#
# Args:
#   i cluster/element function index.
#   par the global and private parameters for this cluster. It has length
#       zero if the number of parameters is requested. That is, a 2D integer
#       vector with the number of global parameters as the first element and
#       the number of private parameters as the second element.
#   comp_grad logical for whether to compute the gradient.
r_func <- function(i, par, comp_grad){
  dat <- sim_dat[[i]]
  X <- dat$X
  Z <- dat$Z

  if(length(par) < 1)
    # requested the dimension of the parameter
    return(c(global_dim = NROW(dat$X), private_dim = NROW(dat$Z)))

  y <- dat$y
  Sigma_inv <- dat$Sigma_inv

  beta <- par[1:p]
  uhat <- par[1:q + p]
  eta <- drop(beta %*% X + uhat %*% Z)
  exp_eta <- exp(eta)

  out <- -sum(y * eta) + sum(log(1 + exp_eta)) +
    sum(uhat * (Sigma_inv %*% uhat)) / 2
  if(comp_grad){
    d_eta <- -y + exp_eta / (1 + exp_eta)
    grad <- c(X %*% d_eta,
              Z %*% d_eta + dat$Sigma_inv %*% uhat)
    attr(out, "grad") <- grad
  }
  out
}

# optimize the log integrand
res <- psqn(par = rep(0, p + q * n_clusters), fn = r_func,
            n_ele_func = n_clusters)
head(res$par, p)              # the estimated global parameters
tail(res$par, n_clusters * q) # the estimated private parameters

# compare with
beta
c(sapply(sim_dat, "[[", "u"))

# add equality constraints
idx_constrained <- list(c(2L, 19L), c(1L, 5L, 8L))

# evaluates the c(x) in the equalities c(x) = 0.
#
# Args:
#   i constraint index.
#   par the constrained parameters. It has length zero if we need to pass
#       the one-based indices of the parameters that the i'th constraint
#       depends on.
#   what integer which is zero if the function should be returned and one
#        if the gradient should be computed.
consts <- function(i, par, what){
  if(length(par) == 0)
    # need to return the indices
    return(idx_constrained[[i]])

  if(i == 1){
    # a linear equality constraint. It is implemented as a nonlinear
    # constraint though
    out <- sum(par) - 3
    if(what == 1)
      attr(out, "grad") <- rep(1, length(par))

  } else if(i == 2){
    # the parameters need to be on a circle
    out <- sum(par^2) - 1
    if(what == 1)
      attr(out, "grad") <- 2 * par
  }

  out
}

# optimize with the constraints
res_consts <- psqn_aug_Lagrang(
  par = rep(0, p + q * n_clusters), fn = r_func, consts = consts,
  n_ele_func = n_clusters, n_constraints = length(idx_constrained))

res_consts
res_consts$multipliers # the estimated multipliers
res_consts$penalty     # the penalty parameter

# the function value is higher (worse) as expected
res$value - res_consts$value

# the two constraints are satisfied
sum(res_consts$par[idx_constrained[[1]]]) - 3   # ~ 0
sum(res_consts$par[idx_constrained[[2]]]^2) - 1 # ~ 0

# we can also use another preconditioner
res_consts_chol <- psqn_aug_Lagrang(
  par = rep(0, p + q * n_clusters), fn = r_func, consts = consts,
  n_ele_func = n_clusters, n_constraints = length(idx_constrained),
  pre_method = 2L)
res_consts_chol