# mixsqp: Maximum-likelihood estimation of mixture proportions using SQP

In stephenslab/mixsqp: Sequential Quadratic Programming for Fast Maximum-Likelihood Estimation of Mixture Proportions

## Description

The `mixsqp` function uses a Sequential Quadratic Programming (SQP) algorithm to find the maximum likelihood estimates of mixture proportions in a (finite) mixture model. More generally, `mixsqp` solves the corresponding constrained, convex optimization problem, which is given below (see ‘Details’). See ‘References’ for more details about the SQP algorithm.

## Usage

```r
mixsqp(L, w = rep(1, nrow(L)), x0 = rep(1, ncol(L)), control = list())

mixsqp_control_default()
```

## Arguments

`L`

Matrix specifying the optimization problem to be solved. In the context of mixture-model fitting, `L[j,k]` should be the value of the kth mixture component density at the jth data point. `L` should be a numeric matrix with at least two columns, with all entries being non-negative and finite (and not missing). For large matrices, it is preferable that the matrix is stored in double precision; see `storage.mode`.

`w`

An optional numeric vector, with one entry for each row of `L`, specifying the "weights" associated with the rows of `L`. All weights must be finite, non-negative and not missing. Internally, the weights are normalized to sum to 1, which does not change the problem, but does change the value of the objective function reported. By default, all weights are equal.

`x0`

An optional numeric vector providing an initial estimate of the solution to the optimization problem. It should contain only finite, non-missing, non-negative values, and all entries of `L %*% x0` must be greater than zero (to ensure that the objective evaluates to a finite value at `x0`). The vector will be normalized to sum to 1. By default, `x0` is the vector with all equal values.

`control`

A list of parameters controlling the behaviour of the optimization algorithm. See ‘Details’.

## Details

`mixsqp` solves the following optimization problem. Let L be a matrix with n rows and m columns containing only non-negative entries, and let w = (w_1, …, w_n) be a vector of non-negative "weights". `mixsqp` computes the value of vector x = (x_1, …, x_m) minimizing the following objective function,

f(x) = -∑_{j=1}^n w_j \log (∑_{k=1}^m L_{jk} x_k),

subject to the constraint that x lie within the simplex; that is, the entries of x are non-negative and sum to 1. Implicitly, there is an additional constraint L*x > 0 in order to ensure that the objective has a finite value. In practice, this constraint only needs to be checked for the initial estimate to ensure that it holds for all subsequent iterates.

If all weights are equal, solving this optimization problem corresponds to finding the maximum-likelihood estimate of the mixture proportions x given n independent data points drawn from a mixture model with m components. In this case, L_{jk} is the likelihood for mixture component k and data point j.
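To make the objective concrete, here is a plain-R sketch of evaluating f(x) directly (the package provides `mixobjective` for this; the function name `mix_objective` below is illustrative, not part of the package):

```r
# Illustrative sketch: evaluate the objective f(x) = -sum_j w_j log(sum_k L[j,k] x[k]).
mix_objective <- function (L, x, w = rep(1, nrow(L))) {
  w <- w / sum(w)      # normalize the weights, as mixsqp does internally
  u <- drop(L %*% x)   # mixture density evaluated at each data point
  -sum(w * log(u))
}

# Small worked example: three data points, two mixture components.
L <- rbind(c(0.5, 1.0),
           c(1.0, 0.2),
           c(0.3, 0.9))
x <- c(0.6, 0.4)
mix_objective(L, x)
```

With equal weights this is exactly the negative log-likelihood (up to the 1/n normalization of the weights) of the mixture proportions x.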

The Expectation Maximization (EM) algorithm can be used to solve this optimization problem, but it is intolerably slow in many interesting cases, and `mixsqp` is much faster.
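For reference, the EM update for the mixture proportions has a simple multiplicative form: each x_k is scaled by the average posterior responsibility of component k. The sketch below is a minimal illustration of that update, not the package's implementation:

```r
# Minimal EM sketch (illustrative, not the mixsqp implementation):
# each iteration multiplies x[k] by the weighted average responsibility
# of component k, then re-projects onto the simplex.
em_update <- function (L, x, w = rep(1, nrow(L))) {
  w <- w / sum(w)
  u <- drop(L %*% x)                 # current mixture density at each point
  xnew <- x * drop(t(L) %*% (w / u)) # multiplicative EM update
  xnew / sum(xnew)                   # guard against floating-point drift
}

L <- rbind(c(0.5, 1.0), c(1.0, 0.2), c(0.3, 0.9))
x <- rep(1/2, 2)
for (i in 1:200)
  x <- em_update(L, x)
```

Each EM iteration is cheap, but convergence can require a very large number of iterations when some mixture proportions are near zero, which is what motivates the SQP approach.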

A special feature of this optimization problem is that the gradient of the objective does not change with re-scaling; for example, if all the entries of matrix `L` are multiplied by 100, the gradient does not change. A practical benefit of this property is that the optimization algorithm will behave similarly irrespective of the scale of `L`; for example, the same value for the convergence tolerance `convtol.sqp` will have the same effect at different scales. The one exception is the `eps` control parameter, and for this reason the effect of `eps` is automatically adjusted to reflect the scale of `L`; see the description of this parameter below for more details.

A related feature is that the solution to the optimization problem is invariant to rescaling the rows of `L`; for example, the solution will remain the same after all the entries in a row of `L` are multiplied by 10. A simple normalization scheme divides each row by the largest entry in the row so that all entries of `L` are at most 1: `L <- L / apply(L,1,max)` Occasionally, it can be helpful to normalize the rows when some of the entries are unusually large or unusually small. This can help to avoid numerical overflow or underflow errors.
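The scale-invariance of the gradient described above is easy to verify numerically. In this sketch (the helper `grad_f` is illustrative, not part of the package), the gradient of f is -t(L) %*% (w / (L %*% x)), and multiplying `L` by a constant leaves it unchanged:

```r
# Illustrative check that rescaling L does not change the gradient of f.
grad_f <- function (L, x, w = rep(1, nrow(L))/nrow(L))
  -drop(t(L) %*% (w / drop(L %*% x)))

set.seed(1)
L <- matrix(runif(12), 4, 3)
x <- c(0.2, 0.5, 0.3)
max(abs(grad_f(L, x) - grad_f(100 * L, x)))  # effectively zero
```

The factor of 100 cancels because it appears in both `t(L)` and `L %*% x`.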

The SQP algorithm is implemented using the Armadillo C++ linear algebra library, which can automatically take advantage of multithreaded matrix computations to speed up `mixsqp` for large `L` matrices, but only when R has been configured with a multithreaded BLAS/LAPACK library (e.g., OpenBLAS).

The `control` argument is a list in which any of the following named components will override the default optimization algorithm settings (as they are defined by `mixsqp_control_default`):

`convtol.sqp`

A small, non-negative number specifying the convergence tolerance for the SQP algorithm; convergence is reached when the maximum dual residual in the Karush-Kuhn-Tucker (KKT) optimality conditions is less than or equal to `convtol.sqp`. Smaller values will result in more stringent convergence criteria and more accurate solutions, at the expense of greater computation time. Note that changes to this tolerance parameter may require corresponding changes to `convtol.activeset` and/or `zero.threshold.searchdir` to obtain reliable convergence.

`convtol.activeset`

A small, non-negative number specifying the convergence tolerance for the active-set step. Smaller values will result in higher quality search directions for the SQP algorithm but possibly a greater per-iteration computational cost. Note that changes to this tolerance parameter can affect how reliably the SQP convergence criterion is satisfied, as determined by `convtol.sqp`.

`zero.threshold.solution`

A small, non-negative number used to determine the "active set"; that is, it determines which entries of the solution are exactly zero. Any entries that are less than or equal to `zero.threshold.solution` are considered to be exactly zero. Larger values of `zero.threshold.solution` may lead to speedups for matrices with many columns, at the (slight) risk of prematurely zeroing some co-ordinates.

`zero.threshold.searchdir`

A small, non-negative number used to determine when the search direction in the active-set step is considered "small enough". Note that changes to this tolerance parameter can affect how reliably the SQP convergence criterion is satisfied, as determined by `convtol.sqp`, so choose this parameter carefully.

`suffdecr.linesearch`

This parameter determines how stringent the "sufficient decrease" condition is for accepting a step size in the backtracking line search. Larger values will make the condition more stringent. This should be a positive number less than 1.

`stepsizereduce`

The multiplicative factor for decreasing the step size in the backtracking line search. Smaller values will yield a faster backtracking line search at the expense of a less fine-grained search. Should be a positive number less than 1.

`minstepsize`

The smallest step size accepted by the line search step. Should be a number greater than 0 and at most 1.

`eps`

A small, non-negative number that is added to the terms inside the logarithms to sidestep computing logarithms of zero. This prevents numerical problems at the cost of introducing a small inaccuracy in the solution. Increasing this number may lead to faster convergence but possibly a less accurate solution. Since an appropriate value of this number will depend on the "scale" of `L`, `eps` is automatically scaled separately for each row of `L`; specifically, the ith modified logarithm term becomes `log(L[i,]*x + ei)`, in which `ei` is set to `eps * max(L[i,])`.
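Based on the description above, the row-wise scaling of `eps` can be sketched as follows (the function name `safe_log_terms` is illustrative, not part of the package):

```r
# Illustrative sketch of the eps-stabilized logarithm terms: the ith
# term becomes log(L[i,] %*% x + ei), with ei = eps * max(L[i,]).
safe_log_terms <- function (L, x, eps = 1e-8) {
  e <- eps * apply(L, 1, max)  # row-specific offsets, scaled to L
  log(drop(L %*% x) + e)
}

# Even when L %*% x has a zero entry, the terms remain finite.
safe_log_terms(rbind(c(1, 0)), c(0, 1))
```

Scaling `ei` by the largest entry of each row keeps the perturbation small relative to that row's magnitude, which is what makes the behaviour of `eps` consistent across differently scaled inputs.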

`delta`

A small, non-negative number added to the diagonal of the Hessian to improve numerical stability (and possibly the speed) when computing the search directions in the active-set step.

`maxiter.sqp`

Maximum number of SQP iterations to run before reporting a convergence failure; that is, the maximum number of quadratic subproblems that will be solved by the active-set method.

`maxiter.activeset`

Maximum number of active-set iterations taken to solve each of the quadratic subproblems.

`verbose`

If `verbose = TRUE`, the algorithm's progress and a summary of the optimization settings are printed to the console. The algorithm's progress is displayed in a table with one row per SQP (outer loop) iteration, and with the following columns: "iter", the (outer loop) SQP iteration; "objective", the value of the objective function (see f(x)) at the current estimate of the solution, x; "max(rdual)", the maximum "dual residual" in the Karush-Kuhn-Tucker (KKT) conditions, which is used to monitor convergence (see `convtol.sqp`); "nnz", the number of non-zero co-ordinates in the current estimate, as determined by `zero.threshold.solution`; "max.diff", the maximum difference in the estimates between two successive iterations; "nqp", the number of (inner loop) active-set iterations taken to solve the quadratic subproblem; "nls", the number of iterations in the backtracking line search.

## Value

A list object with the following elements:

`x`

The solution to the convex optimization problem.

`value`

The value of the objective function, f(x), at `x`.

`status`

A character string describing the status of the algorithm upon termination.

`progress`

A data frame containing more detailed information about the algorithm's progress. The data frame has one row per SQP iteration. For an explanation of the columns, see the description of the `verbose` control parameter in ‘Details’. Missing values (`NA`'s) in the last row indicate that these quantities were not computed because convergence was reached before computing them. Also note that the storage of these quantities in the `progress` data frame differs slightly from the console output (when `verbose = TRUE`), as the console output shows some quantities that were computed after the convergence check in the previous iteration.

## References

Y. Kim, P. Carbonetto, M. Stephens and M. Anitescu (2018). A fast algorithm for maximum likelihood estimation of mixture proportions using sequential quadratic programming. arXiv:1806.01412 https://arxiv.org/abs/1806.01412.

## See Also

`mixobjective`, `mixkwdual`

## Examples

```r
set.seed(1)
n <- 1e5
m <- 10
w <- rep(1,n)/n
L <- simulatemixdata(n,m)$L
out.mixsqp <- mixsqp(L,w)
f <- mixobjective(L,out.mixsqp$x,w)
print(f,digits = 16)

# We can also compare this result with the solution found by an
# interior-point approach, called via the "KWDual" function from the
# REBayes package. (This requires installation of the MOSEK
# optimization library as well as the REBayes package, so we have
# made this step optional.)
## Not run:
out.kwdual <- mixkwdual(L,w)
print(mixobjective(L,out.kwdual$x,w),digits = 16)

## End(Not run)
```