#' Compute an aggregation rule
#'
#' The function \code{mixture} builds an
#' aggregation rule chosen by the user.
#' It can then be used to predict new observations Y sequentially.
#' If observations \code{Y} and expert advice \code{experts} are provided,
#' \code{mixture} is trained by predicting the observations in \code{Y}
#' sequentially with the help of the expert advice in \code{experts}.
#' At each time instance \eqn{t=1,2,\dots,T}, the mixture forms a prediction of \code{Y[t,]} by assigning
#' a weight to each expert and by combining the expert advice.
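#'
#' As a minimal sketch of this workflow (the simulated data and the names \code{n}, \code{Y},
#' \code{experts}, \code{mix}, and \code{mix_seq} are illustrative assumptions, not part of the package),
#' a mixture can be trained in one call, or created empty and then updated with \code{predict}:
#' \preformatted{
#' library(opera)
#' set.seed(1)
#' n <- 100                                          # T: number of time steps
#' Y <- rnorm(n)                                     # observations to predict
#' experts <- cbind(e1 = Y + rnorm(n, sd = 0.5),     # K = 2 noisy experts
#'                  e2 = Y + rnorm(n, sd = 1))
#'
#' # Train on the whole sequence at once.
#' mix <- mixture(Y = Y, experts = experts, model = "MLpol", loss.type = "square")
#' dim(mix$weights)                                  # one row per time step, one column per expert
#'
#' # Or start from an untrained rule and feed the data in two successive batches.
#' mix_seq <- mixture(model = "MLpol", loss.type = "square")
#' mix_seq <- predict(mix_seq, newY = Y[1:50], newexperts = experts[1:50, ], type = "model")
#' mix_seq <- predict(mix_seq, newY = Y[51:100], newexperts = experts[51:100, ], type = "model")
#' }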
#'
#'
#' @param Y A matrix with T rows and d columns. Each row \code{Y[t,]} contains a d-dimensional
#' observation to be predicted sequentially.
#'
#' @param experts An array of dimension \code{c(T,d,K)}, where \code{T} is the length of the data-set,
#' \code{d} the dimension of the observations, and \code{K} is the number of experts. It contains the expert
#' forecasts. Each vector \code{experts[t,,k]} corresponds to the d-dimensional prediction of \code{Y[t,]}
#' proposed by expert k at time \eqn{t=1,\dots,T}.
#' In the case of real prediction (i.e., \eqn{d = 1}), \code{experts} is a matrix with \code{T} rows and \code{K} columns.
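#'
#' As a small illustration of the expected layout (the sizes and random values below are arbitrary
#' assumptions), such an array can be built and indexed in base R as follows:
#' \preformatted{
#' n <- 4; d <- 2; K <- 3                            # T = 4 time steps, d = 2, K = 3 experts
#' experts <- array(rnorm(n * d * K), dim = c(n, d, K))
#' experts[1, , 2]                                   # d-dimensional forecast of expert 2 at time 1
#' dim(experts[, 1, ])                               # T x K matrix: forecasts of the first coordinate
#' }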
#'
#' @param model A character string specifying the aggregation rule to use.
#' Currently available aggregation rules are:
#' \describe{
#' \item{'EWA'}{Exponentially weighted average aggregation rule \insertCite{cesa2006prediction}{opera}. A positive learning rate \strong{eta}
#' can be chosen by the user. The
#' larger it is, the faster the aggregation rule learns from the observations
#' and the experts' performance. However, values that are too high lead to unstable weight
#' vectors and thus to unstable predictions. If it is not specified, the learning rate is calibrated online.
#' A finite grid of potential learning rates to be optimized online can be specified with \strong{grid.eta}.}
#' \item{'FS'}{Fixed-share aggregation rule \insertCite{cesa2006prediction}{opera}. As for \code{ewa}, a learning rate \strong{eta}
#' can be chosen by the user or calibrated online. The main difference with the \code{ewa} aggregation
#' rule lies in the mixing rate \strong{alpha}\eqn{\in [0,1]}, which models, at
#' each instance, a small probability \code{alpha} of a rupture in the
#' sequence, after which the best expert may change. The fixed-share aggregation rule
#' can thus compete with the best sequence of experts that changes a few
#' times (see \code{\link{oracle}}), while \code{ewa} can only
#' compete with the best fixed expert. The mixing rate \strong{alpha} is either chosen by the user or calibrated online.
#' Finite grids of learning rates and mixing rates to be optimized can be specified with
#' parameters \strong{grid.eta} and \strong{grid.alpha}.}
#' \item{'Ridge'}{Online Ridge regression \insertCite{cesa2006prediction}{opera}. It minimizes at
#' each instance a penalized criterion. It forms at each instance a linear
#' combination of the experts' forecasts and can assign negative weights that
#' do not necessarily sum to one. It is useful if the experts are biased or
#' correlated. It cannot be used with specialized experts. A positive regularization coefficient \strong{lambda}
#' can either be chosen by the user or calibrated online.
#' A finite grid of coefficients to be optimized can be specified with the parameter \strong{grid.lambda}.}
#' \item{'MLpol', 'MLewa', 'MLprod'}{Aggregation rules with multiple learning rates that are
#' theoretically calibrated \insertCite{GaillardStoltzEtAl2014}{opera}. }
#' \item{'BOA'}{Bernstein online Aggregation \insertCite{wintenberger2017optimal}{opera}.
#' The learning rates are automatically calibrated.}
#' \item{'OGD'}{Online gradient descent \insertCite{zinkevich2003online}{opera}. See also \insertCite{hazan2019introduction}{opera}. The optimization is performed with a time-varying learning rate.
#' At time step \eqn{t \geq 1}, the learning rate is chosen to be \eqn{t^{-\alpha}}, where \eqn{\alpha} is provided by \code{alpha} in the \code{parameters} argument.
#' The algorithm may or may not perform a projection step onto the simplex (non-negative weights that sum to one), depending on
#' the value of the parameter \code{simplex} provided by the user.}
#' \item{'FTRL'}{Follow The Regularized Leader \insertCite{shalev2007primal}{opera}.
#' Note that here, the linearized version of FTRL is implemented (see Chap. 5 of \insertCite{hazan2019introduction}{opera}).
#' \code{\link{FTRL}} is the online counterpart of empirical risk minimization. It is a family of aggregation rules (including OGD) that use, at each time step, the empirical risk
#' minimizer so far with an additional regularization. The online optimization can be performed
#' on any bounded convex set that can be expressed with equality or inequality constraints. Note that this method is still under development and should be considered a beta version.
#'
#' The user must provide (in the \strong{parameters} list):
#' \itemize{
#' \item{'eta' }{The learning rate.}
#' \item{'fun_reg' }{The regularization function to be applied to the weights. See \code{\link{auglag}}: fn.}
#' \item{'constr_eq' }{The equality constraints (e.g. sum(w) = 1). See \code{\link{auglag}}: heq.}
#' \item{'constr_ineq' }{The inequality constraints (e.g. w > 0). See \code{\link{auglag}}: hin.}
#' \item{'fun_reg_grad' }{(optional) The gradient of the regularization function. See \code{\link{auglag}}: gr.}
#' \item{'constr_eq_jac' }{(optional) The Jacobian of the equality constraints. See \code{\link{auglag}}: heq.jac}
#' \item{'constr_ineq_jac' }{(optional) The Jacobian of the inequality constraints. See \code{\link{auglag}}: hin.jac}
#' } or set \strong{default} to TRUE. In the latter case, \link{FTRL} is performed with Kullback regularization (\code{fun_reg(x) = sum(x log (x/w0))})
#' on the simplex (\code{constr_eq(w) = sum(w) - 1} and \code{constr_ineq(w) = w}).
#' Parameters \strong{w0} (weight initialization) and \strong{max_iter} can also be provided.
#' }
#' }
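#'
#' To make the role of the learning rate \code{eta} concrete, here is a minimal base-R sketch of the
#' textbook exponentially weighted average update with the square loss. It is an illustration under
#' stated assumptions, not the package's implementation (which, by default, works with gradient
#' losses and can calibrate \code{eta} online); the helper name \code{ewa_weights} is hypothetical.
#' \preformatted{
#' ewa_weights <- function(Y, experts, eta) {
#'   K <- ncol(experts)
#'   cum_loss <- numeric(K)                          # cumulative loss of each expert so far
#'   W <- matrix(NA_real_, nrow = length(Y), ncol = K)
#'   for (t in seq_along(Y)) {
#'     w <- exp(-eta * (cum_loss - min(cum_loss)))   # shifting by the minimum avoids underflow
#'     W[t, ] <- w / sum(w)                          # weights used to predict Y[t]
#'     cum_loss <- cum_loss + (experts[t, ] - Y[t])^2
#'   }
#'   W
#' }
#'
#' set.seed(1)
#' Y <- rnorm(50)
#' experts <- cbind(Y + rnorm(50, sd = 0.1), Y + rnorm(50, sd = 1))
#' W <- ewa_weights(Y, experts, eta = 1)
#' W[1, ]                                            # uniform weights before any loss is observed
#' range(rowSums(W))                                 # every row sums to one
#' }
#' Running the sketch with a larger \code{eta} makes the weights react more sharply to each new loss.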
#'
#' @param loss.type \code{character, list, or function} ("square").
#' \describe{
#' \item{character}{ Name of the loss to be applied ('square', 'absolute', 'percentage', or 'pinball');}
#' \item{list}{ List with field \code{name} equal to the loss name. If using pinball loss, field \code{tau} equal to the required quantile in [0,1];}
#' \item{function}{ A custom loss as a function of two parameters (prediction, observation).
#' For example, \code{f(x,y) = abs(x-y)/y} for the mean absolute percentage error or \code{f(x,y) = (x-y)^2} for the squared loss.}
#' }
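#'
#' The three accepted forms can be sketched as follows. The object names are illustrative
#' assumptions, and \code{pinball} is a hypothetical helper that spells out the textbook pinball
#' loss at quantile \code{tau}; it is shown only to clarify what the \code{tau} field controls.
#' \preformatted{
#' loss_chr  <- "absolute"                           # character form
#' loss_list <- list(name = "pinball", tau = 0.9)    # list form, with the required quantile
#' loss_fun  <- function(x, y) abs(x - y) / y        # custom function (prediction, observation)
#'
#' # Textbook pinball loss: under-prediction costs tau, over-prediction costs 1 - tau.
#' pinball <- function(x, y, tau) ((y < x) - tau) * (x - y)
#' pinball(x = 1, y = 3, tau = 0.9)                  # under-prediction by 2: 0.9 * 2
#' pinball(x = 3, y = 1, tau = 0.9)                  # over-prediction by 2: 0.1 * 2
#' }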
#'
#' @param loss.gradient \code{boolean, function} (TRUE).
#' \describe{
#' \item{boolean}{ If TRUE, the aggregation rule is not applied directly to the loss function at hand,
#' but to a gradient version of it. The aggregation rule is then similar to a gradient descent aggregation rule. }
#' \item{function}{Can be provided if \code{loss.type} is a function. It should then be
#' a sub-derivative of the loss in its first component (i.e., in the prediction).
#' For instance, \code{g(x,y) = 2*(x-y)} for the squared loss.
#' }
#' }
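#'
#' As a sanity check when writing a custom pair of functions, the derivative can be compared with
#' a finite difference in base R. The names \code{sq_loss} and \code{sq_grad} are hypothetical, and the
#' two-argument signature of the gradient is an assumption that mirrors the loss signature.
#' \preformatted{
#' sq_loss <- function(x, y) (x - y)^2               # loss in (prediction, observation)
#' sq_grad <- function(x, y) 2 * (x - y)             # its derivative in the prediction x
#'
#' x <- 1.5; y <- 0.3; h <- 1e-6
#' numeric_grad <- (sq_loss(x + h, y) - sq_loss(x - h, y)) / (2 * h)   # central difference
#' all.equal(numeric_grad, sq_grad(x, y), tolerance = 1e-6)
#' }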
#'
#' @param coefficients A probability vector of length K containing the prior weights of the experts
#' (not possible for 'MLpol'). The weights must be non-negative and sum to 1.
#'
#' @param awake A matrix specifying the
#' activation coefficients of the experts. Its entries lie in \code{[0,1]}.
#' It is useful when some experts are specialists and do not always
#' provide a prediction. If expert \code{k} does not
#' form any prediction of observation \code{Y_t} at instance \code{t}, we can set
#' \code{awake[t,k]=0} so that the mixture ignores expert \code{k}
#' when predicting \code{Y_t}.
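#'
#' A minimal sketch of a sleeping expert, assuming simulated data (all object names below are
#' illustrative):
#' \preformatted{
#' library(opera)
#' set.seed(1)
#' n <- 100
#' Y <- rnorm(n)
#' experts <- cbind(e1 = Y + rnorm(n, sd = 0.5), e2 = Y + rnorm(n, sd = 1))
#'
#' awake <- matrix(1, nrow = n, ncol = 2)            # every expert is active by default
#' awake[1:10, 2] <- 0                               # expert 2 gives no forecast for t = 1, ..., 10
#' mix <- mixture(Y = Y, experts = experts, model = "MLpol", awake = awake)
#' }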
#'
#' @param parameters A list that contains optional parameters for the aggregation rule.
#' If no parameters are provided, the aggregation rule is fully calibrated
#' online. Possible parameters are:
#' \describe{
#' \item{eta}{A positive number defining the learning rate.
#' Possible if model is either 'EWA' or 'FS'.}
#' \item{grid.eta}{A vector of positive numbers defining potential learning rates
#' for 'EWA' or 'FS'.
#' The learning rate is then calibrated by sequentially optimizing the parameter in
#' the grid. The grid may be extended online if needed by the aggregation rule.}
#' \item{gamma}{A positive number defining the exponential step of extension
#' of grid.eta when it is needed. The default value is 2.}
#' \item{alpha}{A number in [0,1]. If the model is 'FS', it defines the mixing rate.
#' If the model is 'OGD', it defines the order of the learning rate: \eqn{\eta_t = t^{-\alpha}}.}
#' \item{grid.alpha}{A vector of numbers in [0,1] defining potential mixing rates for 'FS'
#' to be optimized online. The grid is fixed over time. The default value is \code{[0.0001,0.001,0.01,0.1]}.}
#' \item{lambda}{A positive number defining the smoothing parameter of 'Ridge' aggregation rule.}
#' \item{grid.lambda}{Similar to \code{grid.eta} for the parameter \code{lambda}.}
#' \item{simplex}{A boolean that specifies whether 'OGD' performs a projection onto the simplex. In other words,
#' if TRUE (default), the online gradient descent is performed under the constraint that the weights sum to 1
#' and are non-negative. If FALSE, 'OGD' performs an online gradient descent on the K-dimensional real space
#' without any projection step.}
#' \item{averaged}{A boolean (default is FALSE). If TRUE, the coefficients and the weights
#' returned (and used to form the predictions) are averaged over the past. This makes the weights more stable over time but requires
#' stronger regularity assumptions on the underlying process generating the data (i.i.d. for instance). }
#' }
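#'
#' The following sketch shows how these parameters might be passed for several rules. The data
#' are simulated and the numeric values are arbitrary assumptions, not recommended settings.
#' \preformatted{
#' library(opera)
#' set.seed(1)
#' n <- 100
#' Y <- rnorm(n)
#' experts <- cbind(e1 = Y + rnorm(n, sd = 0.5), e2 = Y + rnorm(n, sd = 1))
#'
#' # EWA with a fixed learning rate, then with a grid calibrated online.
#' ewa      <- mixture(Y = Y, experts = experts, model = "EWA", parameters = list(eta = 1))
#' ewa_grid <- mixture(Y = Y, experts = experts, model = "EWA",
#'                     parameters = list(grid.eta = c(0.1, 1, 10)))
#'
#' # Fixed-share with a fixed learning rate and mixing rate.
#' fs <- mixture(Y = Y, experts = experts, model = "FS",
#'               parameters = list(eta = 1, alpha = 0.01))
#'
#' # Ridge with a fixed regularization coefficient.
#' ridge <- mixture(Y = Y, experts = experts, model = "Ridge", parameters = list(lambda = 1))
#'
#' # OGD with learning rate t^(-0.5) and projection onto the simplex.
#' ogd <- mixture(Y = Y, experts = experts, model = "OGD",
#'                parameters = list(alpha = 0.5, simplex = TRUE))
#' }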
#'
#' @param use_cpp \code{boolean}. Whether or not to use the C++ implementation to speed up the computations. This option is not yet compatible
#' with the use of a custom loss function. Note that the C++ implementation corresponds to an earlier version of the code and may be outdated.
#' Use \code{options(opera_use_cpp = TRUE)} to change the default value.
#'
#' @param quiet \code{boolean}. Whether or not to display progress bars.
#'
#' @return An object of class mixture that can be used to perform new predictions.
#' It contains the parameters \code{model}, \code{loss.type}, \code{loss.gradient},
#' \code{experts}, \code{Y}, \code{awake}, and the fields
#' \item{coefficients}{A vector of coefficients
#' assigned to each expert to perform the next prediction.}
#'
#' \item{weights }{ A matrix of dimension \code{c(T,K)}, with
#' \code{T} the number of instances to be predicted and \code{K} the number of
#' experts. Each row contains the convex combination to form the predictions }
#' \item{prediction }{ A matrix with \code{T} rows and \code{d} columns that contains the
#' predictions output by the aggregation rule. }
#' \item{loss}{ The average loss (as stated by parameter \code{loss.type}) suffered
#' by the aggregation rule.}
#' \item{parameters}{The learning parameters chosen by the aggregation rule or by the user.}
#' \item{training}{A list that contains useful temporary information of the
#' aggregation rule to be updated and to perform predictions.}
#' @author Pierre Gaillard <pierre@@gaillard.me> Yannig Goude <yannig.goude@@edf.fr>
#' @keywords ~models ~ts
#' @seealso See \code{\link{opera-package}} and opera-vignette for a brief example of how to use the package.
#'
#' @importFrom stats predict
#' @export mixture
#'
#' @references
#' \insertAllCited{}
#'
#'
#' @rdname mixture-opera
#'
#' @example examples/example.R
#'
mixture <- function(Y = NULL, experts = NULL, model = "MLpol", loss.type = "square",
                    loss.gradient = TRUE, coefficients = "Uniform", awake = NULL, parameters = list(),
                    use_cpp = getOption("opera_use_cpp", default = FALSE), quiet = TRUE) UseMethod("mixture")

#' @export
mixture.default <- function(Y = NULL, experts = NULL, model = "MLpol", loss.type = "square",
                            loss.gradient = TRUE, coefficients = "Uniform", awake = NULL, parameters = list(),
                            use_cpp = getOption("opera_use_cpp", default = FALSE), quiet = TRUE) {
  # Check and normalize the inputs
  experts <- check_matrix(experts, "experts")
  awake <- check_matrix(awake, "awake")
  loss.type <- check_loss(loss.type = loss.type, loss.gradient = loss.gradient, use_cpp = use_cpp)

  # Build an untrained mixture object; the data are attached by predict() below
  object <- list(model = model, loss.type = loss.type, loss.gradient = loss.gradient,
                 coefficients = coefficients, parameters = parameters, Y = NULL, experts = NULL,
                 awake = NULL, training = NULL, names.experts = colnames(experts), T = 0, d = "unknown")
  class(object) <- "mixture"

  # Y and experts must be provided together
  if ((is.null(Y) && !is.null(experts)) || (!is.null(Y) && is.null(experts))) {
    stop("Bad dimensions: length(Y) should be equal to nrow(experts)")
  }

  if (!is.null(Y)) {
    # Test the dimension of Y: if Y is a matrix, the number of columns is the space of prediction
    if (is.null(dim(Y))) {
      d <- 1
      T <- length(Y)
    } else {
      d <- ncol(Y)
      T <- nrow(Y)
      # Multi-dimensional observations over several time steps require a 3-dimensional array of experts
      if (d > 1 && T > 1 && length(dim(experts)) < 3) {
        stop("Bad dimensions: experts should be an array of dimension c(T, d, K) when Y has d > 1 columns")
      }
      if (length(dim(experts)) == 3) {
        if ((dim(experts)[1] != T) || (dim(experts)[2] != d)) {
          stop("Bad dimensions between Y and experts")
        }
      }
      if (T == 1 && d > 1) {
        if (length(dim(experts)) == 2) {
          if (dim(experts)[1] != d) {
            stop("Bad dimensions between Y and experts")
          }
        }
      }
    }
    # A single real observation: experts may be given as a row or a column
    if (T == 1 && d == 1) {
      experts <- as.matrix(experts)
      if (!(nrow(experts) == 1 || ncol(experts) == 1)) {
        stop("Bad dimensions: length(Y) should be equal to nrow(experts)")
      }
    }
    if (dim(experts)[1] != T) {
      stop("Bad dimensions: length(Y) should be equal to nrow(experts)")
    }
    object$d <- d
    # Train the aggregation rule by predicting Y sequentially
    object <- predict(object, newY = Y, newexperts = experts, awake = awake,
                      type = "model", use_cpp = use_cpp, quiet = quiet)
  }
  return(object)
}