Compute an aggregation rule
Description
The function mixture builds an aggregation rule chosen by the user. It can then be used to predict new observations Y sequentially. If observations Y and expert advice experts are provided, mixture is trained by predicting the observations in Y sequentially with the help of the expert advice in experts. At each time instance t = 1, 2, ..., T, the mixture forms a prediction of Y[t,] by assigning a weight to each expert and combining the expert advice.
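As an illustration of this weight-update-and-combine loop, here is a minimal, self-contained sketch of exponentially weighted average aggregation with the square loss. The fixed learning rate eta and the toy data are illustrative choices only; opera calibrates such parameters online.

```r
# Minimal sketch of sequential aggregation (EWA-style, square loss).
# 'eta' and the toy data are arbitrary illustrative choices.
set.seed(42)
T <- 50; K <- 2
Y <- sin(1:T / 3)                         # observations to predict
experts <- cbind(Y + rnorm(T, sd = 0.1),  # a good expert
                 Y + rnorm(T, sd = 1))    # a noisy expert
eta <- 1
w <- rep(1 / K, K)                        # uniform prior weights
pred <- numeric(T)
for (t in 1:T) {
  pred[t] <- sum(w * experts[t, ])        # combine the expert advice
  losses <- (experts[t, ] - Y[t])^2       # square loss of each expert
  w <- w * exp(-eta * losses)             # exponential weight update
  w <- w / sum(w)                         # renormalize to a probability vector
}
mean((pred - Y)^2)                        # average loss of the aggregation
```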
Usage
mixture(Y = NULL, experts = NULL, model = "MLpol", loss.type = "square",
  loss.gradient = TRUE, coefficients = "Uniform", awake = NULL,
  parameters = list())

## S3 method for class 'mixture'
print(x, ...)

## S3 method for class 'mixture'
summary(object, ...)
Arguments
Y 
A matrix with T rows and d columns. Each row Y[t,] contains the observation to be predicted at instance t. 
experts 
An array of dimension c(T, d, K), where K is the number of experts, containing the expert forecasts of the observations in Y. 
model 
A character string specifying the aggregation rule to use. Currently available aggregation rules include 'EWA', 'FS', 'Ridge', 'MLpol', 'MLewa', 'MLprod', and 'BOA'. 
loss.type 
A string or a list with a component 'name' specifying the loss function used to evaluate performance. It can be 'square', 'absolute', 'percentage', or 'pinball'. For the pinball loss, the quantile can be provided by assigning to loss.type a list of two elements: 'name', equal to 'pinball', and 'tau', the quantile level to be predicted. The 'Ridge' aggregation rule is restricted to the square loss. 
loss.gradient 
A boolean. If TRUE (default), the aggregation rule is applied not to the loss function itself but to a linearized (gradient) version of it. The resulting rule then behaves like a gradient-descent-type algorithm. 
coefficients 
A probability vector of length K containing the prior weights of the experts (not possible for 'MLpol'). The weights must be nonnegative and sum to 1. 
awake 
A matrix of the same dimension as experts specifying the activation coefficients of the experts. Its entries lie in [0,1]; awake[t,k] = 0 means that expert k is inactive (asleep) at instance t. 
parameters 
A list of optional parameters for the aggregation rule (e.g., a fixed learning rate). If no parameters are provided, the aggregation rule is fully calibrated online. 
x 
An object of class mixture 
... 
Additional parameters 
object 
An object of class mixture 
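For instance, assuming the list form described for loss.type above, a quantile aggregation rule could be initialized as follows; the value tau = 0.5 (the median) is only an illustrative choice.

```r
library(opera)
# Pinball loss at the median; 'tau' is the quantile level (illustrative value).
m <- mixture(model = 'MLpol', loss.type = list(name = 'pinball', tau = 0.5))
```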
Value
An object of class mixture that can be used to perform new predictions. It contains the parameters model, loss.type, loss.gradient, experts, Y, and awake, and the fields
coefficients 
A vector of coefficients assigned to each expert to perform the next prediction. 
weights 
A matrix of dimension c(T, K), where T is the number of instances to be predicted and K the number of experts. Its row t contains the convex combination used to form the prediction at instance t. 
prediction 
A matrix with T rows and d columns containing the predictions of the mixture. 
loss 
The average loss (as stated by parameter loss.type) suffered by the aggregation rule. 
parameters 
The learning parameters chosen by the aggregation rule or by the user. 
training 
A list that contains useful temporary information of the aggregation rule to be updated and to perform predictions. 
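Assuming a mixture object fitted on data (the toy data below are simulated purely for illustration), the fields listed above can be accessed directly:

```r
library(opera)
set.seed(1)
# Simulated observations and two illustrative experts
Y <- rnorm(100)
experts <- cbind(expert1 = Y + rnorm(100, sd = 0.5),
                 expert2 = rnorm(100))
m <- mixture(Y = Y, experts = experts, model = 'BOA', loss.type = 'square')
m$coefficients   # weights assigned to each expert for the next prediction
head(m$weights)  # convex combinations used at each past instance
m$loss           # average square loss suffered so far
```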
Methods (by class)

print(mixture): Print an aggregation procedure.

summary(mixture): Summary of an aggregation procedure.
Author(s)
Pierre Gaillard <pierre@gaillard.me>
See Also
See opera-package and the opera vignette for a brief example of how to use the package.
Examples
library('opera') # load the package
set.seed(1)
# Example: find the best one week ahead forecasting strategy (weekly data)
# packages
library(mgcv)
# import data
data(electric_load)
idx_data_test <- 620:nrow(electric_load)
data_train <- electric_load[-idx_data_test, ]
data_test <- electric_load[idx_data_test, ]
# Build the expert forecasts
# ##########################
# 1) A generalized additive model
gam.fit <- gam(Load ~ s(IPI) + s(Temp) + s(Time, k = 3) +
  s(Load1) + as.factor(NumWeek), data = data_train)
gam.forecast <- predict(gam.fit, newdata = data_test)
# 2) An online autoregressive model on the residuals of a medium term model
# Medium term model to remove trend and seasonality (using generalized additive model)
detrend.fit <- gam(Load ~ s(Time, k = 3) + s(NumWeek) + s(Temp) + s(IPI), data = data_train)
electric_load$Trend <- c(predict(detrend.fit), predict(detrend.fit, newdata = data_test))
electric_load$Load.detrend <- electric_load$Load - electric_load$Trend
# Residual analysis
ar.forecast <- numeric(length(idx_data_test))
for (i in seq(idx_data_test)) {
  ar.fit <- ar(electric_load$Load.detrend[1:(idx_data_test[i] - 1)])
  ar.forecast[i] <- as.numeric(predict(ar.fit)$pred) + electric_load$Trend[idx_data_test[i]]
}
# Aggregation of experts
###########################
X <- cbind(gam.forecast, ar.forecast)
colnames(X) <- c('gam', 'ar')
Y <- data_test$Load
matplot(cbind(Y, X), type = 'l', col = 1:6, ylab = 'Weekly load', xlab = 'Week')
# How good are the experts? Look at the oracles
oracle.convex <- oracle(Y = Y, experts = X, loss.type = 'square', model = 'convex')
plot(oracle.convex)
oracle.convex
# Is a single expert the best over time? Are there breaks?
oracle.shift <- oracle(Y = Y, experts = X, loss.type = 'percentage', model = 'shifting')
plot(oracle.shift)
oracle.shift
# Online aggregation of the experts with BOA
#############################################
# Initialize the aggregation rule
m0.BOA <- mixture(model = 'BOA', loss.type = 'square')
# Perform online prediction using BOA. There are 3 equivalent possibilities:
# 1) start with an empty model and update the model sequentially
m1.BOA <- m0.BOA
for (i in 1:length(Y)) {
  m1.BOA <- predict(m1.BOA, newexperts = X[i, ], newY = Y[i])
}
# 2) perform online prediction directly from the empty model
m2.BOA <- predict(m0.BOA, newexperts = X, newY = Y, online = TRUE)
# 3) perform the online aggregation directly
m3.BOA <- mixture(Y = Y, experts = X, model = 'BOA', loss.type = 'square')
# These predictions are equivalent:
identical(m1.BOA, m2.BOA) # TRUE
identical(m1.BOA, m3.BOA) # TRUE
# Display the results
summary(m3.BOA)
plot(m1.BOA)
