Gradient boosting for optimizing arbitrary loss functions, where componentwise models
are utilized as baselearners in the case of functional responses.
Scalar responses are treated as the special case where each functional response has
only one observation.
This function is a wrapper for mboost
's mboost
and its
siblings to fit models of the general form
ξ(Y_i(t)  X_i = x_i) = ∑_{j} h_j(x_i, t), i = 1, ..., N,
with a functional (but not necessarily continuous) response Y(t), transformation function ξ, e.g., the expectation, the median or some quantile, and partial effects h_j(x_i, t) depending on covariates x_i and the current index of the response t. The index of the response can be for example time. Possible effects are, e.g., a smooth intercept β_0(t), a linear functional effect \int x_i(s)β(s,t)ds, potentially with integration limits depending on t, smooth and linear effects of scalar covariates f(z_i) or z_i β(t).
1 2 3 
formula 
a symbolic description of the model to be fit.
Per default no intercept is added, only a smooth offset, see argument 
timeformula 
onesided formula for the specification of the effect over the index of the response.
For functional response Y_i(t) typically use 
id 
defaults to NULL which means that all response trajectories are observed
on a common grid allowing to represent the response as a matrix.
If the response is given in long format for observationspecific grids, 
numInt 
integration scheme for the integration of the loss function.
One of 
data 
a data frame or list containing the variables in the model. 
weights 
(1) a numeric vector of weights for observational units,
i.e. 
offset 
a numeric vector to be used as offset over the index of the response (optional).
If no offset is specified, per default 
offset_control 
parameters for the estimation of the offset,
defaults to 
check0 
logical, for response in matrix form, i.e. response that is observed on a common grid,
check the fitted effects for the sumtozero constraint
h_j(x_i)(t) = 0 for all t and give a warning if it is not fulfilled. Defaults to 
... 
additional arguments passed to 
In matrix representation of functional response and covariates each row
represents one functional observation, e.g., Y[i,t_g]
corresponds to Y_i(t_g),
giving a <number of curves> by <number of evaluations> matrix.
For the model fit, the matrix of the functional
response evaluations Y_i(t_g) are stacked internally into one long vector.
If it is possible to represent the model as a generalized linear array model
(Currie et al., 2006), the array structure is used for an efficient implementation,
see mboost
. This is only possible if the design
matrix can be written as the Kronecker product of two marginal design
matrices yielding a functional linear array model (FLAM),
see Brockhaus et al. (2015) for details.
The Kronecker product of two marginal bases is implemented in Rpackage mboost
in the function %O%
, see %O%
.
When %O%
is called with a specification of df
in both baselearners,
e.g., bbs(x1, df = df1) %O% bbs(t, df = df2)
, the global df
for the
Kroneckered baselearner is computed as df = df1 * df2
.
And thus the penalty has only one smoothness parameter lambda resulting in an isotropic penalty.
A Kronecker product with anisotropic penalty is %A%
, allowing for different
amount of smoothness in the two directions, see %A%
.
If the formula contains baselearners connected by %O%
, %A%
or %A0%
,
those effects are not expanded with timeformula
, allowing for model specifications
with different effects in timedirection.
If the response is observed on curvespecific grids it must be supplied
as a vector in long format and the argument id
has
to be specified (as formula!) to define which observations belong to which curve.
In this case the baselearners are built as row tensorproducts of marginal baselearners,
see Scheipl et al. (2015) and Brockhaus et al. (2016), for details on how to set up the effects.
The row tensor product of two marginal bases is implemented in Rpackage mboost
in the function %X%
, see %X%
.
A scalar response can be seen as special case of a functional response with only
one timepoint, and thus it can be represented as FLAM with basis 1 in
timedirection, use timeformula = ~bols(1)
. In this case, a penalty in the
timedirection is used, see Brockhaus et al. (2015) for details.
Alternatively, the scalar response is fitted as scalar response, like in the function
mboost
in package mboost.
The advantage of using FDboost
in that case
is that methods for the functional baselearners are available, e.g., plot
.
The desired regression type is specified by the family
argument,
see the helppage of mboost
. For example a mean regression model is obtained by
family = Gaussian()
which is the default or median regression
by family = QuantReg()
;
see Family
for a list of implemented families.
With FDboost
the following covariate effects can be estimated by specifying
the following effects in the formula
(similar to function pffr
in Rpackage refund
).
The timeformula
is used to expand the effects in t
direction.
Linear functional effect of scalar (numeric or factor) covariate z that varies
smoothly over t, i.e. z_i β(t), specified as
bolsc(z)
, see bolsc
,
or for a group effect with mean zero use brandomc(z)
.
Nonlinear effects of a scalar covariate that vary smoothly over t,
i.e. f(z_i, t), specified as bbsc(z)
,
see bbsc
.
(Nonlinear) effects of scalar covariates that are constant
over t, e.g., f(z_i), specified as c(bbs(z))
,
or β z_i, specified as c(bols(z))
.
Interaction terms between two scalar covariates, e.g., z_i1 zi2 β(t),
are specified as bols(z1) %Xc% bols(z2)
and
an interaction z_i1 f(zi2, t) as bols(z1) %Xc% bbs(z2)
, as
%Xc%
applies the sumtozero constraint to the desgin matrix of the tensor product
built by %Xc%
, see %Xc%
, EXPERIMENTAL!
Functiononfunction regression terms of functional covariates x
,
e.g., \int x_i(s)β(s,t)ds, specified as bsignal(x, s = s)
,
using Psplines, see bsignal
.
Terms given by bfpc
provide FPCbased effects of functional
covariates, see bfpc
.
Functiononfunction regression terms of functional covariates x
with integration limits [l(t), u(t)] depending on t,
e.g., \int_[l(t), u(t)] x_i(s)β(s,t)ds, specified as
bhist(x, s = s, time = t, limits)
. The limits
argument defaults to
"s<=t"
which yields a historical effect with limits [min(t),t],
see bhist
.
Concurrent effects of functional covariates x
measured on the same grid as the response, i.e., x_i(s)β(t),
are specified as bconcurrent(x, s = s, time = t)
,
see bconcurrent
.
Interaction effects can be estimated as tensor product smooth, e.g.,
z \int x_i(s)β(s,t)ds as bsignal(x, s = s) %X% bolsc(z)
For interaction effects with historical functional effects, e.g.,
z_i \int_[l(t),u(t)] x_i(s)β(s,t)ds the baselearner
bhistx
should be used instead of bhist
,
e.g., bhistx(x, limits) %X% bolsc(z)
, see bhistx
.
Generally, the c()
notation can be used to get effects that are
constant over the index of the functional response.
If the formula
in FDboost
contains baselearners connected by
%O%
, %A%
or %A0%
, those effects are not expanded with timeformula
,
allowing for model specifications with different effects in timedirection.
In order to obtain a fair selection of baselearners, the same degrees of freedom (df)
should be specified for all baselearners. If the number of df differs among the baselearners,
the selection is biased towards more flexible baselearners with higher df as they are more
likely to yield larger improvements of the fit. It is recommended to use
a rather small number of df for all baselearners.
It is not possible to specify df larger than the rank of the design matrix.
For baselearners with rankdeficient penalty, it is not possible to specify df smaller than the
rank of the null space of the penalty (e.g., in bbs
unpenalized part of Psplines).
The df of the baselearners in an FDboostobject can be checked using extract(object, "df")
,
see extract
.
The most important tuning parameter of componentwise gradient boosting
is the number of boosting iterations. It is recommended to use the number of
boosting iterations as only tuning parameter,
fixing the steplength at a small value (e.g., nu = 0.1).
Note that the default number of boosting iterations is 100 which is arbitrary and in most
cases not adequate (the optimal number of boosting iterations can considerably exceed 100).
The optimal stopping iteration can be determined by resampling methods like
crossvalidation or bootstrapping, see the function cvrisk.FDboost
which searches
the optimal stopping iteration on a grid, which in many cases has to be extended.
An object of class FDboost
that inherits from mboost
.
Special predict.FDboost
, coef.FDboost
and
plot.FDboost
methods are available.
The methods of mboost
are available as well,
e.g., extract
.
The FDboost
object is a named list containing:
... 
all elements of an 
yname 
the name of the response 
ydim 
dimension of the response matrix, if the response is represented as such 
yind 
the observation (time)points of the response, i.e. the evaluation points, with its name as attribute 
data 
the data that was used for the model fit 
id 
the id variable of the response 
predictOffset 
the function to predict the smooth offset 
offsetVec 
the offset for one trajectory if the response is represented as matrix and otherwise the offset for all trajectories 
offsetFDboost 
offset as specified in call to FDboost 
offsetMboost 
offset as given to mboost 
call 
the call to 
callEval 
the evaluated function call to 
numInt 
value of argument 
timeformula 
the timeformula 
formulaFDboost 
the formula with which 
formulaMboost 
the formula with which 
Sarah Brockhaus, Torsten Hothorn
Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015): The functional linear array model. Statistical Modelling, 15(3), 279300.
Brockhaus, S., Melcher, M., Leisch, F. and Greven, S. (2016): Boosting flexible functional regression models with a high number of functional historical effects, Statistics and Computing, accepted.
Currie, I.D., Durban, M. and Eilers P.H.C. (2006): Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series BStatistical Methodology, 68(2), 259280.
Scheipl, F., Staicu, A.M. and Greven, S. (2015): Functional Additive Mixed Models, Journal of Computational and Graphical Statistics, 24(2), 477501.
Note that FDboost calls mboost
directly.
See, e.g., bsignal
and bbsc
for possible baselearners.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196  ######## Example for functiononscalarregression
data("viscosity", package = "FDboost")
## set timeinterval that should be modeled
interval < "101"
## model time until "interval" and take log() of viscosity
end < which(viscosity$timeAll == as.numeric(interval))
viscosity$vis < log(viscosity$visAll[,1:end])
viscosity$time < viscosity$timeAll[1:end]
# with(viscosity, funplot(time, vis, pch = 16, cex = 0.2))
## fit median regression model with 100 boosting iterations,
## steplength 0.4 and smooth timespecific offset
## the factors are coded such that the effects are zero for each timepoint t
## no integration weights are used!
mod1 < FDboost(vis ~ 1 + bolsc(T_C, df = 2) + bolsc(T_A, df = 2),
timeformula = ~ bbs(time, df = 4),
numInt = "equal", family = QuantReg(),
offset = NULL, offset_control = o_control(k_min = 9),
data = viscosity, control=boost_control(mstop = 100, nu = 0.4))
## Not run:
#### find optimal mstop over 5fold bootstrap, small number of folds for example
#### do the resampling on the level of curves
## possibility 1: smooth offset and transformation matrices are refitted
set.seed(123)
appl1 < applyFolds(mod1, folds = cv(rep(1, length(unique(mod1$id))), B = 5),
grid = 1:500)
## plot(appl1)
mstop(appl1)
mod1[mstop(appl1)]
## possibility 2: smooth offset is refitted,
## computes oobrisk and the estimated coefficients on the folds
set.seed(123)
val1 < validateFDboost(mod1, folds = cv(rep(1, length(unique(mod1$id))), B = 5),
grid = 1:500)
## plot(val1)
mstop(val1)
mod1[mstop(val1)]
## possibility 3: very efficient
## using the function cvrisk; be careful to do the resampling on the level of curves
folds1 < cvLong(id = mod1$id, weights = model.weights(mod1), B = 5)
cvm1 < cvrisk(mod1, folds = folds1, grid = 1:500)
## plot(cvm1)
mstop(cvm1)
## End(Not run)
## look at the model
summary(mod1)
coef(mod1)
## plot(mod1)
## plotPredicted(mod1, lwdPred = 2)
######## Example for scalaronfunctionregression
data("fuelSubset", package = "FDboost")
## center the functional covariates per observed wavelength
fuelSubset$UVVIS < scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$NIR < scale(fuelSubset$NIR, scale = FALSE)
## to make mboost:::df2lambda() happy (all design matrix entries < 10)
## reduce range of argvals to [0,1] to get smaller integration weights
fuelSubset$uvvis.lambda < with(fuelSubset, (uvvis.lambda  min(uvvis.lambda)) /
(max(uvvis.lambda)  min(uvvis.lambda) ))
fuelSubset$nir.lambda < with(fuelSubset, (nir.lambda  min(nir.lambda)) /
(max(nir.lambda)  min(nir.lambda) ))
## possibility 1: as FLAM model with the scalar response
## adds a penalty over the index of the response
## thus, mod2f and mod2 have different panlties
mod2f < FDboost(heatan ~ bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4, check.ident = FALSE)
+ bsignal(NIR, nir.lambda, knots = 40, df = 4, check.ident = FALSE),
timeformula = ~bols(1), data = fuelSubset, control = boost_control(mstop = 200))
## possibility 2: with scalar response
mod2 < FDboost(heatan ~ bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4, check.ident = FALSE)
+ bsignal(NIR, nir.lambda, knots = 40, df = 4, check.ident = FALSE),
timeformula = NULL, data = fuelSubset, control = boost_control(mstop = 200))
## Not run:
## bootstrap to find optimal mstop takes some time
set.seed(123)
folds2 < cv(weights = model.weights(mod2), B = 10)
cvm2 < cvrisk(mod2, folds = folds2, grid = 1:1000)
mstop(cvm2) ## mod2[327]
summary(mod2)
## plot(mod2)
## End(Not run)
## Example for functiononfunctionregression
if(require(fda)){
data("CanadianWeather", package = "fda")
CanadianWeather$l10precip < t(log(CanadianWeather$monthlyPrecip))
CanadianWeather$temp < t(CanadianWeather$monthlyTemp)
CanadianWeather$region < factor(CanadianWeather$region)
CanadianWeather$month.s < CanadianWeather$month.t < 1:12
## center the temperature curves per timepoint
CanadianWeather$temp < scale(CanadianWeather$temp, scale = FALSE)
rownames(CanadianWeather$temp) < NULL ## delete rownames
## fit model with cyclic splines over the year
mod3 < FDboost(l10precip ~ bols(region, df = 2.5, contrasts.arg = "contr.dummy")
+ bsignal(temp, month.s, knots = 11, cyclic = TRUE,
df = 2.5, boundary.knots = c(0.5,12.5), check.ident = FALSE),
timeformula = ~ bbs(month.t, knots = 11, cyclic = TRUE,
df = 3, boundary.knots = c(0.5, 12.5)),
offset = "scalar", offset_control = o_control(k_min = 5),
data = CanadianWeather)
## Not run:
#### find the optimal mstop over 5fold bootstrap
## using the function applyFolds
set.seed(123)
folds3 < cv(rep(1, length(unique(mod3$id))), B = 5)
appl3 < applyFolds(mod3, folds = folds3)
## use function cvrisk; be careful to do the resampling on the level of curves
set.seed(123)
folds3long < cvLong(id = mod3$id, weights = model.weights(mod3), B = 5)
cvm3 < cvrisk(mod3, folds = folds3long, grid = 1:500)
mstop(cvm3) ## mod3[64]
summary(mod3)
## plot(mod3, pers = TRUE)
## End(Not run)
}
######## Example for functional response observed on irregular grid
######## Delete part of observations in viscosity dataset
data("viscosity", package = "FDboost")
## set timeinterval that should be modeled
interval < "101"
## model time until "interval" and take log() of viscosity
end < which(viscosity$timeAll == as.numeric(interval))
viscosity$vis < log(viscosity$visAll[,1:end])
viscosity$time < viscosity$timeAll[1:end]
# with(viscosity, funplot(time, vis, pch = 16, cex = 0.2))
## only keep one eighth of the observation points
set.seed(123)
selectObs < sort(sample(x = 1:(64*46), size = 64*46/4, replace = FALSE))
dataIrregular < with(viscosity, list(vis = c(vis)[selectObs],
T_A = T_A, T_C = T_C,
time = rep(time, each = 64)[selectObs],
id = rep(1:64, 46)[selectObs]))
## fit median regression model with 50 boosting iterations,
## steplength 0.4 and smooth timespecific offset
## the factors are in effect coding 1, 1 for the levels
## no integration weights are used!
mod4 < FDboost(vis ~ 1 + bols(T_C, contrasts.arg = "contr.sum", intercept = FALSE)
+ bols(T_A, contrasts.arg = "contr.sum", intercept=FALSE),
timeformula = ~ bbs(time, lambda = 100), id = ~id,
numInt = "Riemann", family = QuantReg(),
offset = NULL, offset_control = o_control(k_min = 9),
data = dataIrregular, control = boost_control(mstop = 50, nu = 0.4))
summary(mod4)
## plot(mod4)
## plotPredicted(mod4, lwdPred = 2)
## Not run:
## Find optimal mstop, small grid/low B for a fast example
set.seed(123)
folds4 < cv(rep(1, length(unique(mod4$id))), B = 3)
appl4 < applyFolds(mod4, folds = folds4, grid = 1:50)
## val4 < validateFDboost(mod4, folds = folds4, grid = 1:50)
set.seed(123)
folds4long < cvLong(id = mod4$id, weights = model.weights(mod4), B = 3)
cvm4 < cvrisk(mod4, folds = folds4long, grid = 1:50)
mstop(cvm4)
## End(Not run)
## Be careful if you want to predict newdata with irregular response,
## as the argument index is not considered in the prediction of newdata.
## Thus, all covariates have to be repeated according to the number of observations
## in each response trajectroy.
## Predict four response curves with full timeobservations
## for the four combinations of T_A and T_C.
newd < list(T_A = factor(c(1,1,2,2), levels = 1:2,
labels = c("low", "high"))[rep(1:4, length(viscosity$time))],
T_C = factor(c(1,2,1,2), levels = 1:2,
labels = c("low", "high"))[rep(1:4, length(viscosity$time))],
time = rep(viscosity$time, 4))
pred < predict(mod4, newdata = newd)
## funplot(x = rep(viscosity$time, 4), y = pred, id = rep(1:4, length(viscosity$time)))

Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.