Model-based Gradient Boosting for Functional Response

Description

Gradient boosting for optimizing arbitrary loss functions, where component-wise models are utilized as base-learners in the case of functional responses. Scalar responses are treated as the special case where each functional response has only one observation. This function is a wrapper for mboost's mboost and its siblings to fit models of the general form

ξ(Y_i(t) | X_i = x_i) = ∑_{j} h_j(x_i, t), i = 1, ..., N,

with a functional (but not necessarily continuous) response Y(t), transformation function ξ, e.g., the expectation, the median or some quantile, and partial effects h_j(x_i, t) depending on covariates x_i and the current index of the response t. The index of the response can be for example time. Possible effects are, e.g., a smooth intercept β_0(t), a linear functional effect \int x_i(s)β(s,t)ds, potentially with integration limits depending on t, smooth and linear effects of scalar covariates f(z_i) or z_i β(t).

Usage

1
2
3
FDboost(formula, timeformula, id = NULL, numInt = "equal", data,
  weights = NULL, offset = NULL, offset_control = o_control(),
  check0 = FALSE, ...)

Arguments

formula

a symbolic description of the model to be fit. Per default no intercept is added, only a smooth offset, see argument offset. To add a smooth intercept, use 1, e.g., y ~ 1 for a pure intercept model.

timeformula

one-sided formula for the specification of the effect over the index of the response. For functional response Y_i(t) typically use ~ bbs(t) to obtain smooth effects over t. In the limiting case of Y_i being a scalar response, use ~ bols(1), which sets up a base-learner for the scalar 1. Or use timeformula = NULL, then the scalar response is treated as scalar.

id

defaults to NULL which means that all response trajectories are observed on a common grid allowing to represent the response as a matrix. If the response is given in long format for observation-specific grids, id contains the information which observations belong to the same trajectory and must be supplied as a formula, ~ nameid, where the variable nameid should contain integers 1, 2, 3, ..., N.

numInt

integration scheme for the integration of the loss function. One of c("equal", "Riemann") meaning equal weights of 1 or trapezoidal Riemann weights. Alternatively a vector of length nrow(response) containing positive weights can be specified.

data

a data frame or list containing the variables in the model.

weights

(1) a numeric vector of weights for observational units, i.e. length(weights) has to be nrow(response), (2) alternatively weights can be specified for single observations then length(weights) has to be nrow(response)*ncol(response) per default weights is constantly 1.

offset

a numeric vector to be used as offset over the index of the response (optional). If no offset is specified, per default offset = NULL which means that a smooth time-specific offset is computed and used before the model fit to center the data. If you do not want to use a time-specific offset, set offset = "scalar" to get an overall scalar offset, like in mboost.

offset_control

parameters for the estimation of the offset, defaults to o_control(k_min = 20, silent = TRUE), see o_control.

check0

logical, for response in matrix form, i.e. response that is observed on a common grid, check the fitted effects for the sum-to-zero constraint h_j(x_i)(t) = 0 for all t and give a warning if it is not fulfilled. Defaults to FALSE.

...

additional arguments passed to mboost, including, family and control.

Details

In matrix representation of functional response and covariates each row represents one functional observation, e.g., Y[i,t_g] corresponds to Y_i(t_g), giving a <number of curves> by <number of evaluations> matrix. For the model fit, the matrix of the functional response evaluations Y_i(t_g) are stacked internally into one long vector.

If it is possible to represent the model as a generalized linear array model (Currie et al., 2006), the array structure is used for an efficient implementation, see mboost. This is only possible if the design matrix can be written as the Kronecker product of two marginal design matrices yielding a functional linear array model (FLAM), see Brockhaus et al. (2015) for details. The Kronecker product of two marginal bases is implemented in R-package mboost in the function %O%, see %O%.

When %O% is called with a specification of df in both base-learners, e.g., bbs(x1, df = df1) %O% bbs(t, df = df2), the global df for the Kroneckered base-learner is computed as df = df1 * df2. And thus the penalty has only one smoothness parameter lambda resulting in an isotropic penalty. A Kronecker product with anisotropic penalty is %A%, allowing for different amount of smoothness in the two directions, see %A%. If the formula contains base-learners connected by %O%, %A% or %A0%, those effects are not expanded with timeformula, allowing for model specifications with different effects in time-direction.

If the response is observed on curve-specific grids it must be supplied as a vector in long format and the argument id has to be specified (as formula!) to define which observations belong to which curve. In this case the base-learners are built as row tensor-products of marginal base-learners, see Scheipl et al. (2015) and Brockhaus et al. (2016), for details on how to set up the effects. The row tensor product of two marginal bases is implemented in R-package mboost in the function %X%, see %X%.

A scalar response can be seen as special case of a functional response with only one time-point, and thus it can be represented as FLAM with basis 1 in time-direction, use timeformula = ~bols(1). In this case, a penalty in the time-direction is used, see Brockhaus et al. (2015) for details. Alternatively, the scalar response is fitted as scalar response, like in the function mboost in package mboost. The advantage of using FDboost in that case is that methods for the functional base-learners are available, e.g., plot.

The desired regression type is specified by the family-argument, see the help-page of mboost. For example a mean regression model is obtained by family = Gaussian() which is the default or median regression by family = QuantReg(); see Family for a list of implemented families.

With FDboost the following covariate effects can be estimated by specifying the following effects in the formula (similar to function pffr in R-package refund). The timeformula is used to expand the effects in t-direction.

  • Linear functional effect of scalar (numeric or factor) covariate z that varies smoothly over t, i.e. z_i β(t), specified as bolsc(z), see bolsc, or for a group effect with mean zero use brandomc(z).

  • Nonlinear effects of a scalar covariate that vary smoothly over t, i.e. f(z_i, t), specified as bbsc(z), see bbsc.

  • (Nonlinear) effects of scalar covariates that are constant over t, e.g., f(z_i), specified as c(bbs(z)), or β z_i, specified as c(bols(z)).

  • Interaction terms between two scalar covariates, e.g., z_i1 zi2 β(t), are specified as bols(z1) %Xc% bols(z2) and an interaction z_i1 f(zi2, t) as bols(z1) %Xc% bbs(z2), as %Xc% applies the sum-to-zero constraint to the desgin matrix of the tensor product built by %Xc%, see %Xc%, EXPERIMENTAL!

  • Function-on-function regression terms of functional covariates x, e.g., \int x_i(s)β(s,t)ds, specified as bsignal(x, s = s), using P-splines, see bsignal. Terms given by bfpc provide FPC-based effects of functional covariates, see bfpc.

  • Function-on-function regression terms of functional covariates x with integration limits [l(t), u(t)] depending on t, e.g., \int_[l(t), u(t)] x_i(s)β(s,t)ds, specified as bhist(x, s = s, time = t, limits). The limits argument defaults to "s<=t" which yields a historical effect with limits [min(t),t], see bhist.

  • Concurrent effects of functional covariates x measured on the same grid as the response, i.e., x_i(s)β(t), are specified as bconcurrent(x, s = s, time = t), see bconcurrent.

  • Interaction effects can be estimated as tensor product smooth, e.g., z \int x_i(s)β(s,t)ds as bsignal(x, s = s) %X% bolsc(z)

  • For interaction effects with historical functional effects, e.g., z_i \int_[l(t),u(t)] x_i(s)β(s,t)ds the base-learner bhistx should be used instead of bhist, e.g., bhistx(x, limits) %X% bolsc(z), see bhistx.

  • Generally, the c()-notation can be used to get effects that are constant over the index of the functional response.

  • If the formula in FDboost contains base-learners connected by %O%, %A% or %A0%, those effects are not expanded with timeformula, allowing for model specifications with different effects in time-direction.

In order to obtain a fair selection of base-learners, the same degrees of freedom (df) should be specified for all baselearners. If the number of df differs among the base-learners, the selection is biased towards more flexible base-learners with higher df as they are more likely to yield larger improvements of the fit. It is recommended to use a rather small number of df for all base-learners. It is not possible to specify df larger than the rank of the design matrix. For base-learners with rank-deficient penalty, it is not possible to specify df smaller than the rank of the null space of the penalty (e.g., in bbs unpenalized part of P-splines). The df of the base-learners in an FDboost-object can be checked using extract(object, "df"), see extract.

The most important tuning parameter of component-wise gradient boosting is the number of boosting iterations. It is recommended to use the number of boosting iterations as only tuning parameter, fixing the step-length at a small value (e.g., nu = 0.1). Note that the default number of boosting iterations is 100 which is arbitrary and in most cases not adequate (the optimal number of boosting iterations can considerably exceed 100). The optimal stopping iteration can be determined by resampling methods like cross-validation or bootstrapping, see the function cvrisk.FDboost which searches the optimal stopping iteration on a grid, which in many cases has to be extended.

Value

An object of class FDboost that inherits from mboost. Special predict.FDboost, coef.FDboost and plot.FDboost methods are available. The methods of mboost are available as well, e.g., extract.

The FDboost-object is a named list containing:

...

all elements of an mboost-object

yname

the name of the response

ydim

dimension of the response matrix, if the response is represented as such

yind

the observation (time-)points of the response, i.e. the evaluation points, with its name as attribute

data

the data that was used for the model fit

id

the id variable of the response

predictOffset

the function to predict the smooth offset

offsetVec

the offset for one trajectory if the response is represented as matrix and otherwise the offset for all trajectories

offsetFDboost

offset as specified in call to FDboost

offsetMboost

offset as given to mboost

call

the call to FDboost

callEval

the evaluated function call to FDboost without data

numInt

value of argument numInt determining the numerical integration scheme

timeformula

the time-formula

formulaFDboost

the formula with which FDboost was called

formulaMboost

the formula with which mboost was called within FDboost

Author(s)

Sarah Brockhaus, Torsten Hothorn

References

Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015): The functional linear array model. Statistical Modelling, 15(3), 279-300.

Brockhaus, S., Melcher, M., Leisch, F. and Greven, S. (2016): Boosting flexible functional regression models with a high number of functional historical effects, Statistics and Computing, accepted.

Currie, I.D., Durban, M. and Eilers P.H.C. (2006): Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B-Statistical Methodology, 68(2), 259-280.

Scheipl, F., Staicu, A.-M. and Greven, S. (2015): Functional Additive Mixed Models, Journal of Computational and Graphical Statistics, 24(2), 477-501.

See Also

Note that FDboost calls mboost directly. See, e.g., bsignal and bbsc for possible base-learners.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
######## Example for function-on-scalar-regression
data("viscosity", package = "FDboost")
## set time-interval that should be modeled
interval <- "101"

## model time until "interval" and take log() of viscosity
end <- which(viscosity$timeAll == as.numeric(interval))
viscosity$vis <- log(viscosity$visAll[,1:end])
viscosity$time <- viscosity$timeAll[1:end]
# with(viscosity, funplot(time, vis, pch = 16, cex = 0.2))

## fit median regression model with 100 boosting iterations,
## step-length 0.4 and smooth time-specific offset
## the factors are coded such that the effects are zero for each timepoint t
## no integration weights are used!
mod1 <- FDboost(vis ~ 1 + bolsc(T_C, df = 2) + bolsc(T_A, df = 2),
               timeformula = ~ bbs(time, df = 4),
               numInt = "equal", family = QuantReg(),
               offset = NULL, offset_control = o_control(k_min = 9),
               data = viscosity, control=boost_control(mstop = 100, nu = 0.4))

## Not run: 
  #### find optimal mstop over 5-fold bootstrap, small number of folds for example
  #### do the resampling on the level of curves

  ## possibility 1: smooth offset and transformation matrices are refitted
  set.seed(123)
  appl1 <- applyFolds(mod1, folds = cv(rep(1, length(unique(mod1$id))), B = 5),
                      grid = 1:500)
  ## plot(appl1)
  mstop(appl1)
  mod1[mstop(appl1)]

  ## possibility 2: smooth offset is refitted,
  ## computes oob-risk and the estimated coefficients on the folds
  set.seed(123)
  val1 <- validateFDboost(mod1, folds = cv(rep(1, length(unique(mod1$id))), B = 5),
                        grid = 1:500)
  ## plot(val1)
  mstop(val1)
  mod1[mstop(val1)]

  ## possibility 3: very efficient
  ## using the function cvrisk; be careful to do the resampling on the level of curves
  folds1 <- cvLong(id = mod1$id, weights = model.weights(mod1), B = 5)
  cvm1 <- cvrisk(mod1, folds = folds1, grid = 1:500)
  ## plot(cvm1)
  mstop(cvm1)

## End(Not run)

## look at the model
summary(mod1)
coef(mod1)
## plot(mod1)
## plotPredicted(mod1, lwdPred = 2)

######## Example for scalar-on-function-regression
data("fuelSubset", package = "FDboost")

## center the functional covariates per observed wavelength
fuelSubset$UVVIS <- scale(fuelSubset$UVVIS, scale = FALSE)
fuelSubset$NIR <- scale(fuelSubset$NIR, scale = FALSE)

## to make mboost:::df2lambda() happy (all design matrix entries < 10)
## reduce range of argvals to [0,1] to get smaller integration weights
fuelSubset$uvvis.lambda <- with(fuelSubset, (uvvis.lambda - min(uvvis.lambda)) /
                                          (max(uvvis.lambda) - min(uvvis.lambda) ))
fuelSubset$nir.lambda <- with(fuelSubset, (nir.lambda - min(nir.lambda)) /
                                          (max(nir.lambda) - min(nir.lambda) ))

## possibility 1: as FLAM model with the scalar response
## adds a penalty over the index of the response
## thus, mod2f and mod2 have different panlties
mod2f <- FDboost(heatan ~ bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4, check.ident = FALSE)
               + bsignal(NIR, nir.lambda, knots = 40, df = 4, check.ident = FALSE),
               timeformula = ~bols(1), data = fuelSubset, control = boost_control(mstop = 200))

## possibility 2: with scalar response
mod2 <- FDboost(heatan ~ bsignal(UVVIS, uvvis.lambda, knots = 40, df = 4, check.ident = FALSE)
               + bsignal(NIR, nir.lambda, knots = 40, df = 4, check.ident = FALSE),
               timeformula = NULL, data = fuelSubset, control = boost_control(mstop = 200))
## Not run: 
  ## bootstrap to find optimal mstop takes some time
  set.seed(123)
  folds2 <- cv(weights = model.weights(mod2), B = 10)
  cvm2 <- cvrisk(mod2, folds = folds2, grid = 1:1000)
  mstop(cvm2) ## mod2[327]
  summary(mod2)
  ## plot(mod2)

## End(Not run)

## Example for function-on-function-regression
if(require(fda)){

  data("CanadianWeather", package = "fda")
  CanadianWeather$l10precip <- t(log(CanadianWeather$monthlyPrecip))
  CanadianWeather$temp <- t(CanadianWeather$monthlyTemp)
  CanadianWeather$region <- factor(CanadianWeather$region)
  CanadianWeather$month.s <- CanadianWeather$month.t <- 1:12

  ## center the temperature curves per time-point
  CanadianWeather$temp <- scale(CanadianWeather$temp, scale = FALSE)
  rownames(CanadianWeather$temp) <- NULL ## delete row-names

  ## fit model with cyclic splines over the year
  mod3 <- FDboost(l10precip ~ bols(region, df = 2.5, contrasts.arg = "contr.dummy")
                   + bsignal(temp, month.s, knots = 11, cyclic = TRUE,
                           df = 2.5, boundary.knots = c(0.5,12.5), check.ident = FALSE),
                   timeformula = ~ bbs(month.t, knots = 11, cyclic = TRUE,
                                  df = 3, boundary.knots = c(0.5, 12.5)),
                   offset = "scalar", offset_control = o_control(k_min = 5),
                  data = CanadianWeather)

 ## Not run: 
   #### find the optimal mstop over 5-fold bootstrap
   ## using the function applyFolds
   set.seed(123)
   folds3 <- cv(rep(1, length(unique(mod3$id))), B = 5)
   appl3 <- applyFolds(mod3, folds = folds3)

   ## use function cvrisk; be careful to do the resampling on the level of curves
   set.seed(123)
   folds3long <- cvLong(id = mod3$id, weights = model.weights(mod3), B = 5)
   cvm3 <- cvrisk(mod3, folds = folds3long, grid = 1:500)
   mstop(cvm3) ## mod3[64]

   summary(mod3)
   ## plot(mod3, pers = TRUE)
 
## End(Not run)
}

######## Example for functional response observed on irregular grid
######## Delete part of observations in viscosity data-set
data("viscosity", package = "FDboost")
## set time-interval that should be modeled
interval <- "101"

## model time until "interval" and take log() of viscosity
end <- which(viscosity$timeAll == as.numeric(interval))
viscosity$vis <- log(viscosity$visAll[,1:end])
viscosity$time <- viscosity$timeAll[1:end]
# with(viscosity, funplot(time, vis, pch = 16, cex = 0.2))

## only keep one eighth of the observation points
set.seed(123)
selectObs <- sort(sample(x = 1:(64*46), size = 64*46/4, replace = FALSE))
dataIrregular <- with(viscosity, list(vis = c(vis)[selectObs],
                                      T_A = T_A, T_C = T_C,
                                      time = rep(time, each = 64)[selectObs],
                                      id = rep(1:64, 46)[selectObs]))

## fit median regression model with 50 boosting iterations,
## step-length 0.4 and smooth time-specific offset
## the factors are in effect coding -1, 1 for the levels
## no integration weights are used!
mod4 <- FDboost(vis ~ 1 + bols(T_C, contrasts.arg = "contr.sum", intercept = FALSE)
                + bols(T_A, contrasts.arg = "contr.sum", intercept=FALSE),
                timeformula = ~ bbs(time, lambda = 100), id = ~id,
                numInt = "Riemann", family = QuantReg(),
                offset = NULL, offset_control = o_control(k_min = 9),
                data = dataIrregular, control = boost_control(mstop = 50, nu = 0.4))
summary(mod4)
## plot(mod4)
## plotPredicted(mod4, lwdPred = 2)

## Not run: 
  ## Find optimal mstop, small grid/low B for a fast example
  set.seed(123)
  folds4 <- cv(rep(1, length(unique(mod4$id))), B = 3)
  appl4 <- applyFolds(mod4, folds = folds4, grid = 1:50)
  ## val4 <- validateFDboost(mod4, folds = folds4, grid = 1:50)

  set.seed(123)
  folds4long <- cvLong(id = mod4$id, weights = model.weights(mod4), B = 3)
  cvm4 <- cvrisk(mod4, folds = folds4long, grid = 1:50)
  mstop(cvm4)

## End(Not run)

## Be careful if you want to predict newdata with irregular response,
## as the argument index is not considered in the prediction of newdata.
## Thus, all covariates have to be repeated according to the number of observations
## in each response trajectroy.
## Predict four response curves with full time-observations
## for the four combinations of T_A and T_C.
newd <- list(T_A = factor(c(1,1,2,2), levels = 1:2,
                        labels = c("low", "high"))[rep(1:4, length(viscosity$time))],
             T_C = factor(c(1,2,1,2), levels = 1:2,
                        labels = c("low", "high"))[rep(1:4, length(viscosity$time))],
             time = rep(viscosity$time, 4))

pred <- predict(mod4, newdata = newd)
## funplot(x = rep(viscosity$time, 4), y = pred, id = rep(1:4, length(viscosity$time)))

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.