mfboost | R Documentation
An interface for model-based gradient boosting when response observations are naturally represented as vectors, matrices, or smooth, potentially multidimensional functions. The function is a wrapper for the function mboost in the model-based boosting package mboost via its functional counterpart FDboost from the FDboost package. For manifold-valued responses, such as shapes, appropriate loss functions can be specified using an mfFamily.
mfboost(formula, obj.formula = NULL, data = NULL, family = Gaussian(), ...)
formula | a symbolic description of the model formula.
obj.formula | intrinsic model formula for the internal representation of the response. E.g., for a functional response y_i(t), typically use
data | a list containing the variables in the model. The response should be provided either as
family | an
... | additional arguments passed to
While the response observations y_i and the corresponding predictions μ_i may live on a non-linear manifold M, they are modeled with an additive predictor η_i living in a linear space:

μ_i = g_p(η_i) = g_p(∑_j h_j(x_i)),

where g_p is a response function, which may depend on a pole p ∈ M and is typically chosen as g_p = Exp_p, the exponential map of the manifold at p. The additive predictor η_i is composed of partial effects h_j(x_i) as provided by the R packages mboost and FDboost, but potentially constrained to a suitable linear subspace, e.g., the tangent space at p. For further details on available covariate effects, see FDboost and the baselearners help page of mboost.
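To make the role of g_p = Exp_p concrete, the following base-R sketch takes M to be the unit sphere. This choice is an illustrative assumption only (manifoldboost handles shape spaces through its own mfFamily objects), and the helpers `tangent_project()` and `sphere_exp()` are hypothetical, not part of the package. Two partial effects are projected onto the tangent space at the pole p, summed into η, and mapped onto the sphere with Exp_p.

```r
# Hypothetical helper: project an arbitrary vector onto the tangent space
# of the unit sphere at the pole p (removes the component along p)
tangent_project <- function(v, p) v - sum(v * p) * p

# Hypothetical helper: exponential map of the unit sphere at p,
# applied to a tangent vector v
sphere_exp <- function(p, v) {
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(p)          # Exp_p(0) = p
  cos(nv) * p + sin(nv) * v / nv     # walk along the geodesic for length |v|
}

p   <- c(0, 0, 1)                               # pole on the sphere S^2
h1  <- tangent_project(c(0.3, -0.1, 0.5), p)    # toy partial effect h_1(x_i)
h2  <- tangent_project(c(-0.2, 0.4, 0.0), p)    # toy partial effect h_2(x_i)
eta <- h1 + h2                                  # additive predictor in T_p M
mu  <- sphere_exp(p, eta)                       # prediction mu_i = Exp_p(eta_i)
```

Because Exp_p follows the geodesic through p in direction η, the geodesic distance between p and μ equals the length of η (as long as that length is below π), so the linear predictor directly controls how far, and in which direction, the prediction moves away from the pole.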
Computationally, it can make a considerable difference whether response observations are measured on a common regular grid or on irregular, observation-specific grids. In the regular case, the linear array model (Brockhaus et al. 2015, Currie et al. 2006) can be used for the design matrix, and the regular structure can also be exploited to speed up pole and gradient computation.
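The saving offered by the linear array model comes from never building the full Kronecker-product design matrix. As a sketch of the general identity (not of the internal mboost/FDboost code), take a covariate design X (N × p), a basis Φ evaluated on the common grid (n_grid × q) and a coefficient matrix B (q × p); then (X ⊗ Φ) vec(B) = vec(Φ B Xᵀ). All dimensions and random matrices below are arbitrary placeholders.

```r
set.seed(1)
N      <- 50   # number of observations (arbitrary)
n_grid <- 40   # points on the common regular grid (arbitrary)
p      <- 3    # columns of the covariate design (arbitrary)
q      <- 6    # basis functions over the grid (arbitrary)

X   <- matrix(rnorm(N * p), N, p)            # one row per observation
Phi <- matrix(rnorm(n_grid * q), n_grid, q)  # basis evaluated on the grid
B   <- matrix(rnorm(q * p), q, p)            # one coefficient column per covariate column

# Naive: explicit (N * n_grid) x (p * q) Kronecker design
eta_full <- as.vector(kronecker(X, Phi) %*% as.vector(B))

# Array arithmetic: vec(Phi B X^T), no Kronecker product formed
eta_array <- as.vector(Phi %*% B %*% t(X))

# Column i of Phi B X^T is the predicted curve of observation i over the grid
eta_mat <- matrix(eta_array, nrow = n_grid, ncol = N)
```

Both versions give the same stacked predictor (grid index running fastest within each observation), but the array version only multiplies small matrices, which is what makes the regular-grid case cheaper.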
This distinction is reflected in, and controlled by, the data format, in particular the format in which the response is provided. For a data set with N observations, data should be provided as a list containing scalar covariate vectors (of length N). The response should be contained as follows:
- in the regular case, the response should be a tbl_cube with the response values as measures and the remaining variables contained in obj.formula as dimensions, listed according to the covariates.
- in the irregular case, the response should be a data.frame (or list) with the response variables appearing in obj.formula.
- for backward compatibility, data can also be in the format of FDboost, i.e., data is again a list and the response is provided in separate list elements:
  - in the regular case, the response measurements are provided as a matrix with N rows corresponding to the observations and columns containing all measurements in long format; the dim and obj.formula variables are provided as vectors along the columns of the matrix.
  - in the irregular case, the columns of the data.frame from the irregular option above are simply added to the list as separate elements.
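The following base-R sketch illustrates, on hypothetical toy data, how the same regular planar curves look in three layouts: the long format used in the irregular case, an id × arg × dim array (the layout a tbl_cube measure holds; cubelyr is deliberately not used here to avoid an extra dependency), and the FDboost-style matrix with N rows. The variable names id, arg, dim and value follow the examples; the values themselves are placeholders.

```r
# Hypothetical toy data: N = 3 planar curves (dim "x"/"y") on a common grid of 4 arg values
N    <- 3
args <- seq(0, 1, length.out = 4)
dims <- c("x", "y")

# Long format (as for the irregular case): one row per measurement
long <- expand.grid(arg = args, dim = dims, id = factor(seq_len(N)))
long$value <- seq_len(nrow(long))   # placeholder response values

# Regular array layout: id x arg x dim, exactly one value per cell
arr <- tapply(long$value, list(long$id, long$arg, long$dim), identity)

# FDboost-style matrix: N rows, columns = all (arg, dim) combinations in long format
Y       <- matrix(arr, nrow = N)
arg_col <- rep(args, times = length(dims))   # arg value belonging to each column
dim_col <- rep(dims, each = length(args))    # dim label belonging to each column
```

The column vectors `arg_col` and `dim_col` play the role of the obj.formula and dim variables "along the columns of the matrix"; with real data, the grid and dimension names come from the measurements rather than from placeholders.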
An object of class mfboost inheriting from FDboost and mboost.
Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015): The functional linear array model. Statistical Modelling, 15(3), 279-300.
Currie, I.D., Durban, M. and Eilers, P.H.C. (2006): Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68(2), 259-280.
mfFamily, factorize, data example cells
library(manifoldboost)
library(FDboost)  # provides cvLong() and cvMa(); attaches mboost

# modeling the FORM/SIZE-AND-SHAPE of IRREGULAR curves -------------------
# load irregular cell data
data("cells", package = "manifoldboost")
# subsample (one for each covariate combination)
cellsub <- as.data.frame(cells[-which(names(cells) == "response")])
cellsub$myd <- factor(with(cellsub, paste0("a=", a, " r=", r, " b=", b, " m=", m)))
subids <- match(unique(cellsub$myd), cellsub$myd)
cellsub <- as.list(cellsub[subids, ])
cellsub$response <- cells$response[as.numeric(cells$response$id) %in% subids, ]
# fit model
cell_model <- mfboost(
  formula = response ~ bbsc(a, df = 3, knots = 5) + bbsc(r, df = 3, knots = 5) +
    bbsc(b, df = 3, knots = 5) + bbsc(m, df = 3, knots = 5),
  obj.formula = value^dim ~ bbs(arg, df = 1, differences = 0, knots = 5,
                                boundary.knots = c(0, 1), cyclic = TRUE) | id,
  data = cellsub,
  family = PlanarSizeShapeL2(weight_fun = trapez_weights, arg_range = c(0, 1)),
  control = boost_control(mstop = 300)
)
# # cross-validation
# set.seed(9382)
# cell_cv <- cvrisk(cell_model,
#                   folds = cvLong(
#                     id = cell_model$id,
#                     weights = cell_model$`(weights)`,
#                     type = "kfold"),
#                   grid = 0:mstop(cell_model))
# cell_model[mstop(cell_cv)]
# plot first four predictions
par(mfrow = c(2, 2), mar = rep(2, 4))
plot(cell_model, ids = 1:4, t = "l", main = cellsub$myd[1:4],
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1, 1, 2),
       legend = c("intercept", "prediction", "point correspondence"),
       col = c("grey", "black", "grey"))
# compare with data
plot(cell_model, ids = 1:4, t = "l", y0_ = cell_model$family@mf$y_[1:4],
     main = cellsub$myd[1:4], seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1, 1, 2),
       legend = c("observation", "prediction", "point correspondence"),
       col = c("grey", "black", "grey"))
# predict dense cells on grids
cellgrid <- cellsub
cellgrid$response <- with(cellgrid$response, expand.grid(
  id = unique(id),
  arg = seq(0, 1, len = 100),
  dim = unique(dim),
  value = NA))
cellgrid$response$value <- predict(cell_model, newdata = cellgrid, type = "response")
# factorize effects
cell_fac <- factorize(cell_model)
vimp <- varimp(cell_fac$cov)
plot(vimp, auto.key = FALSE)
# plot two most important effect directions
this <- cell_fac$cov$which(head(names(vimp)[order(vimp, decreasing = TRUE)], 2))
par(mfcol = c(2, 2))
plot(cell_fac$resp, which = this, y0_par = list(type = "l"))
plot(cell_fac$cov, which = this)

# modeling the SHAPE of REGULAR curves -------------------
# load regular cell data
data("cellr", package = "manifoldboost")
# subsample (one for each covariate combination)
cellsub <- as.data.frame(cellr[-which(names(cellr) == "response")])
cellsub$myd <- factor(with(cellsub, paste0("a=", a, " r=", r, " b=", b, " m=", m)))
subids <- match(unique(cellsub$myd), cellsub$myd)
cellsub <- as.list(cellsub[subids, ])
cellsub$response <- cellr$response
cellsub$response$dims$id <- ordered(cellsub$response$dims$id[subids],
                                    levels = unique(cellsub$response$dims$id[subids]))
cellsub$response$mets$value <- cellsub$response$mets$value[, , subids]
class(cellsub$response) <- "tbl_cube"
# fit SHAPE model
cell_model <- mfboost(
  formula = response ~ bbsc(a, df = 3, knots = 5) + bbsc(r, df = 3, knots = 5) +
    bbsc(b, df = 3, knots = 5) + bbsc(m, df = 3, knots = 5),
  obj.formula = value^dim ~ bbs(arg, df = 1, differences = 0, knots = 5,
                                boundary.knots = c(0, 70), cyclic = TRUE) | id,
  data = cellsub,
  family = PlanarShapeL2(),
  control = boost_control(mstop = 100)
)
# # cross-validation
# set.seed(8768)
# cell_cv <- cvrisk(cell_model,
#                   folds = cvMa(ydim = cell_model$ydim,
#                                type = "kfold"),
#                   grid = 0:mstop(cell_model))
# cell_model[mstop(cell_cv)]
# plot first four predictions (labels come from the subsample, not from 'cells')
par(mfrow = c(2, 2), mar = rep(2, 4))
plot(cell_model, ids = 1:4, t = "l", main = cellsub$myd[1:4],
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1, 1, 2),
       legend = c("intercept", "prediction", "point correspondence"),
       col = c("grey", "black", "grey"))
# compare with data
plot(cell_model, ids = 1:4, t = "l", y0_ = cell_model$family@mf$y_[1:4],
     main = cellsub$myd[1:4], seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1, 1, 2),
       legend = c("observation", "prediction", "point correspondence"),
       col = c("grey", "black", "grey"))
# predict dense cells on grids
n_id <- length(subids)  # number of subsampled cells (one per covariate combination)
cellgrid <- cellsub
cellgrid$response <- cubelyr::tbl_cube(
  dimensions = list(
    id = cellsub$response$dims$id,
    arg = seq(0, 70, len = 100),
    dim = unique(cellsub$response$dims$dim)),
  measures = list(value = array(NA, dim = c(n_id, 100, 2))))
cellgrid$response$mets$value <- array(
  predict(cell_model, newdata = cellgrid, type = "response"),
  dim = c(n_id, 100, 2))
for (i in 1:4) plot(cellgrid$response$mets$value[i, , ], t = "l")
# factorize effects
cell_fac <- factorize(cell_model)
vimp <- varimp(cell_fac$cov)
plot(vimp, auto.key = FALSE)
# plot two most important effect directions
this <- cell_fac$cov$which(head(names(vimp)[order(vimp, decreasing = TRUE)], 2))
par(mfcol = c(2, 2))
plot(cell_fac$resp, which = this, y0_par = list(type = "l"))
plot(cell_fac$cov, which = this)