mfboost: Model-based Boosting for Manifold Valued Object Data

Description

An interface for model-based gradient boosting when response observations are naturally represented as vectors, matrices, or smooth, potentially multidimensional functions. The function wraps mboost from the model-based boosting package mboost via its functional counterpart FDboost from the FDboost package. For manifold-valued responses such as shapes, models with appropriate loss functions can be fitted by supplying an mfFamily.

Usage

mfboost(formula, obj.formula = NULL, data = NULL, family = Gaussian(), ...)

Arguments

formula

a symbolic description of the model formula y ~ ... at the covariate level, i.e., specified as if the response were scalar, where y refers to an object (in data) containing all response data.

obj.formula

intrinsic model formula for the internal representation of the response. For example, for a functional response y_i(t), typically use value ~ bbs(t) to obtain smooth effects over t, where value refers to the response evaluations (in y). See Details for further information.

data

a list containing the variables in the model. The response should be provided either as a tbl_cube (regular case) or as a list of tbl_cubes or data.frames (irregular case). See Details.

family

an mfFamily object.

...

additional arguments passed to FDboost/mboost.

Details

While the response observations y_i and the corresponding predictions μ_i may lie on a nonlinear manifold M, they are modeled via an additive predictor η_i in a linear space:

μ_i = g_p(η_i) = g_p(∑_j h_j(x_i))

where g_p is a response function that may depend on a pole p ∈ M and is typically chosen as g_p = Exp_p, the exponential map of M at p. The additive predictor η_i is composed of partial effects h_j(x_i) as provided by the R packages mboost and FDboost, which may be constrained to a suitable linear subspace, e.g., the tangent space at p.
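
For intuition about g_p = Exp_p and the tangent-space constraint, the following base-R sketch uses the simplest curved example, the unit sphere S² in R³, where the exponential map has a closed form. The helpers tangent_project() and sphere_exp() are hypothetical illustrations, not manifoldboost functions, and the planar shape families used in the Examples below operate on a different manifold.

```r
# Hypothetical helpers for illustration only (not manifoldboost functions).
# Manifold: unit sphere S^2 in R^3; the pole p is a unit vector.

# Project an arbitrary vector v onto the tangent space at p,
# i.e. the linear subspace orthogonal to p.
tangent_project <- function(v, p) {
  v - sum(v * p) * p
}

# Exponential map at p: follow the great circle leaving p in direction v
# for arc length |v|. v is assumed to lie in the tangent space at p.
sphere_exp <- function(v, p) {
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(p)
  cos(nv) * p + sin(nv) * v / nv
}

p   <- c(0, 0, 1)                  # pole, e.g. an intercept estimate
eta <- c(0.3, -0.4, 0.7)           # unconstrained sum of partial effects h_j(x_i)

eta_tan <- tangent_project(eta, p) # constrain the predictor to the tangent space at p
mu      <- sphere_exp(eta_tan, p)  # prediction mu_i = Exp_p(eta_i), a point on S^2
mu
```

For tangent vectors shorter than π, the geodesic distance between mu and p equals the length of the tangent vector, which is what makes Exp_p a natural response function around a pole.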

For further details on available covariate effects, see FDboost and the baselearners help page of mboost.

Computationally, it can make a large difference whether response observations are measured on a common regular grid or on irregular, observation-specific grids. In the regular case, the linear array model (Brockhaus et al. 2015, Currie et al. 2006) can be used for the design matrix, and the regular structure can also be exploited to speed up pole and gradient computations. This distinction is reflected in, and controlled by, the data format, in particular the format in which the response is provided. For a data set with N observations, data should be a list containing the scalar covariates as vectors of length N. The response should be provided as follows:

  • in the regular case, the response should be a tbl_cube with the response values as measures and the remaining variables in obj.formula as dimensions, with the observations ordered consistently with the covariates.

  • in the irregular case, the response should be a data.frame (or list) containing the response variables that appear in obj.formula.

  • for backward compatibility, data can also be provided in the FDboost format, i.e., data is again a list, but the response is given in separate list elements:

    • in the regular case, the response measurements are provided as a matrix with N rows corresponding to the observations, each row containing all measurements of that observation in long format; dim and the other obj.formula variables are provided as vectors with one entry per matrix column.

    • in the irregular case, the columns of the data.frame described for the irregular case above are simply added to the list as separate elements.
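
The two response layouts, and why the regular one is computationally attractive, can be sketched in base R. The toy data and object names are hypothetical; in mfboost the regular array would be wrapped as the measure of a tbl_cube via cubelyr::tbl_cube(dimensions = ..., measures = ...), as in the second example below. The last lines check the Kronecker identity (A ⊗ B) vec(Θ) = vec(B Θ Aᵀ) on which the linear array model relies.

```r
set.seed(1)
N    <- 3                          # number of curves
arg  <- seq(0, 0.75, by = 0.25)    # common regular grid
dims <- c("x", "y")                # coordinates of planar curves

# Regular case: one array with dimensions id x arg x dim
value_array <- array(
  rnorm(N * length(arg) * length(dims)),
  dim = c(N, length(arg), length(dims)),
  dimnames = list(id = 1:N, arg = arg, dim = dims)
)

# Irregular (long) case: one row per measurement; in truly irregular data
# each id would simply carry its own set of arg values
long <- expand.grid(id = factor(1:N), arg = arg, dim = factor(dims))
long$value <- as.vector(value_array)  # column-major order: id varies fastest
head(long)

# Linear array model: on a regular grid the full design matrix is a
# Kronecker product, but fitted values can be computed without forming it
X_cov <- cbind(1, rnorm(N))        # N x 2 design for a scalar covariate
X_arg <- cbind(1, arg)             # length(arg) x 2 design over arg
Theta <- matrix(rnorm(4), 2, 2)    # coefficients (arg basis x covariate basis)

fit_kron  <- as.vector(kronecker(X_cov, X_arg) %*% as.vector(Theta))
fit_array <- as.vector(X_arg %*% Theta %*% t(X_cov))
all.equal(fit_kron, fit_array)
```

The array version only multiplies small marginal matrices, which is the source of the speed-up in the regular case.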

Value

An object of class mfboost inheriting from FDboost and mboost.

References

Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015): The functional linear array model. Statistical Modelling, 15(3), 279-300.

Currie, I.D., Durban, M. and Eilers, P.H.C. (2006): Generalized linear array models with applications to multidimensional smoothing. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68(2), 259-280.

See Also

mfFamily, factorize, and the example data set cells

Examples

# modeling the FORM/SIZE-AND-SHAPE of IRREGULAR curves -------------------

# load irregular cell data
data("cells", package = "manifoldboost")

# subsample (one for each covariate combination)
cellsub <- as.data.frame(cells[-which(names(cells)=="response")])
cellsub$myd <- factor(with(cellsub, 
                           paste0("a=", a, " r=", r, " b=", b, " m=", m)))
subids <- match(unique(cellsub$myd), cellsub$myd)
cellsub <- as.list(cellsub[subids, ])
cellsub$response <- cells$response[as.numeric(cells$response$id) %in% subids, ]

# fit model
cell_model <- mfboost(
  formula = response ~ bbsc(a, df = 3, knots = 5) + 
    bbsc(r, df = 3, knots = 5) + 
    bbsc(b, df = 3, knots = 5) + 
    bbsc(m, df = 3, knots = 5),
  obj.formula = value^dim ~ 
    bbs(arg, df = 1, differences = 0, knots = 5, 
        boundary.knots = c(0,1), cyclic = TRUE) | id, 
  data = cellsub,
  family = PlanarSizeShapeL2(
    weight_fun = trapez_weights,
    arg_range = c(0,1)),
  control = boost_control(mstop = 300)
  )

# # cross-validation
# set.seed(9382)
# cell_cv <- cvrisk(cell_model,
#                   folds = cvLong(
#                     id = cell_model$id,
#                     weights = cell_model$`(weights)`,
#                     type = "kfold"),
#                   grid = 0:mstop(cell_model))
# cell_model[mstop(cell_cv)]

# plot first four predictions
par(mfrow = c(2,2), mar = rep(2, 4) )
plot(cell_model, ids = 1:4, t = "l", 
     main = cellsub$myd[1:4], 
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1,1, 2),
       legend = c("intercept", "prediction", "point correspondence"), 
       col = c("grey", "black", "grey"))

# compare with data
plot(cell_model, ids = 1:4, t = "l", y0_ = cell_model$family@mf$y_[1:4], 
     main = cellsub$myd[1:4], 
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1,1, 2),
       legend = c("observation", "prediction", "point correspondence"), 
       col = c("grey", "black", "grey"))

# predict dense cells on grids
cellgrid <- cellsub
cellgrid$response <- with(cellgrid$response, expand.grid(
  id = unique(id), 
  arg = seq(0,1, len = 100),
  dim = unique(dim),
  value = NA))
cellgrid$response$value <- predict(cell_model, 
                                   newdata = cellgrid, type = "response")


# factorize effects
cell_fac <- factorize(cell_model)

vimp <- varimp(cell_fac$cov)
plot(vimp, auto.key = FALSE)

# plot two most important effect directions
this <- cell_fac$cov$which(head(names(vimp)[order(vimp, decreasing = TRUE)], 2))
par(mfcol = c(2,2))
plot(cell_fac$resp, which = this, y0_par = list(type="l"))
plot(cell_fac$cov, which = this)
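
The first example passes weight_fun = trapez_weights, which suggests that the L2 loss over arg is approximated by trapezoidal quadrature on each curve's own grid. The following base-R sketch computes such weights for a closed curve with a periodic argument; cyclic_trapez_weights() is a hypothetical helper written for illustration, and the package's trapez_weights may treat boundaries and arg_range differently.

```r
# Hypothetical helper (not manifoldboost::trapez_weights): trapezoidal
# quadrature weights for a closed curve, i.e. a periodic argument on arg_range.
cyclic_trapez_weights <- function(arg, arg_range = c(0, 1)) {
  period <- diff(arg_range)
  o <- order(arg)
  t_sorted <- arg[o]
  n <- length(t_sorted)
  # Neighbouring grid points, wrapping around at the range boundaries
  prev <- c(t_sorted[n] - period, t_sorted[-n])
  nxt  <- c(t_sorted[-1], t_sorted[1] + period)
  w <- numeric(n)
  w[o] <- (nxt - prev) / 2   # half of the left gap plus half of the right gap
  w
}

# Irregular grid on [0, 1) for one closed curve
set.seed(1)
arg <- sort(runif(200))
w <- cyclic_trapez_weights(arg, arg_range = c(0, 1))

sum(w)                         # the period length, 1 (up to rounding)
sum(w * sin(2 * pi * arg)^2)   # approximates the integral of sin^2 over [0, 1), i.e. 0.5
```

Because every point receives half of each adjacent gap, the weights always sum to the period length, so curves observed on different irregular grids contribute comparably to the loss.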




# modeling the SHAPE of REGULAR curves -------------------

# load regular cell data
data("cellr", package = "manifoldboost")

# subsample (one for each covariate combination)
cellsub <- as.data.frame(cellr[-which(names(cellr)=="response")])
cellsub$myd <- factor(with(cellsub,
                           paste0("a=", a, " r=", r, " b=", b, " m=", m)))
subids <- match(unique(cellsub$myd), cellsub$myd)
cellsub <- as.list(cellsub[subids, ])
cellsub$response <- cellr$response
cellsub$response$dims$id <- ordered(cellsub$response$dims$id[subids], 
                                    levels = unique(cellsub$response$dims$id[subids]))
cellsub$response$mets$value <- cellsub$response$mets$value[,,subids]
class(cellsub$response) <- "tbl_cube" 

# fit SHAPE model
cell_model <- mfboost(
  formula = response ~ bbsc(a, df = 3, knots = 5) + 
    bbsc(r, df = 3, knots = 5) + 
    bbsc(b, df = 3, knots = 5) + 
    bbsc(m, df = 3, knots = 5),
  obj.formula = value^dim ~ 
    bbs(arg, df = 1, differences = 0, knots = 5, 
        boundary.knots = c(0,70), cyclic = TRUE) | id, 
  data = cellsub,
  family = PlanarShapeL2(),
  control = boost_control(mstop = 100)
  )

# # cross-validation
# set.seed(8768)
# cell_cv <- cvrisk(cell_model,
#                    folds = cvMa(ydim = cell_model$ydim,
#                                 type = "kfold"),
#                    grid = 0:mstop(cell_model))
# cell_model[mstop(cell_cv)]

# plot first four predictions
par(mfrow = c(2,2), mar = rep(2, 4) )
plot(cell_model, ids = 1:4, t = "l", 
     main = cellsub$myd[1:4], 
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1,1, 2),
       legend = c("intercept", "prediction", "point correspondence"), 
       col = c("grey", "black", "grey"))

# compare with data
plot(cell_model, ids = 1:4, t = "l", y0_ = cell_model$family@mf$y_[1:4], 
     main = cellsub$myd[1:4], 
     seg_par = list(lty = "dashed"))
legend(x = "bottomright", lty = c(1,1, 2),
       legend = c("observation", "prediction", "point correspondence"), 
       col = c("grey", "black", "grey"))

# predict dense cells on grids
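cellgrid <- cellsub
n_id  <- length(cellsub$response$dims$id)          # number of curves
n_arg <- 100                                        # grid size along arg
n_dim <- length(unique(cellsub$response$dims$dim))  # number of coordinates
cellgrid$response <- cubelyr::tbl_cube(
  dimensions = list(
    id = cellsub$response$dims$id, 
    arg = seq(0, 70, len = n_arg),
    dim = unique(cellsub$response$dims$dim)
  ),
  measures = list(
    value = array(NA, dim = c(n_id, n_arg, n_dim))))

cellgrid$response$mets$value <- array(
  predict(cell_model, newdata = cellgrid, type = "response"), 
  dim = c(n_id, n_arg, n_dim))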

# plot the first four predicted outlines (second coordinate against first)
for (i in 1:4) 
  plot(cellgrid$response$mets$value[i,,], t = "l")

# factorize effects
cell_fac <- factorize(cell_model)

vimp <- varimp(cell_fac$cov)
plot(vimp, auto.key = FALSE)

# plot two most important effect directions
this <- cell_fac$cov$which(head(names(vimp)[order(vimp, decreasing = TRUE)], 2))
par(mfcol = c(2,2))
plot(cell_fac$resp, which = this, y0_par = list(type="l"))
plot(cell_fac$cov, which = this)
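
The second example reshapes the vector returned by predict() into an id × arg × dim array with array(). This relies on R's column-major order, in which the first dimension varies fastest, matching the row order of expand.grid(); we assume, without having checked the package internals, that predict() returns values in this cell order of the tbl_cube. A small base-R check of the ordering:

```r
# Grid in the order id (fastest), arg, dim, as produced by expand.grid()
grid <- expand.grid(id = 1:3, arg = seq(0, 1, len = 4), dim = 1:2)

# Stand-in for a prediction vector with one value per grid row
pred <- seq_len(nrow(grid))

# array() fills column-major, i.e. with the first index varying fastest
pred_array <- array(pred, dim = c(3, 4, 2))

# Each grid row k lands in cell [id, arg index, dim] of the array
arg_index <- match(grid$arg, unique(grid$arg))
all(pred_array[cbind(grid$id, arg_index, grid$dim)] == pred)
```

If predictions were returned in a different order, for example with arg varying fastest, the dim argument and the order of the tbl_cube dimensions would have to be permuted accordingly (e.g. with aperm()).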



Almond-S/manifoldboost documentation built on June 23, 2022, 11:06 a.m.