Nothing
#################################################################################
#' Model-based Gradient Boosting for Functional GAMLSS
#'
#' Function for fitting generalized additive models for location, scale and shape (GAMLSS)
#' with functional data using component-wise gradient boosting, for details see
#' Brockhaus et al. (2018).
#'
#' @param formula a symbolic description of the model to be fit.
#' If \code{formula} is a single formula, the same formula is used for all distribution parameters.
#' \code{formula} can also be a (named) list, where each list element corresponds to one distribution
#' parameter of the GAMLSS distribution. The names must be the same as in the \code{families}.
#' @param timeformula one-sided formula for the expansion over the index of the response.
#' For a functional response \eqn{Y_i(t)} typically \code{~bbs(t)} to obtain a smooth
#' expansion of the effects along \code{t}. In the limiting case that \eqn{Y_i} is a scalar response
#' use \code{~bols(1)}, which sets up a base-learner for the scalar 1.
#' Or you can use \code{timeformula=NULL}, then the scalar response is treated as scalar.
#' Analogously to \code{formula}, \code{timeformula} can either be a one-sided formula or
#' a named list of one-sided formulas.
#' @param data a data frame or list containing the variables in the model.
#' @param families an object of class \code{families}. It can be either one of the pre-defined distributions
#' that come along with the package \code{gamboostLSS} or a new distribution specified by the user
#' (see \code{\link{Families}} for details).
#' Per default, the two-parametric \code{\link{GaussianLSS}} family is used.
#' @param control a list of parameters controlling the algorithm.
#' For more details see \code{\link{boost_control}}.
#' @param weights does not work!
#' @param method fitting method, currently two methods are supported:
#' \code{"cyclic"} (see Mayr et al., 2012) and \code{"noncyclic"}
#' (algorithm with inner loss of Thomas et al., 2018).
#' @param ... additional arguments passed to \code{\link[FDboost]{FDboost}},
#' including, \code{family} and \code{control}.
#'
#' @details For details on the theory of GAMLSS, see Rigby and Stasinopoulos (2005).
#' \code{FDboostLSS} calls \code{FDboost} to fit the distribution parameters of a GAMLSS -
#' a functional boosting model is fitted for each parameter of the response distribution.
#' In \code{\link[gamboostLSS]{mboostLSS}}, details on boosting of GAMLSS based on
#' Mayr et al. (2012) and Thomas et al. (2018) are given.
#' In \code{\link{FDboost}}, details on boosting regression models with functional variables
#' are given (Brockhaus et al., 2015, Brockhaus et al., 2017).
#'
#' @return An object of class \code{FDboostLSS} that inherits from \code{mboostLSS}.
#' The \code{FDboostLSS}-object is a named list containing one list entry per distribution parameter
#' and some attributes. The list is named like the parameters, e.g. mu and sigma,
#' if the parameters mu and sigma are modeled. Each list-element is an object of class \code{FDboost}.
#'
#' @author Sarah Brockhaus
#'
#' @seealso Note that \code{FDboostLSS} calls \code{\link{FDboost}} directly.
#'
#' @keywords models regression nonlinear smooth
#'
#' @references
#' Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015).
#' The functional linear array model. Statistical Modelling, 15(3), 279-300.
#'
#' Brockhaus, S., Melcher, M., Leisch, F. and Greven, S. (2017):
#' Boosting flexible functional regression models with a high number of functional historical effects,
#' Statistics and Computing, 27(4), 913-926.
#'
#' Brockhaus, S., Fuest, A., Mayr, A. and Greven, S. (2018):
#' Signal regression models for location, scale and shape with an application to stock returns.
#' Journal of the Royal Statistical Society: Series C (Applied Statistics), 67, 665-686.
#'
#' Mayr, A., Fenske, N., Hofner, B., Kneib, T. and Schmid, M. (2012):
#' Generalized additive models for location, scale and shape for high-dimensional
#' data - a flexible approach based on boosting.
#' Journal of the Royal Statistical Society: Series C (Applied Statistics), 61(3), 403-427.
#'
#' Rigby, R. A. and D. M. Stasinopoulos (2005):
#' Generalized additive models for location, scale and shape (with discussion).
#' Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507-554.
#'
#' Thomas, J., Mayr, A., Bischl, B., Schmid, M., Smith, A., and Hofner, B. (2018),
#' Gradient boosting for distributional regression - faster tuning and improved
#' variable selection via noncyclical updates.
#' Statistics and Computing, 28, 673-687.
#'
#' Stoecker, A., Brockhaus, S., Schaffer, S., von Bronk, B., Opitz, M., and Greven, S. (2019):
#' Boosting Functional Response Models for Location, Scale and Shape with an Application to Bacterial Competition.
#' \url{https://arxiv.org/abs/1809.09881}
#'
#' @examples
#' ########### simulate Gaussian scalar-on-function data
#' n <- 500 ## number of observations
#' G <- 120 ## number of observations per functional covariate
#' set.seed(123) ## ensure reproducibility
#' z <- runif(n) ## scalar covariate
#' z <- z - mean(z)
#' s <- seq(0, 1, l=G) ## index of functional covariate
#' ## generate functional covariate
#' if(require(splines)){
#' x <- t(replicate(n, drop(bs(s, df = 5, int = TRUE) %*% runif(5, min = -1, max = 1))))
#' }else{
#' x <- matrix(rnorm(n*G), ncol = G, nrow = n)
#' }
#' x <- scale(x, center = TRUE, scale = FALSE) ## center x per observation point
#'
#' mu <- 2 + 0.5*z + (1/G*x) %*% sin(s*pi)*5 ## true functions for expectation
#' sigma <- exp(0.5*z - (1/G*x) %*% cos(s*pi)*2) ## for standard deviation
#'
#' y <- rnorm(mean = mu, sd = sigma, n = n) ## draw respone y_i ~ N(mu_i, sigma_i)
#'
#' ## save data as list containing s as well
#' dat_list <- list(y = y, z = z, x = I(x), s = s)
#'
#' ## model fit with noncyclic algorithm assuming Gaussian location scale model
#' m_boost <- FDboostLSS(list(mu = y ~ bols(z, df = 2) + bsignal(x, s, df = 2, knots = 16),
#' sigma = y ~ bols(z, df = 2) + bsignal(x, s, df = 2, knots = 16)),
#' timeformula = NULL, data = dat_list, method = "noncyclic")
#' summary(m_boost)
#'
#' \donttest{
#' if(require(gamboostLSS)){
#' ## find optimal number of boosting iterations on a grid in 1:1000
#' ## using 5-fold bootstrap
#' ## takes some time, easy to parallelize on Linux
#' set.seed(123)
#' cvr <- cvrisk(m_boost, folds = cv(model.weights(m_boost[[1]]), B = 5),
#' grid = 1:1000, trace = FALSE)
#' ## use model at optimal stopping iterations
#' m_boost <- m_boost[mstop(cvr)] ## 832
#'
#' ## plot smooth effects of functional covariates for mu and sigma
#' oldpar <- par(mfrow = c(1,2))
#' plot(m_boost$mu, which = 2, ylim = c(0,5))
#' lines(s, sin(s*pi)*5, col = 3, lwd = 2)
#' plot(m_boost$sigma, which = 2, ylim = c(-2.5,2.5))
#' lines(s, -cos(s*pi)*2, col = 3, lwd = 2)
#' par(oldpar)
#' }
#' }
#' @export
## function that calls FDboost for each distribution parameter
FDboostLSS <- function(formula, timeformula, data = list(), families = GaussianLSS(),
control = boost_control(), weights = NULL,
method = c("cyclic", "noncyclic"), ...){
cl <- match.call()
if(is.null(cl$families)) cl$families <- families
## warnings for functional response are irrelevant for scalar response
if( !is.null(timeformula) && timeformula != ~bols(1) ){
message("No smooth offsets over time are used, just global scalar offsets.")
message("No integration weights are used to compute the loss for the functional response.")
}
method <- match.arg(method)
fit <- mboostLSS_fit(formula = formula, timeformula = timeformula,
data = data, families = families,
control = control, weights = weights, ...,
fun = FDboost, funchar = "FDboost", call = cl,
method = method)
## make sure that the first class of the model object is 'FDboostLSS'
class(fit) <- class(fit)[class(fit) != "FDboostLSS"]
class(fit) <- c("FDboostLSS", class(fit))
return(fit)
}
#################################################################################
#' Cross-validation for FDboostLSS
#'
#' Multidimensional cross-validated estimation of the empirical risk for hyper-parameter selection,
#' for an object of class \code{FDboostLSS} setting the folds per default to resampling curves.
#'
#' @param object an object of class \code{FDboostLSS}.
#' @param folds a weight matrix a weight matrix with number of rows equal to the number of observations.
#' The number of columns corresponds to the number of cross-validation runs,
#' defaults to 25 bootstrap samples, resampling whole curves
#' @param grid defaults to a grid up to the current number of boosting iterations.
#' The default generates the grid according to the defaults of
#' \code{\link{cvrisk.mboostLSS}} which are different for models with cyclic or noncyclic fitting.
#' @param papply (parallel) apply function, defaults to \code{\link{mclapply}},
#' see \code{\link{cvrisk.mboostLSS}} for details.
#' @param trace print status information during cross-validation? Defaults to \code{TRUE}.
#' @param fun if \code{fun} is \code{NULL}, the out-of-sample risk is returned.
#' \code{fun}, as a function of \code{object},
#' may extract any other characteristic of the cross-validated models. These are returned as is.
#' @param ... additional arguments passed to \code{\link{mclapply}}.
#'
#' @details The function \code{cvrisk.FDboostLSS} is a wrapper for
#' \code{cvrisk.mboostLSS} in package \code{gamboostLSS}.
#' It overrides the default for the folds, so that the folds are sampled on the level of curves
#' (not on the level of single observations, which does not make sense for functional response).
#'
#' @return An object of class \code{cvriskLSS} (when \code{fun} was not specified),
#' basically a matrix containing estimates of the empirical risk for a varying number
#' of bootstrap iterations. \code{plot} and \code{print} methods are available as well as an
#' \code{mstop} method, see \code{\link{cvrisk.mboostLSS}}.
#'
#' @seealso \code{\link{cvrisk.mboostLSS}} in
#' package \code{gamboostLSS}.
#'
#' @export
## wrapper for cvrisk of gamboostLSS, specifying folds on the level of curves
cvrisk.FDboostLSS <- function(object, folds = cvLong(id = object[[1]]$id,
weights = model.weights(object[[1]])),
grid = NULL,
papply = mclapply, trace = TRUE,
fun = NULL, ...){
## message not necessary as currently only a scalar offset is possible for FDboostLSS-models
## if(!length(unique(object$offset)) == 1) message("The smooth offset is fixed over all folds.")
class(object) <- class(object)[class(object) != "FDboostLSS"]
## set up grid according to defaults of cvrisk.nc_mboostLSS and cvrisk.mboostLSS
if(is.null(grid)){
if(any(class(object) == "nc_mboostLSS")){
grid <- 1:sum(mstop(object))
}else{
grid <- make.grid(mstop(object))
}
}
## call cvrisk.nc_mboostLSS or cvrisk.mboostLSS
ret <- cvrisk(object = object, folds = folds,
grid = grid,
papply = papply, trace = trace,
fun = fun, ...)
return(ret)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.