R/NGeDSboost.R

################################################################################
################################################################################
################################## NGeDSboost ##################################
################################################################################
################################################################################
#' @title Component-wise gradient boosting with NGeDS base-learners
#' @name NGeDSboost
#' @description
#' \code{NGeDSboost} performs component-wise gradient boosting (Bühlmann and Yu
#' (2003), Bühlmann and Hothorn (2007)) using normal GeD splines (i.e., fitted
#' with the \code{\link{NGeDS}} function) as base-learners (see Dimitrova et al.
#' (2024)).
#' @param formula a description of the structure of the model to be fitted,
#' including the dependent and independent variables. Unlike \code{\link{NGeDS}}
#' and \code{\link{GGeDS}}, the formula specified allows for multiple additive
#' GeD spline regression components (as well as linear components) to be
#' included (e.g., \code{Y ~ f(X1) + f(X2) + X3}).
#' See \code{\link[=formula.GeDS]{formula}} for further details.
#' @param data a data frame containing the variables referenced in the formula.
#' @param weights an optional vector of `prior weights' to be put on the
#' observations during the fitting process. It should be \code{NULL} or a
#' numeric vector of the same length as the response variable defined in the
#' formula.
#' @param normalize_data a logical that defines whether the data should be
#' normalized (standardized) before fitting the baseline linear model, i.e.,
#' before running the FGB algorithm. Normalizing the data involves scaling the
#' predictor variables to have a mean of 0 and a standard deviation of 1. This
#' process alters the scale and interpretation of the knots and coefficients
#' estimated. Default is equal to \code{FALSE}.
#' @param family determines the loss function to be optimized by the boosting
#' algorithm. In case \code{initial_learner = FALSE} it also determines the
#' corresponding empirical risk minimizer to be used as the offset initial learner.
#' By default, it is set to \code{mboost::Gaussian()}. Users can specify any
#' \code{\link[mboost]{Family}} object from the \pkg{mboost} package.
#' @param initial_learner a logical value. If set to \code{TRUE}, the model's
#' initial learner will be a normal GeD spline. If set to \code{FALSE}, the
#' initial predictor will be the empirical risk minimizer corresponding to the
#' specified family. Note that if \code{initial_learner = TRUE}, \code{family}
#' must be \code{mboost::Gaussian()}.
#' @param int.knots_init optional parameter allowing the user to set a
#' maximum number of internal knots to be added by the initial GeDS learner in
#' case \code{initial_learner = TRUE}. Default is equal to \code{2L}.
#' @param min_iterations optional parameter to manually set a minimum number of
#' boosting iterations to be run. If not specified, it defaults to \code{0L}.
#' @param max_iterations optional parameter to manually set the maximum number
#' of boosting iterations to be run. If not specified, it defaults to \code{100L}.
#' This setting serves as a fallback when the stopping rule, based on
#' consecutive deviances and tuned by \code{phi_boost_exit} and \code{q_boost},
#' does not trigger an earlier termination (see Dimitrova et al. (2024)).
#' Therefore, users can increase/decrease the number of boosting iterations
#' by increasing/decreasing the values of \code{phi_boost_exit} and/or
#' \code{q_boost}, or by directly specifying \code{max_iterations}.
#' @param shrinkage numeric parameter in the interval \eqn{[0,1]} defining the
#' step size or shrinkage parameter. This controls the size of the steps taken
#' in the direction of the gradient of the loss function, that is, the
#' magnitude of the update that each new iteration contributes to the final
#' model. Default is equal to \code{1}.
#' @param phi_boost_exit numeric parameter in the interval \eqn{[0,1]}
#' specifying the threshold for the boosting iterations stopping rule. Default
#' is equal to \code{0.995}.
#' @param q_boost numeric parameter which allows fine-tuning the boosting
#' iterations stopping rule; by default equal to \code{2L}.
#' @param beta numeric parameter in the interval \eqn{[0,1]} tuning the knot
#' placement in stage A of GeDS. Default is equal to \code{0.5}. See details in
#' \code{\link{NGeDS}}.
#' @param phi numeric parameter in the interval \eqn{[0,1]} specifying the
#' threshold for the stopping rule (model selector) in stage A of GeDS.
#' Default is equal to \code{0.99}. See details in \code{\link{NGeDS}}.
#' @param internal_knots the maximum number of internal knots that can be added
#' by the GeDS base-learners in each boosting iteration, effectively setting the
#' value of \code{max.intknots} in \code{\link{NGeDS}} at each boosting
#' iteration. Default is \code{500L}.
#' @param q numeric parameter which allows fine-tuning the stopping rule of
#' stage A of GeDS; by default equal to \code{2L}. See details in
#' \code{\link{NGeDS}}.
#' @param higher_order a logical that defines whether to compute the higher
#' order fits (quadratic and cubic) after the FGB algorithm is run. Default is
#' \code{TRUE}.
#' 
#' @return A \code{\link{GeDSboost-Class}} object, i.e. a list of items that
#' summarizes the main details of the fitted FGB-GeDS model. See
#' \code{\link{GeDSboost-Class}} for details. Some S3 methods are available in
#' order to make these objects tractable, such as
#' \code{\link[=coef.GeDSboost]{coef}}, \code{\link[=knots.GeDSboost]{knots}},
#' \code{\link[=print.GeDSboost]{print}} and
#' \code{\link[=predict.GeDSboost]{predict}}. In addition, variable importance
#' measures (\code{\link[=bl_imp.GeDSboost]{bl_imp}}) and improved plotting
#' facilities (\code{\link[=visualize_boosting.GeDSboost]{visualize_boosting}})
#' are provided.
#' 
#' @details
#' The \code{NGeDSboost} function implements a functional gradient boosting
#' (FGB) algorithm for a pre-defined loss function, using linear GeD splines as
#' base-learners. At each boosting iteration, the negative gradient vector is
#' fitted through the base procedure encapsulated within the \code{\link{NGeDS}}
#' function. The latter constructs a Geometrically Designed variable knots
#' spline regression model for a response having a Normal distribution. The FGB
#' algorithm yields a final linear fit. Higher order fits (quadratic and cubic)
#' are then computed by calculating Schoenberg’s variation diminishing spline
#' (VDS) approximation of the linear fit.
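#' 
#' For reference, the Schoenberg VDS approximation of order \eqn{n} of a
#' function \eqn{f} over a knot vector \eqn{\{\xi_i\}} is the spline whose
#' B-spline coefficients are the values of \eqn{f} at the corresponding knot
#' averages (Greville sites),
#' \deqn{S_n[f](x) = \sum_i f(\xi_i^*) N_{i,n}(x), \quad
#' \xi_i^* = (\xi_{i+1} + \cdots + \xi_{i+n-1})/(n-1),}
#' with the knots of the quadratic and cubic fits derived from the knots of the
#' linear fit by suitable averaging (see stage B of GeDS in Kaishev et al.
#' (2016)).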
#' 
#' On the one hand, \code{NGeDSboost} includes all the parameters of
#' \code{\link{NGeDS}}, which in this case tune the base-learner fit at each
#' boosting iteration. On the other hand, it includes some additional
#' parameters specific to the FGB procedure. The main ones are described below.
#' 
#' First, \code{family} allows the user to specify the loss function and
#' corresponding risk function to be optimized by the boosting algorithm. If
#' \code{initial_learner = FALSE}, the initial learner employed will be the
#' empirical risk minimizer corresponding to the family chosen. If
#' \code{initial_learner = TRUE}, then the initial learner will be an
#' \code{\link{NGeDS}} fit with maximum number of internal knots equal to
#' \code{int.knots_init}.
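#' 
#' As an illustrative sketch (not run), a binary-response fit could be
#' specified along the following lines, assuming \code{Y} is a suitably coded
#' binary response (\pkg{mboost}'s \code{Binomial()} family internally uses a
#' \eqn{-1/1} encoding):
#' \preformatted{
#' NGeDSboost(Y ~ f(X), data = data, family = mboost::Binomial(),
#'            initial_learner = FALSE)
#' }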
#' 
#' \code{shrinkage} tunes the step length/shrinkage parameter which helps to
#' control the learning rate of the model. In other words, when a new base
#' learner is added to the ensemble, its contribution to the final prediction is
#' multiplied by the shrinkage parameter. The smaller \code{shrinkage} is, the
#' slower/more gradual the learning process will be, and vice versa.
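#' 
#' In terms of the update implemented in the code, at boosting iteration
#' \eqn{m} the current fit is updated as
#' \deqn{\hat{F}_m(x) = \hat{F}_{m-1}(x) + \nu \hat{f}_m(x),}
#' where \eqn{\hat{f}_m} is the best-fitting base-learner for the current
#' negative gradient vector and \eqn{\nu} is the \code{shrinkage} parameter.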
#' 
#' The number of boosting iterations is controlled by a
#' \emph{Ratio of Deviances} stopping rule similar to the one presented for
#' \code{\link{GGeDS}}. In the same way \code{phi} and \code{q} tune the
#' stopping rule of \code{\link{GGeDS}}, \code{phi_boost_exit} and
#' \code{q_boost} tune the stopping rule of \code{NGeDSboost}. The user can also
#' manually control the number of boosting iterations through
#' \code{min_iterations} and \code{max_iterations}.
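#' 
#' Concretely, if \eqn{D_m} denotes the deviance of the model after boosting
#' iteration \eqn{m}, the iterations stop (once \code{min_iterations} and at
#' least \code{q_boost} iterations have been performed) as soon as
#' \deqn{D_m / D_{m-q} \geq \phi,}
#' where \eqn{\phi} and \eqn{q} correspond to \code{phi_boost_exit} and
#' \code{q_boost}, respectively.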
#' 
#' @examples
#' 
#' ################################# Example 1 #################################
#' # Generate a data sample for the response variable
#' # Y and the single covariate X
#' set.seed(123)
#' N <- 500
#' f_1 <- function(x) (10*x/(1+100*x^2))*4+4
#' X <- sort(runif(N, min = -2, max = 2))
#' # Specify a model for the mean of Y to include only a component
#' # non-linear in X, defined by the function f_1
#' means <- f_1(X)
#' # Add (Normal) noise to the mean of Y
#' Y <- rnorm(N, means, sd = 0.2)
#' data = data.frame(X, Y)
#'
#' # Fit a Normal FGB-GeDS regression using NGeDSboost
#'
#' Gmodboost <- NGeDSboost(Y ~ f(X), data = data)
#' MSE_Gmodboost_linear <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_linear)^2)
#' MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_quadratic)^2)
#' MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - Gmodboost$predictions$pred_cubic)^2)
#'
#' cat("\n", "MEAN SQUARED ERROR", "\n",
#'     "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
#'     "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
#'     "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
#' 
#' # Compute predictions on new randomly generated data
#' X <- sort(runif(100, min = -2, max = 2))
#' 
#' pred_linear <- predict(Gmodboost, newdata = data.frame(X), n = 2L)
#' pred_quadratic <- predict(Gmodboost, newdata = data.frame(X), n = 3L)
#' pred_cubic <- predict(Gmodboost, newdata = data.frame(X), n = 4L)
#' 
#' MSE_Gmodboost_linear <- mean((sapply(X, f_1) - pred_linear)^2)
#' MSE_Gmodboost_quadratic <- mean((sapply(X, f_1) - pred_quadratic)^2)
#' MSE_Gmodboost_cubic <- mean((sapply(X, f_1) - pred_cubic)^2)
#' cat("\n", "MEAN SQUARED ERROR", "\n",
#'     "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
#'     "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
#'     "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
#' 
#' ## S3 methods for class 'GeDSboost'
#' # Print 
#' print(Gmodboost)
#' # Knots
#' knots(Gmodboost, n = 2L)
#' knots(Gmodboost, n = 3L)
#' knots(Gmodboost, n = 4L)
#' # Coefficients
#' coef(Gmodboost, n = 2L)
#' coef(Gmodboost, n = 3L)
#' coef(Gmodboost, n = 4L)
#' # Deviances
#' deviance(Gmodboost, n = 2L)
#' deviance(Gmodboost, n = 3L)
#' deviance(Gmodboost, n = 4L)
#' 
#' ############################ Example 2 - Bodyfat ############################
#' library(TH.data)
#' data("bodyfat", package = "TH.data")
#' 
#' Gmodboost <- NGeDSboost(formula = DEXfat ~ age + f(hipcirc, waistcirc) + f(kneebreadth),
#'                         data = bodyfat)
#' 
#' MSE_Gmodboost_linear <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_linear)^2)
#' MSE_Gmodboost_quadratic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_quadratic)^2)
#' MSE_Gmodboost_cubic <- mean((bodyfat$DEXfat - Gmodboost$predictions$pred_cubic)^2)
#' # Comparison
#' cat("\n", "MSE", "\n",
#'     "Linear NGeDSboost:", MSE_Gmodboost_linear, "\n",
#'     "Quadratic NGeDSboost:", MSE_Gmodboost_quadratic, "\n",
#'     "Cubic NGeDSboost:", MSE_Gmodboost_cubic, "\n")
#'
#' @seealso \code{\link{NGeDS}}; \code{\link{GGeDS}}; \code{\link{GeDSboost-Class}};
#' S3 methods such as \code{\link{knots.GeDSboost}}; \code{\link{coef.GeDSboost}};
#' \code{\link{deviance.GeDSboost}}; \code{\link{predict.GeDSboost}}
#'      
#' @export
#' @import foreach
#' @import doParallel
#' @import doFuture
#' @import future
#' @import doRNG
#' @import TH.data
#' @importFrom parallel detectCores
#' @importFrom stats setNames
#' 
#' @references 
#' Friedman, J.H. (2001).
#' Greedy function approximation: A gradient boosting machine.
#' \emph{The Annals of Statistics}, \strong{29 (5)}, 1189--1232. \cr
#' DOI: \doi{10.1214/aos/1013203451}
#' 
#' Bühlmann, P. and Yu, B. (2003).
#' Boosting With the L2 Loss: Regression and Classification.
#' \emph{Journal of the American Statistical Association},
#' \strong{98 (462)}, 324--339. \cr
#' DOI: \doi{10.1198/016214503000125}
#' 
#' Bühlmann, P. and Hothorn, T. (2007).
#' Boosting Algorithms: Regularization, Prediction and Model Fitting.
#' \emph{Statistical Science}, \strong{22 (4)}, 477--505. \cr
#' DOI: \doi{10.1214/07-STS242}
#' 
#' Kaishev, V.K., Dimitrova, D.S., Haberman, S. and Verrall, R.J. (2016).
#' Geometrically designed, variable knot regression splines.
#' \emph{Computational Statistics}, \strong{31}, 1079--1105. \cr
#' DOI: \doi{10.1007/s00180-015-0621-7}
#' 
#' Dimitrova, D. S., Kaishev, V. K., Lattuada, A. and Verrall, R. J. (2023).
#' Geometrically designed variable knot splines in generalized (non-)linear
#' models.
#' \emph{Applied Mathematics and Computation}, \strong{436}. \cr
#' DOI: \doi{10.1016/j.amc.2022.127493}
#' 
#' Dimitrova, D. S., Guillen, E. S. and Kaishev, V. K. (2024).
#' \pkg{GeDS}: An \proglang{R} Package for Regression, Generalized Additive
#' Models and Functional Gradient Boosting, based on Geometrically Designed
#' (GeD) Splines. \emph{Manuscript submitted for publication.}

##################
### NGeDSboost ###
##################
NGeDSboost <- function(formula, data, weights = NULL, normalize_data = FALSE,
                       family = mboost::Gaussian(), initial_learner = TRUE,
                       int.knots_init = 2L, min_iterations,
                       max_iterations, shrinkage = 1,
                       phi_boost_exit = 0.995, q_boost = 2L, beta = 0.5,
                       phi = 0.99, internal_knots = 500L, q = 2L,
                       higher_order = TRUE)
  {
  # Capture the function call
  extcall <- match.call()
  
  # Models list
  models <- list()
  
  # Formula
  read.formula <- read.formula.boost(formula, data)
  response <- read.formula$response
  predictors <- read.formula$predictors
  base_learners <- read.formula$base_learners
  
  # Eliminate indexes and keep only relevant variables
  rownames(data) <- NULL
  all_vars <- c(response, predictors)
  if(all(all_vars %in% names(data))) {
    data <- data[, all_vars]
  } else {
    stop("Error: Some of the variables are not found in the data.")
  }
  # Check for NA in each column and print a warning message if found
  na_vars <- apply(data, 2, function(x) any(is.na(x)))
  for(var in names(data)[na_vars]){
    warning(paste("variable", var, "contains missing values;\n",
                  "these observations are not used for fitting"))
  }
  # Eliminate rows containing NA values
  data <- na.omit(data)
  
  # Family arguments
  ngradient <- family@ngradient
  risk <- family@risk
  offset <- family@offset
  check_y_family <- family@check_y
  
  # Min/max iterations
  min_iterations <- validate_iterations(min_iterations, 0L, "min_iterations")
  max_iterations <- validate_iterations(max_iterations, 100L, "max_iterations")
  
  # Save arguments
  args <- list(
    "predictors" = data[predictors], 
    "base_learners"= base_learners,
    "family" = family,
    "initial_learner" = initial_learner, 
    "int.knots_init" = if(!is.null(initial_learner)){int.knots_init} else {NULL},
    "shrinkage" = shrinkage, 
    "normalize_data" = normalize_data
  )
  
  # Response variable check
  data[response] <- check_y_family(data[[response]])
  # Add 'response' as the first element of the list 'args'
  args <- c(list(response = data.frame(data[[response]])), args)
  names(args$response) <- response
  
  # Normalize data if necessary
  if (normalize_data == TRUE) {
    if (family@name != "Negative Binomial Likelihood (logit link)") {
      # Mean and SD of the original response and predictor(s) variables (to de-normalize afterwards)
      args$Y_mean <- mean(data[[response]])
      args$Y_sd <- sd(data[[response]])
      data[response] <- as.vector(scale(data[response]))
      
      numeric_predictors <- names(data[predictors])[sapply(data[predictors], is.numeric)]
      args$X_mean <- colMeans(data[numeric_predictors])
      args$X_sd <- sapply(data[numeric_predictors], sd)
      # Scale only numeric predictors
      data[numeric_predictors] <- scale(data[numeric_predictors])
      
      } else {
        # If the family is "Negative Binomial Likelihood (logit link)" only normalize predictors
        args$X_mean <- colMeans(data[predictors])
        args$X_sd <- sapply(data[predictors], sd)
        data[predictors] <- scale(data[predictors])
    }
  }
  
  # Weights
  nobs <- length(data[[response]])
  if (is.null(weights)) weights <- rep.int(1, nobs)
  else weights <- rescale_weights(weights)
  if (!family@weights(weights))
    stop(sQuote("family"), " is not able to deal with weights")

  # Initialize the knots, intervals, and coefficients for each base-learner
  base_learners_list <- list()
  
  for (bl_name in names(base_learners)) {
    # Extract predictor variable(s) of base-learner
    pred_vars <- base_learners[[bl_name]]$variables
    ## (A) GeDS base-learners
    if (base_learners[[bl_name]]$type=="GeDS") {
      # Get min and max values for each predictor
      min_vals <- sapply(pred_vars, function(p) min(data[[p]]))
      max_vals <- sapply(pred_vars, function(p) max(data[[p]]))
      # (A.1) UNIVARIATE BASE-LEARNERS
      if (length(pred_vars) == 1) {
        knots <- c(min_vals[[1]], max_vals[[1]])
        intervals <- matrix(c(min_vals[[1]], max_vals[[1]]), ncol = 2)
        colnames(intervals) <- c("start", "end")
        coefficients <- list("b0" = 0, "b1" = 0)
        base_learners_list[[bl_name]] <- list("knots" = knots, "intervals" = intervals, "coefficients" = coefficients)
      # (A.2) BIVARIATE BASE-LEARNERS
        } else {
          knots <- list(Xk = c(min_vals[1], max_vals[1]), Yk = c(min_vals[2], max_vals[2]))
          base_learners_list[[bl_name]] <- list("knots" = knots, "iterations" = list())
        }
    ## (B) Linear base-learners (initialize only coefficients)
    } else if (base_learners[[bl_name]]$type=="linear"){
      # Factor
      if (is.factor(data[[pred_vars]])){
        n <- nlevels(data[[pred_vars]]); coefficients <- vector("list", length = n)
        names(coefficients) <- paste0("b", 0:(n-1)); coefficients[] <- 0
      # Continuous
        } else {
          coefficients <- list("b0" = 0, "b1" = 0)
        }
      base_learners_list[[bl_name]] <- list("coefficients" = coefficients)
    }
  }
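  # At this point, base_learners_list stores the initial state of each base-learner, e.g.:
  #  - univariate GeDS: list(knots = c(min, max), intervals (1x2 matrix), coefficients = list(b0 = 0, b1 = 0))
  #  - bivariate GeDS:  list(knots = list(Xk, Yk), iterations = list())
  #  - linear:          list(coefficients = list(b0 = 0, b1 = 0, ...))
  # These entries are updated whenever the corresponding base-learner is selected.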
  
  #####################
  ## Initial Learner ##
  #####################
  
  ## (I) GeDS (or Linear) Initial Learner ##
  if (initial_learner) {
    if (family@name != "Squared Error (Regression)") {
      stop("Initial learner only allowed for family = Gaussian()")
    }
    
    # Pre-allocate memory for results
    results <- vector("list", length = length(base_learners))
    names(results) <- names(base_learners)
    best_pred <- best_bl <- NULL; best_resid <- best_ssr <- Inf
    # Template for model formula
    model_formula_template <- paste0(response, " ~ ")
    
    ## 1. Loop through predictors and fit an initial base learner to data (NGeDS with # int.knots_init)
    model_results <- lapply(names(base_learners), function(bl_name) {
      componentwise_fit(bl_name, data = data, response = response, model_formula_template, weights, base_learners, m = 0,
                        internal_knots = int.knots_init, beta, phi, q)
      })
    
    ## 2. Calculate SSR and find the base-learner that best fits the response according to SSR
    ssr_values <- numeric(length(model_results))
    for (i in seq_along(model_results)) {
      if (!model_results[[i]]$error) {
        ssr_values[i] <- model_results[[i]]$ssr
        } else {
          message(model_results[[i]]$message)
          ssr_values[i] <- Inf
        }
    }
    # Finding the best model
    min_ssr_index <- which.min(ssr_values)
    best_resid    <- model_results[[min_ssr_index]]$resid 
    best_ssr      <- model_results[[min_ssr_index]]$ssr
    best_pred     <- model_results[[min_ssr_index]]$pred
    best_bl       <- model_results[[min_ssr_index]]$bl_name
    
    ## (A) GeDS best base-learner
    if (base_learners[[best_bl]]$type == "GeDS") {
      
      # (A.1) UNIVARIATE BASE-LEARNERS
      if (length(base_learners[[best_bl]]$variables) == 1) {
        ## 3. GeDS fit into linear piecewise polynomial form
        pred_linear <- best_pred$Y_hat
        int.knt <- best_pred$int.knt
        b0 <- best_pred$b0
        b1 <- best_pred$b1
        coef <- list("b0" = b0, "b1" = b1, "mat" = best_pred$mat)
        ## 4. Knots, intervals and coefficients
        # (i) Knots
        knots = sort(c(base_learners_list[[best_bl]]$knots, int.knt))
        int.knots = knots[-c(1, length(knots))]
        # (ii) Intervals
        # Combine the 1st value from the "start" col of intervals, with the values from the "end" col of intervals and int.knt
        combined <- sort(c(unname(base_learners_list[[best_bl]]$intervals[1, "start"]),
                           unname(base_learners_list[[best_bl]]$intervals[, "end"]), int.knots))
        intervals <- matrix(c(combined[-length(combined)], combined[-1]), ncol = 2)
        colnames(intervals) <- c("start", "end")
        
        # Save the coefficients, knots and intervals for the best predictor
        base_learners_list[[best_bl]]$knots <- knots
        base_learners_list[[best_bl]]$intervals <- intervals
        base_learners_list[[best_bl]]$coefficients <- list("b0" = b0, "b1" = b1)
        
        # Number of int. knots placed on initial learner
        n_intknots <- length(knots) - 2
        
      # (A.2) BIVARIATE BASE-LEARNERS
        } else if (length(base_learners[[best_bl]]$variables) == 2){
          ## 3/4. Save knots and coefficients (bivariate base-learners are not converted into PP form)
          pred_linear <- best_pred$Y_hat
          int.knt <- best_pred$int.knt
          knots = list(Xk = sort(c(base_learners_list[[best_bl]]$knots$Xk, int.knt$Xint.knt)),
                       Yk = sort(c(base_learners_list[[best_bl]]$knots$Yk, int.knt$Yint.knt)))
          base_learners_list[[best_bl]]$knots <- knots
          # Number of int. knots placed on initial learner
          n_intknots <- list(X = length(knots$Xk) - 2, Y = length(knots$Yk) - 2)
          
          coef <-  best_pred$theta
          base_learners_list[[best_bl]]$iterations[["model0"]] <- list("int.knt" = int.knt, "coef" = coef)
        }
    
    ## (B) Linear best base-learner
      } else if (base_learners[[best_bl]]$type == "linear") {
        pred_linear <- best_pred$Y_hat
        n <- length(best_pred) - 1; coef <- vector("list", length = n)
        names(coef) <- paste0("b", 0:(n-1))
        coef[] <- best_pred[-1]
        
        int.knt <- n_intknots <- NULL
        
        base_learners_list[[best_bl]]$coefficients <-  coef
      }
    
    ## 5. Evaluate model and compute residuals
    Y_hat <- best_pred$Y_hat
    U     <- best_resid
    # Initialize old.dev for stopping rule
    old.dev <- best_ssr
    
    # Save model
    models[["model0"]] <- list("best_bl" = list("name" = best_bl, "n_intknots" = n_intknots, "int.knt" = int.knt,
                                                "coef" = coef, "pred_linear" = pred_linear),
                               "Y_hat" = Y_hat, "base_learners" = base_learners_list)
    
    
  ## (II) Offset Initial Learner ##
  } else {
    # 1. Initialize Y_hat with an offset value
    Y_hat <- offset(data[[response]], weights)
    if (length(Y_hat) == 1){Y_hat <- rep(Y_hat, NROW(data[[response]]))}
    old.dev <- risk(data[[response]], Y_hat, weights)
    # 2. Compute the negative gradient of the loss function
    U <- ngradient(data[[response]], Y_hat, weights)
    
    # Save model
    models[["model0"]] <- list("Y_hat" = Y_hat, "base_learners" = base_learners_list)
  }
  
  ## Initialize parallel processing if required (i.e. if number of base-learners >= pprocessing_threshold)
  pprocessing_threshold <- 1000
  if (length(base_learners) >= pprocessing_threshold) {
    # Number of cores
    n_cores <- max(1L, detectCores() - 3L) # Leave up to 3 cores free, but keep at least 1 worker
    plan(multisession, workers = n_cores)
    registerDoFuture()
    # Number of predictors
    n_bl <- length(base_learners)
    # Number of predictors per batch
    bl_per_batch <- ceiling(n_bl / n_cores)
    # Split predictors into batches
    bl_batches <- split(names(base_learners), ceiling(seq_along(names(base_learners)) / bl_per_batch))
    # Call necessary internal functions
    componentwise_fit   <- componentwise_fit
    predict_GeDS_linear <- predict_GeDS_linear
    last_row_with_value <- last_row_with_value
  }
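  # Note: the parallelized loop in the boosting iterations below uses %dorng%
  # (doRNG) instead of %dopar%, so that random number streams are reproducible
  # across the parallel workers.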
  
  #########################
  ## Boosting iterations ##
  #########################
  for (m in 1:max_iterations){
    
    previous_model <- models[[paste0("model", m-1)]]
    data_loop <- cbind(U = U, data[predictors])
    
    # Pre-allocate memory for results
    results <- vector("list", length = length(base_learners))
    names(results) <- names(base_learners)
    best_pred <- best_bl <- NULL; best_ssr <- Inf
    # Template for model formula
    model_formula_template <- "U ~ "
    
    if (length(base_learners) < pprocessing_threshold) {
      ## 3. Loop through base learners and fit to negative gradient (NGeDS with # internal_knots or linear model)
      model_results <- lapply(names(base_learners), function(bl_name) {
        componentwise_fit(bl_name, data = data_loop, response = "U", model_formula_template, weights, base_learners, m,
                          internal_knots = internal_knots, beta, phi, q)
      })
      
      ## 4. Calculate SSR and find model that fits best U according to SSR
      ssr_values <- numeric(length(model_results))
      for (i in seq_along(model_results)) {
        if (!model_results[[i]]$error) {
          ssr_values[i] <- model_results[[i]]$ssr
        } else {
          message(model_results[[i]]$message)
          ssr_values[i] <- Inf
        }
      }
      
      # Finding the best model
      min_ssr_index <- which.min(ssr_values)
      best_ssr      <- model_results[[min_ssr_index]]$ssr
      best_pred     <- model_results[[min_ssr_index]]$pred
      best_bl       <- model_results[[min_ssr_index]]$bl_name
      
    #########################
    ## PARALLEL PROCESSING ##
    #########################
      } else {
       ## 3. Loop through base-learners and fit to negative gradient (NGeDS with # internal_knots)
        results <- foreach(bl_name = names(base_learners), .combine = 'rbind', .packages = "GeDS",
                           .export = c("predict_GeDS_linear", "last_row_with_value")) %dorng% {
          componentwise_fit(bl_name, data = data_loop, response = "U", model_formula_template, weights, base_learners, m,
                            internal_knots = internal_knots, beta, phi, q)
        }
      ## 4. Calculate SSR and find model that fits best U according to SSR
        best_result_index <- which.min(sapply(1:nrow(results), function(i) results[i, "ssr"]))
        best_ssr <- unlist(results[best_result_index, "ssr"])
        best_pred <- results[best_result_index, "pred"][[1]]
        best_bl <- unlist(results[best_result_index, "bl_name"])
        
      }
    
    # If all base-learners were skipped, stop the iterations
    if (is.infinite(best_ssr)) {
      message(paste0("All predictors were skipped in iteration ", m, ". Stopping boosting iterations."))
      break
    }
    
    ## (A) GeDS best base-learner
    if (base_learners[[best_bl]]$type == "GeDS") {
      
      # (A.1) UNIVARIATE BASE-LEARNERS
      if (length(base_learners[[best_bl]]$variables) == 1){
        ## 4.1. GeDS fit into linear piecewise polynomial form
        pred_linear <- best_pred$Y_hat
        int.knt <- best_pred$int.knt
        b0 <- best_pred$b0
        b1 <- best_pred$b1
        coef <- list("b0" = b0, "b1" = b1, "mat" = best_pred$mat)
        
        ## 4.2. Knots, intervals and coefficients
        # (i) Update knots vector
        knots = sort(c(previous_model$base_learners[[best_bl]]$knots, int.knt))
        int.knots = knots[-c(1, length(knots))]
        
        # (ii) Update intervals matrix
        # Combine the 1st value from the "start" col of intervals, with the values from the "end" col of intervals and int.knt
        combined <- sort(c(unname(previous_model$base_learners[[best_bl]]$intervals[1, "start"]),
                           unname(previous_model$base_learners[[best_bl]]$intervals[, "end"]), int.knt))
        intervals <- matrix(c(combined[-length(combined)], combined[-1]), ncol = 2)
        colnames(intervals) <- c("start", "end")
        
        # (iii) Coefficients
        # Combine old intervals with old coefficients
        old_aux <- cbind(previous_model$base_learners[[best_bl]]$intervals,
                         data.frame(b0 = previous_model$base_learners[[best_bl]]$coefficients$b0),
                         data.frame(b1 = previous_model$base_learners[[best_bl]]$coefficients$b1))
        # Combine new intervals with coefficients from the piecewise polynomial
        new_aux <- data.frame(intervals)
        new_aux$interval <- findInterval(new_aux$end, c(-Inf, int.knt, Inf))
        new_aux$interval[new_aux$end %in% int.knt] <- new_aux$interval[new_aux$end %in% int.knt] - 1
        
        new_aux$b0 <- b0[new_aux$interval]
        new_aux$b1 <- b1[new_aux$interval]
        new_aux$interval <- NULL
        
        # Initialize new coefficients
        b0_new <- numeric(nrow(new_aux))
        b1_new <- numeric(nrow(new_aux))
        
        # Sum old coefficients and coefficients of piecewise polynomial
        # Using a matrix approach for faster computation
        mat_start <- matrix(old_aux$start, nrow = nrow(new_aux), ncol = nrow(old_aux), byrow = TRUE)
        mat_end <- matrix(old_aux$end, nrow = nrow(new_aux), ncol = nrow(old_aux), byrow = TRUE)
        
        # Find intervals that satisfy the conditions
        indices <- which(mat_start <= new_aux$start & new_aux$end <= mat_end, arr.ind = TRUE)
        
        # Update coefficients based on the identified intervals
        b0_new[indices[, 1]] <- old_aux$b0[indices[, 2]] + shrinkage * new_aux$b0[indices[, 1]]
        b1_new[indices[, 1]] <- old_aux$b1[indices[, 2]] + shrinkage * new_aux$b1[indices[, 1]]
        
        # Update the coefficients, knots, and intervals for the best predictor
        base_learners_list[[best_bl]]$knots <- knots
        base_learners_list[[best_bl]]$intervals <- intervals
        base_learners_list[[best_bl]]$coefficients <- list("b0" = b0_new, "b1" = b1_new)
        
        # Calculate number of internal knots placed on current iteration
        n_intknots <- length(knots) - length(previous_model$base_learners[[best_bl]]$knots)
        
      # (A.2) BIVARIATE BASE-LEARNERS
        } else if (length(base_learners[[best_bl]]$variables) == 2){
          ## 3/4. Save knots and coefficients (bivariate base-learners are not converted into PP form)
          pred_linear <- best_pred$Y_hat
          int.knt <- best_pred$int.knt
          knots = list(Xk = sort(c(base_learners_list[[best_bl]]$knots$Xk, int.knt$Xint.knt)),
                       Yk = sort(c(base_learners_list[[best_bl]]$knots$Yk, int.knt$Yint.knt)))
          base_learners_list[[best_bl]]$knots <- knots
          # Number of int. knots placed on current iteration
          n_intknots <- list(X = length(knots$Xk) - length(previous_model$base_learners[[best_bl]]$knots$Xk),
                             Y = length(knots$Yk) - length(previous_model$base_learners[[best_bl]]$knots$Yk))
          coef <-  best_pred$theta
          base_learners_list[[best_bl]]$iterations[[paste0("model", m)]] <- list("int.knt" = int.knt, "coef" = coef)
        }
      
    ## (B) Linear best base-learner
      } else if (base_learners[[best_bl]]$type == "linear") {
        pred_linear <- best_pred$Y_hat
        n <- length(best_pred) - 1; coef <- vector("list", length = n)
        names(coef) <- paste0("b", 0:(n-1))
        coef[] <- best_pred[-1]
        
        int.knt <- n_intknots <- NULL
        
        base_learners_list[[best_bl]]$coefficients <- mapply(function(old_coef, new_coef) old_coef + shrinkage * new_coef,
                                                             previous_model$base_learners[[best_bl]]$coefficients,
                                                             coef, SIMPLIFY=FALSE)
      }
    
    
    ## 5. Evaluate model and recompute gradient vector and residuals
    Y_hat <- models[[paste0("model", m-1)]]$Y_hat + shrinkage*best_pred$Y_hat
    U <- ngradient(data[[response]], Y_hat, weights)
    new.dev <- risk(data[[response]], Y_hat, weights)
    
    # Save model
    models[[paste0("model", m)]] <- list("best_bl" = list("name" = best_bl, "n_intknots" = n_intknots, 
                                                          "int.knt"= int.knt, "coef"= coef, "pred_linear" = pred_linear),
                                         "Y_hat" = Y_hat, "base_learners" = base_learners_list)
    
    
    ## 6. Stopping Rule (Ratio of Deviances)
    # Update the previous dev with current dev
    old.dev <- c(old.dev, new.dev)
   
    if (!is.null(phi_boost_exit) && !is.null(q_boost) && m >= q_boost && m >= min_iterations) {
      # Ratio of Deviances: stop once old.dev[m+1] / old.dev[m+1-q_boost] >= phi_boost_exit
      phi_boost <- old.dev[m+1] / old.dev[m+1-q_boost]
      if (phi_boost >= phi_boost_exit) {
        cat("Stopping iterations due to small improvement in Deviance\n\n")
        break
      }
    }
  }
  
  
  ## 7. Set the "final model" to be the one with the lowest deviance
  # 7.1. De-normalize predictions if necessary
  if (normalize_data == TRUE && family@name != "Negative Binomial Likelihood (logit link)") {
    models <- lapply(models, function(model) {
      model$Y_hat <- model$Y_hat * args$Y_sd + args$Y_mean
      return(model)
      })
  }
  # 7.2. Calculate each model's deviance
  family_stats <- get_mboost_family(family@name)
  deviances <- sapply(models, function(model) {
    mu <- family@response(model$Y_hat)
    if(family_stats$family == "binomial" && all(args$response[[response]] %in% c(-1, 1))) {
      # since for NGeDSboost(family = "binomial") the encoding is -1/1 and for stats::binomial() the encoding is 0/1
      sum(family_stats$dev.resids((args$response[[response]] + 1) / 2, mu, weights))
      } else {
        sum(family_stats$dev.resids(args$response[[response]], mu, weights))
        }
    })
  # 7.3. Find the model with the smallest deviance
  model_min_deviance <- names(models)[which.min(deviances)]
  final_model <- list(
    model_name = model_min_deviance,
    DEV = min(deviances),
    Y_hat = models[[model_min_deviance]]$Y_hat,      
    base_learners = models[[model_min_deviance]]$base_learners
  )
  # Save linear internal knots for each base-learner
  for (bl_name in names(base_learners)) {
    final_model$base_learners[[bl_name]]$linear.int.knots <- get_internal_knots(final_model$base_learners[[bl_name]]$knots)
  }
  
  #######################
  ## Higher order fits ##
  #######################
  if (higher_order) {
    
    # Extract variables of GeDS/linear base-learners
    GeDS_variables <- lapply(base_learners, function(x) {if(x$type == "GeDS") return(x$variables) else return(NULL)})
    GeDS_variables <- unname(unlist(GeDS_variables))
    linear_variables <- lapply(base_learners, function(x) {if(x$type == "linear") return(x$variables) else return(NULL)})
    linear_variables <- unname(unlist(linear_variables))
    
    # Quadratic fit
    qq_list <- compute_avg_int.knots(final_model, base_learners = base_learners,
                                   args$X_sd, args$X_mean, normalize_data, n = 3)
    quadratic_fit <- tryCatch({
      SplineReg_Multivar(X = args$predictors[GeDS_variables], Y = args$response[[response]],
                         Z = args$predictors[linear_variables],
                         base_learners = args$base_learners, InterKnotsList = qq_list,
                         n = 3, family = get_mboost_family(family@name))}, error = function(e) {
                           cat(paste0("Error computing quadratic fit:", e))
                           return(NULL)
                           })
    final_model$Quadratic.Fit <- quadratic_fit
    pred_quadratic <- as.numeric(quadratic_fit$Predicted)
    
    # Cubic fit
    cc_list <- compute_avg_int.knots(final_model, base_learners = base_learners,
                                   args$X_sd, args$X_mean, normalize_data, n = 4)
    cubic_fit <- tryCatch({
      SplineReg_Multivar(X = args$predictors[GeDS_variables], Y = args$response[[response]],
                         Z = args$predictors[linear_variables],
                         base_learners = args$base_learners, InterKnotsList = cc_list,
                         n = 4, family = get_mboost_family(family@name))}, error = function(e) {
                           cat(paste0("Error computing cubic fit:", e))
                           return(NULL)
                           })
    final_model$Cubic.Fit <- cubic_fit
    pred_cubic <- as.numeric(cubic_fit$Predicted)
    
    # Save quadratic and cubic knots for each base-learner
    for (bl_name in names(base_learners)){
      final_model$base_learners[[bl_name]]$quadratic.int.knots <- qq_list[[bl_name]]
      final_model$base_learners[[bl_name]]$cubic.int.knots <- cc_list[[bl_name]]
    }
    
    } else {
      pred_quadratic <- pred_cubic <- NULL
      for (bl_name in names(base_learners)) {
        final_model$base_learners[[bl_name]]$quadratic.int.knots <- NULL
        final_model$base_learners[[bl_name]]$cubic.int.knots <- NULL
      }
    }
  
  # Simplify final_model structure
  for (bl_name in names(base_learners)){
    final_model$base_learners[[bl_name]]$knots <- NULL
  }
  
  # Response (for example, if family is binomial, define the probability estimates)
  f <- as.numeric(final_model$Y_hat); pred_linear <- family@response(f)
  
  preds <- list(pred_linear = pred_linear, pred_quadratic = pred_quadratic, pred_cubic = pred_cubic)
  
  # Store internal knots
  linear.int.knots <- setNames(vector("list", length(base_learners)), names(base_learners))
  quadratic.int.knots <- setNames(vector("list", length(base_learners)), names(base_learners))
  cubic.int.knots <- setNames(vector("list", length(base_learners)), names(base_learners))
  # Loop through each base learner and extract the int.knots
  for(bl_name in names(base_learners)){
    # Extract and store linear internal knots
    linear_int.knt <- final_model$base_learners[[bl_name]]$linear.int.knots
    linear.int.knots[bl_name] <- list(linear_int.knt)
    # Extract and store quadratic internal knots
    quadratic_int.knt <- final_model$base_learners[[bl_name]]$quadratic.int.knots
    quadratic.int.knots[bl_name] <- list(quadratic_int.knt)
    # Extract and store cubic internal knots
    cubic_int.knt <- final_model$base_learners[[bl_name]]$cubic.int.knots
    cubic.int.knots[bl_name] <- list(cubic_int.knt)
  }
  # Combine the extracted knots into a single list
  internal_knots <- list(linear.int.knots = linear.int.knots, quadratic.int.knots = quadratic.int.knots,
                         cubic.int.knots = cubic.int.knots)
  
  output <- list(extcall = extcall, formula = formula, args = args, models = models, final_model = final_model, predictions = preds,
                 internal_knots = internal_knots, iters = m)
  class(output) <- "GeDSboost"
  return(output)
}


#######################################
### Component-wise fitting function ###
#######################################
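# Internal helper: fits a single base-learner (GeDS via NGeDS(), or a linear
# model via lm()) to the current working response -- the original response for
# the initial learner (m = 0) or the negative gradient U during boosting -- and
# returns a list with the base-learner name, residuals, sum of squared
# residuals (ssr), the fitted values/coefficients (pred) and error/message flags.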
componentwise_fit <- function(bl_name, data, response, model_formula_template, weights, base_learners, m, 
                              internal_knots, beta, phi, q) {
  
  # Initialize a results list with default values
  results <- list(
    bl_name = bl_name,
    resid = Inf,
    ssr = Inf,
    pred = NULL,
    error = FALSE,
    message = NULL
  )
  
  pred_vars <- base_learners[[bl_name]]$variables
  
  ## (A) GeDS base-learners
  if (base_learners[[bl_name]]$type == "GeDS") {
    # max.intknots is set to internal_knots; bivariate base-learners require at least one internal knot
    if (length(pred_vars) == 2 && internal_knots == 0) stop("internal_knots must be > 0 for bivariate learners")
    max.intknots <- internal_knots
    
    model_formula <- formula(paste0(model_formula_template, bl_name))
    error <- FALSE
    suppressWarnings({
      fit <- tryCatch(
        NGeDS(model_formula, data = data, weights = weights, beta = beta, phi = phi,
              min.intknots = 0, max.intknots = max.intknots, q = q, Xextr = NULL, Yextr = NULL,
              show.iters = FALSE, stoptype = "RD", higher_order = FALSE),
        error = function(e) {
          message(paste0("Error occurred in NGeDS() for base-learner ", bl_name, ": ", e))
          error <<- TRUE
        }
      )
    })
    if (error) {
      results$error <- TRUE
      results$message <- paste0("Skipped iteration ", m, " for base-learner ", bl_name, ".")
      return(results)
    }
    
    # Compute predictions
    pred <- if(length(pred_vars) == 1) {
      predict_GeDS_linear(fit, data[[pred_vars]])
    } else if(length(pred_vars) == 2) {
      predict_GeDS_linear(fit, X = data[pred_vars[1]], Y = data[pred_vars[2]], Z = data[[response]])
    }
    
    ## (B) Linear base-learners
  } else if (base_learners[[bl_name]]$type == "linear") {
    model_formula <- formula(paste0(model_formula_template, bl_name))
    error <- FALSE
    suppressWarnings({
      fit <- tryCatch(
        lm(model_formula, data = data),
        error = function(e) {
          message(paste0("Error occurred in lm() for base-learner ", bl_name, ": ", e))
          error <<- TRUE
        }
      )
    })
    if (error) {
      results$error <- TRUE
      results$message <- paste0("Skipped iteration ", m, " for base-learner ", bl_name, ".")
      return(results)
    }
    
    # Compute predictions
    pred <- list(Y_hat = fit$fitted.values)
    # Create coefficient entries
    for (i in 1:length(fit$coefficients)) {
      coef_name <- paste0("b", i - 1)
      pred[[coef_name]] <- fit$coefficients[[i]]
    }
  }
  
  # Calculate SSR
  resid <- data[[response]] - pred$Y_hat
  ssr <- sum((resid)^2)
  
  # Save results
  results$resid <- resid
  results$ssr <- ssr
  results$pred <- pred
  
  return(results)
}