#' Fit a robust predictive model with the iteratively reweighted boosting algorithm
#'
#' Fit a predictive model with the iteratively reweighted convex optimization (IRCO) algorithm that minimizes robust loss functions in the CC-family (concave-convex). The convex optimization is conducted by the functional descent boosting algorithm in the R package \pkg{xgboost}. The iteratively reweighted boosting (IRBoost) algorithm reduces the weights of observations that lead to large losses; these weights can also help identify outliers. Applications include robust generalized
#' linear models and extensions, where the mean is related to the predictors by boosting, and robust accelerated failure time models. \code{irb.train} is an advanced interface for training an irboost model; the \code{irboost} function is a simpler wrapper for \code{irb.train}. See \code{xgboost::xgb.train}.
#' @param data training dataset. \code{irb.train} accepts only an \code{xgboost::xgb.DMatrix} as the input. \code{irboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file. See \code{xgboost::xgb.train}.
#' @param params the list of parameters passed to \code{xgboost::xgb.train}, which requires the same argument. The list must include \code{objective}, a convex component in the CC-family (the second C, i.e., convex down). It is the same as \code{objective} in \code{xgboost::xgb.train}. The following objective functions are currently implemented:
#' \itemize{
#' \item \code{reg:squarederror} Regression with squared loss.
#' \item \code{binary:logitraw} logistic regression for binary classification; predicts the linear predictor, not probabilities.
#' \item \code{binary:hinge} hinge loss for binary classification. This makes predictions of -1 or 1, rather than producing probabilities.
#' \item \code{multi:softprob} softmax loss function for multiclass problems. The result contains the predicted probability of each data point belonging to each class, say p_k, k=0, ..., nclass-1. Note, \code{label} is coded in [0, ..., nclass-1]. The cross-entropy loss for the i-th observation is computed as -log(p_k) with k=label_i, i=1, ..., n (see the sketch after this list).
#' \item \code{count:poisson}: Poisson regression for count data; predicts the mean of the Poisson distribution.
#' \item \code{reg:gamma}: gamma regression with log-link; predicts the mean of the gamma distribution. The implementation in \code{xgboost::xgb.train} takes a parameterization in the exponential family:\cr
#' xgboost/src/metric/elementwise_metric.cu.\cr
#' In particular, there is only one parameter psi, and it is set to 1. The implementation of the IRCO algorithm follows this parameterization. See Table 2.1 of McCullagh and Nelder, Generalized Linear Models, Chapman & Hall, 1989, second edition.
#' \item \code{reg:tweedie}: Tweedie regression with log-link. See also\cr
#' \code{tweedie_variance_power} with range (1, 2): a value close to 2 resembles a gamma distribution; a value close to 1 resembles a Poisson distribution.
#' \item \code{survival:aft}: Accelerated failure time model for censored survival time data. \code{irb.train} invokes \code{irb.train_aft}.
#'}
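#'
#' As an illustration of the \code{multi:softprob} bookkeeping above, a minimal
#' sketch with made-up probabilities (not package code):
#' \preformatted{
#' ## cross-entropy for 3 observations and nclass = 3
#' p <- matrix(c(0.7, 0.2, 0.1,
#'               0.1, 0.8, 0.1,
#'               0.3, 0.3, 0.4), ncol = 3, byrow = TRUE)
#' label <- c(0, 1, 2)                     # coded in 0, ..., nclass-1
#' loss <- -log(p[cbind(1:3, label + 1)])  # -log(p_k) with k = label_i
#' }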
#' @param z_init a vector of length nobs of initial convex component values; must be non-negative. Defaults to the data weights if provided, otherwise to a vector of 1s.
#' @param cfun concave component of CC-family, can be \code{"hcave", "acave", "bcave", "ccave"},
#' \code{"dcave", "ecave", "gcave", "tcave"}.\cr
#' See Table 2 in \url{https://arxiv.org/pdf/2010.02848.pdf}.
#' @param s tuning parameter of \code{cfun}. \code{s > 0}, except that \code{s = 0} is allowed for \code{cfun="tcave"}. If \code{s} is too close to 0 for \code{cfun="acave", "bcave", "ccave"}, the computed weights can become 0 for all observations, which crashes the program.
#' @param delta a small positive number provided by the user, needed only if \code{cfun="gcave"} and \code{0 < s < 1}
#' @param iter number of iterations in the IRCO algorithm
#' @param nrounds boosting iterations within each IRCO iteration
#' @param del convergence criterion of the IRCO algorithm; unrelated to \code{delta}
#' @param trace if \code{TRUE}, fitting progress is reported
#' @param ... other arguments passed to \code{xgboost::xgb.train}
#' @importFrom stats predict
#' @importFrom xgboost xgb.train getinfo setinfo
#' @importFrom mpath compute_wt compute_g loss3 y2num y2num4glm
#' @return An object of S3 class \code{irboost_model} whose component \code{model} is the fitted \code{xgboost::xgb.train} booster, with the additional elements:
#' \itemize{
#' \item \code{weight_update_log} a matrix of \code{nobs} rows by \code{iter} columns containing the observation weights in each iteration of the IRCO algorithm
#' \item \code{weight_update} a vector of observation weights in the last IRCO iteration that produces the final model fit
#' \item \code{loss_log} sum of the loss values of the composite function in each IRCO iteration. Note, \code{cfun} requires \code{objective} to be non-negative in some cases, so care must be taken. For instance, with \code{objective="reg:gamma"}, the loss value is defined by gamma-nloglik - (1+log(min(y))), where y=label. The second term is introduced so that the loss value is non-negative. In fact, gamma-nloglik = y/ypre + log(ypre) in \code{xgboost::xgb.train}, where ypre is the mean prediction value, and this can
#' be negative. It can be derived that, for fixed \code{y}, the minimum of gamma-nloglik is achieved at ypre=y and equals 1+log(y). Thus, among all \code{label} values, the minimum of gamma-nloglik is 1+log(min(y)) (see the numerical check after this list).
#'}
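#'
#' A quick numerical check of the gamma lower bound above (a sketch, not
#' package code): gamma-nloglik(ypre) = y/ypre + log(ypre) is minimized at
#' ypre = y, where it equals 1 + log(y).
#' \preformatted{
#' y <- 2
#' optimize(function(ypre) y/ypre + log(ypre), c(1e-3, 1e3))$objective
#' 1 + log(y)  # both are approximately 1.693
#' }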
#' @author Zhu Wang\cr Maintainer: Zhu Wang \email{zwang145@uthsc.edu}
#' @references Wang, Zhu (2024), \emph{Unified Robust Boosting}, \emph{Journal of Data Science}, 1-19, DOI: 10.6339/24-JDS1138
#' @keywords regression classification
#' @export irb.train
#' @examples
#' \dontrun{
#' Sys.setenv(
#' OMP_NUM_THREADS = "1",
#' OMP_THREAD_LIMIT = "1",
#' OPENBLAS_NUM_THREADS = "1",
#' MKL_NUM_THREADS = "1",
#' VECLIB_MAXIMUM_THREADS = "1",
#' BLIS_NUM_THREADS = "1"
#' )
#' # logistic boosting
#' data(agaricus.train, package='xgboost')
#' data(agaricus.test, package='xgboost')
#'
#' dtrain <- with(agaricus.train, xgboost::xgb.DMatrix(data, label = label, nthread = 1))
#' dtest <- with(agaricus.test, xgboost::xgb.DMatrix(data, label = label, nthread = 1))
#' watchlist <- list(train = dtrain, eval = dtest)
#'
#' # A simple irb.train example:
#' param <- list(max_depth = 2, eta = 1, nthread = 1,
#'               objective = "binary:logitraw", eval_metric = "auc")
#' bst <- xgboost::xgb.train(params = param, data = dtrain, nrounds = 2,
#'                           watchlist = watchlist, verbose = 2)
#' bst <- irb.train(params=param, data=dtrain, nrounds = 2)
#' summary(bst$weight_update)
#' # the following call is commented out due to a bug in xgboost::xgb.train
#' #bst <- irb.train(params = param, data = dtrain, nrounds = 2,
#' #                 watchlist = watchlist, trace = TRUE, verbose = 2)
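#'
#' # robust squared-error regression with one gross outlier
#' # (an illustrative sketch with simulated data; the values below are made up
#' # for this example)
#' set.seed(1)
#' x <- matrix(rnorm(100 * 2), ncol = 2)
#' y <- x[, 1] + rnorm(100, sd = 0.2)
#' y[1] <- 10  # contaminate the first response
#' dsim <- xgboost::xgb.DMatrix(data = x, label = y, nthread = 1)
#' psim <- list(objective = "reg:squarederror", max_depth = 2, eta = 0.5, nthread = 1)
#' fit <- irb.train(params = psim, data = dsim, cfun = "ccave", s = 1, nrounds = 20)
#' # the contaminated observation should receive a relatively small weight
#' fit$weight_update[1]
#' summary(fit$weight_update)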
#'
#' # time-to-event analysis
#' X <- matrix(1:5, ncol=1)
#' # Associate ranged labels with the data matrix.
#' # This example shows each kind of censored labels.
#' # uncensored right left interval
#' y_lower = c(10, 15, -Inf, 30, 100)
#' y_upper = c(Inf, Inf, 20, 50, Inf)
#' dtrain <- xgboost::xgb.DMatrix(data = X, label_lower_bound = y_lower,
#'                                label_upper_bound = y_upper)
#' param <- list(objective = "survival:aft", nthread = 1, aft_loss_distribution = "normal",
#'               aft_loss_distribution_scale = 1, max_depth = 3, min_child_weight = 0)
#' watchlist <- list(train = dtrain)
#' bst <- xgboost::xgb.train(params = param, data = dtrain, nrounds = 15,
#'                           watchlist = watchlist)
#' predict(bst, dtrain)
#' bst_cc <- irb.train(params = param, data = dtrain, nrounds = 15, cfun = "hcave",
#'                     s = 1.5, trace = TRUE, verbose = 0)
#' bst_cc$weight_update
#' }
irb.train <- function(params = list(), data, z_init = NULL, cfun = "ccave", s = 1,
                      delta = 0.1, iter = 10, nrounds = 100, del = 1e-10,
                      trace = FALSE, ...){
call <- match.call()
dfun <- params$objective
if (is.null(params$nthread)) params$nthread <- 1L
if(dfun=="survival:aft"){
return(irb.train_aft(params=params, data=data, cfun=cfun, s=s, delta=delta, iter=iter, nrounds=nrounds, del=del, trace=trace, ...))
}
if(!dfun %in% c("reg:squarederror", "binary:logitraw", "binary:hinge", "multi:softprob", "count:poisson", "reg:gamma", "reg:tweedie"))
stop("dfun not implemented/applicable")
#x <- data
y <- getinfo(data, "label")
if(dfun %in% c("reg:gamma") && any(y <= 0))
stop("response variable y must be positive for dfun ", dfun)
if(dfun %in% c("binary:logitraw", "binary:hinge")){
        ynew <- mpath::y2num(y)   # recode labels as -1/1
        y <- mpath::y2num4glm(y)  # recode labels as 0/1
}else
ynew <- y
if(!is.null(z_init)){
if(length(z_init)!=length(y))
stop("z_init must be the same length of response variable y")
if(any(z_init < 0))
stop("z_init must be non-negative")
}
    cfunval <- mpath::cfun2num(cfun)
    # dfun may have no corresponding code in mpath (e.g., reg:gamma); the
    # check_s call below then falls back to 1 -- worth updating mpath
dfunval <- switch(dfun,
"reg:squarederror"=1,
"binary:logitraw"=5,
"binary:hinge"=6,
"count:poisson"=8,
"reg:gamma"=NULL,
"reg:tweedie"=NULL,
"multi:softprob"=NULL,
"survival:aft"=NULL)
eval(parse(text="mpath::check_s(cfun, ifelse(is.null(dfunval), 1, dfunval), s)"))
d <- 10
k <- 1
if(trace) {
cat("\nrobust boosting ...\n")
}
loss_log <- rep(NA, iter)
n <- length(y)
weight_update_log <- matrix(NA, nrow=n, ncol=iter)
weights <- getinfo(data, "weight")
if(is.null(weights)) weights <- rep(1, n)
if(is.null(z_init)) ylos <- weights else ylos <- z_init #initial values
if(dfun=="reg:gamma")
min_nloglik <- 1+log(min(y)) #the minimum value of negative log-likelihood value for a fixed y vector
while(d > del && k <= iter){
if(trace) cat("\niteration", k, "nrounds", nrounds, "\n")
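        # IRCO reweighting: the first iteration uses the supplied data weights
        # (an unweighted fit by default); later iterations derive weights from
        # the current convex-loss values through the concave component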
if(k==1) weight_update <- weights else
weight_update <- mpath::compute_wt(ylos, weights, cfunval, s, delta)
weight_update_log[,k] <- weight_update
#if(trace) cat("weight_update", weight_update, "\n")
setinfo(data, 'weight', weight_update) #update data weight
RET <- xgboost::xgb.train(params = params, data=data, nrounds=nrounds, ...)
ypre <- predict(RET, newdata=data) #depends on objective, this is probability or response or linear predictor
#update loss values
if(dfun=="reg:squarederror"){
ylos <- (ynew - ypre)^2/2
}else if(dfun=="binary:logitraw"){
#u <- 1/(1+exp(-ypre)) # for y in [0, 1]
#ylos <- -y*log(u/(1-u)) - log(1-u)
ylos <- log(1 + exp( - ynew * ypre)) # for y in [-1, 1], the results ylos should be the same
}else if(dfun=="binary:hinge"){
ylos <- pmax(0, 1- ynew * ypre)
}else if(dfun=="multi:softprob"){
num_class <- RET$params$num_class
            # multi:softprob predictions arrive as a vector of length n*num_class,
            # one block of class probabilities per observation; reshape so that
            # each row holds one observation's class probabilities
            ypre <- matrix(ypre, ncol=num_class, byrow=TRUE)
ylos <- rep(NA, n)
for(i in 1:n)
                ylos[i] <- -log(ypre[i, y[i]+1]) # label y is coded in [0, num_class-1]
}else if(dfun %in% c("count:poisson")){
mu <- pmax(ypre, .Machine$double.eps)
ylos <- loss3(ynew, mu=mu, theta=1, weights, cfunval, family=3, s, delta)$z
}else if(dfun %in% c("reg:gamma")){
ylos <- y/ypre+log(ypre) #negative log-likelihood value with "parameter"=1 in xgboost
ylos <- ylos - min_nloglik #to shift the values to non-negative
}else if(dfun %in% c("reg:tweedie")){
rho <- params$tweedie_variance_power
if(is.null(rho)) rho <- 1.5
mu <- pmax(ypre, .Machine$double.eps)
a <- y * exp((1-rho)*log(mu))/(1-rho)
b <- exp((2-rho)*log(mu))/(2-rho)
ylos <- - a + b
}
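        # objective value of the composite loss, g(convex loss) summed over
        # observations; used for the convergence check below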
loss_log[k] <- sum(mpath::compute_g(ylos, cfunval, s, delta))
if(k > 1){
d <- abs((loss_log[k-1]-loss_log[k]))/loss_log[k-1]
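            # if the composite loss increased, allow more boosting rounds in
            # the next IRCO iteration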
if(loss_log[k] > loss_log[k-1])
nrounds <- nrounds + 100
}
if(trace) cat("loss=", loss_log[k], "d=", d, "\n")
k <- k + 1
}
out <- list(
model = RET,
params = params,
y = y,
call = call,
weight_update_log = weight_update_log,
weight_update = weight_update,
loss_log = loss_log
)
class(out) <- "irboost_model"
out
}