predict.boostmtree: Predict longitudinal trajectories from a fitted boostmtree...


predict.boostmtree    R Documentation

Predict longitudinal trajectories from a fitted boostmtree model

Description

Generate fitted or predicted trajectories from a boostmtree fit. The method can return predictions for the original training subjects, for a new longitudinal test set, or for new subjects evaluated on a common time grid.

Usage

## S3 method for class 'boostmtree'
predict(
  object,
  x,
  tm,
  id,
  y,
  M = NULL,
  eps = 1e-5,
  use.cv.flag = FALSE,
  partial = FALSE,
  ...
)

Arguments

object

A fitted object returned by boostmtree.

x

New covariate values. Supply either one row per subject or a long-format data frame with one row per observation. If omitted, the training subjects stored in object are used.

tm

Observation times for the prediction data. When x, tm, and id are supplied together, predictions are made at the observed times in the new data. When x is supplied without tm and id, each row of x is treated as a new subject and predictions are returned on the training time grid.

id

Subject identifier for long-format prediction data.

y

Observed responses for the prediction data. When supplied with long-format prediction data, the method computes standardized test-set error summaries and, if M = NULL, chooses the stopping iteration by minimizing the test-set error path.

M

Optional fixed boosting iteration used for prediction. If NULL and y is supplied, the stopping iteration is selected by minimizing the test-set error path. If NULL and y is not supplied, the stored m.opt from the fitted object is used when available; otherwise the full fitted path is used.

eps

Tolerance used when selecting m.opt from the test-set error path.

use.cv.flag

Logical; should the prediction use the stored out-of-bag coefficient estimates from a model fit with cv.flag = TRUE? This option is intended for predictions on the original training subjects and is ignored for new prediction data.

partial

Logical; if TRUE, interpret x as one row per subject and tm as a common time grid at which each subject should be evaluated.

...

Currently ignored. Included for S3 compatibility.
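
The rules given above for M can be read as a small decision function. The following is only a sketch of that documented logic, not the package's code: the helper choose.iteration and its arguments (has.y, stored.m.opt, M.fit) are hypothetical names introduced for illustration.

```r
# Hypothetical helper mirroring the documented rules for M.
# M.fit stands for the number of boosting iterations in the fitted path.
choose.iteration <- function(M = NULL, has.y = FALSE,
                             stored.m.opt = NULL, M.fit) {
  # A user-supplied M always wins.
  if (!is.null(M)) return(list(M = M, source = "user"))
  # With responses available, the iteration is chosen from the
  # test-set error path (not computed in this sketch).
  if (has.y) return(list(M = NA_integer_, source = "test-error path"))
  # Otherwise fall back to the stored m.opt, if the fit carries one.
  if (!is.null(stored.m.opt)) {
    return(list(M = stored.m.opt, source = "stored m.opt"))
  }
  # Last resort: use the full fitted path.
  list(M = M.fit, source = "full path")
}

choose.iteration(M.fit = 10)$source
choose.iteration(stored.m.opt = 7, M.fit = 10)$M
```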

Details

For longitudinal prediction on a new data set, supply x, tm, and id in long format, with the same subject-level covariates used in the training fit. The function collapses the covariates to one row per subject and then uses the stored terminal-node coefficient path to reconstruct the predicted trajectory.
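
The collapsing step can be pictured with base R alone. This is a minimal sketch under stated assumptions, not the internal implementation: the data are made up, and keeping the first row per subject is one simple way to collapse covariates that are constant within subject.

```r
# Hypothetical long-format prediction data: 3 subjects, unequal visit counts.
id <- c("a", "a", "a", "b", "b", "c")
tm <- c(0, 1, 2, 0, 1.5, 0.5)
x  <- data.frame(age = c(50, 50, 50, 61, 61, 47),
                 trt = c(1, 1, 1, 0, 0, 1))

# Check that every covariate is constant within subject.
is.constant <- sapply(x, function(col) {
  all(tapply(col, id, function(v) length(unique(v)) == 1))
})

# Keep the first row seen for each subject.
first.row <- !duplicated(id)
x.subject <- x[first.row, , drop = FALSE]
rownames(x.subject) <- id[first.row]

# Split the times by subject, preserving order of first appearance.
time.list <- split(tm, factor(id, levels = unique(id)))
ni <- lengths(time.list)
```

If is.constant is FALSE for some column, that covariate varies within subject and a first-row collapse would silently discard information, so it is worth checking before calling predict.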

There are three common use cases.

First, if x is omitted, the method returns fitted trajectories for the training subjects. This is useful for inspecting the stored fit or for producing training-set plots.

Second, if x is supplied without tm and id, each row of x is treated as a new subject and predictions are returned on the training time grid stored in object$time.unique. This is convenient when the user wants an entire fitted profile for new subjects.

Third, if partial = TRUE, x must contain one row per subject and tm supplies a common prediction grid. The method then returns fitted profiles on that user-specified grid. Because it reuses the time basis fitted on the training data, predictions are reliable only when the supplied grid stays within, or close to, the observed training-time range; values far outside that range are extrapolations.
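
Before using partial = TRUE, it can help to compare the requested grid with the training times. The sketch below uses only base R; train.time is a made-up vector standing in for object$time.unique, and trimming the grid to the training range is a conservative choice of this example, not something the method does for you.

```r
# Stand-in for object$time.unique from a fitted model (made-up values).
train.time <- c(0, 0.5, 1, 2, 4)

# A candidate common grid that overshoots the training range on both sides.
grid.time <- seq(-1, 6, length.out = 15)

# Flag grid points that fall inside the observed training-time range.
rng <- range(train.time)
inside <- grid.time >= rng[1] & grid.time <= rng[2]

# Keep only the interior points for prediction.
grid.safe <- grid.time[inside]
```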

For binary and nominal families, prediction proceeds through one-vs-reference submodels; for ordinal families, prediction proceeds through cumulative submodels followed by a monotonicity correction across thresholds. The returned mu component stores the predicted mean path for each boosted subproblem, and prob.class converts these to class probabilities for the original response scale.
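
For the ordinal case, the step from cumulative submodels to class probabilities can be illustrated as follows. This is a sketch under assumptions: the cumulative probabilities are made up, and the running-maximum correction shown here is one simple way to enforce monotonicity; the correction used by the package may differ.

```r
# Hypothetical cumulative probabilities P(Y <= q) for one subject at one
# time point, for thresholds q = 1, 2, 3 of a 4-level ordinal response.
# The second value violates monotonicity.
cum.raw <- c(0.30, 0.25, 0.80)

# Assumed correction: running maximum, then clip to [0, 1].
cum.fix <- pmin(pmax(cummax(cum.raw), 0), 1)

# Class probabilities are successive differences of the cumulative curve;
# the top level takes whatever mass remains.
prob.class <- diff(c(0, cum.fix, 1))
names(prob.class) <- paste0("level", 1:4)
```

After the correction the cumulative curve is non-decreasing, so every class probability is non-negative and the probabilities sum to one.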

When y is supplied for the prediction data, the method computes standardized RMSE along the prediction path. If M = NULL, the method selects m.opt by minimizing the test-set RMSE, using the same tolerance rule as in model fitting.
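
The selection of m.opt from an error path can be sketched with simulated predictions. Two assumptions are made here and should be checked against the package source: "standardized" is taken to mean RMSE divided by the response standard deviation, and the tolerance rule is taken to mean the earliest iteration whose error lies within eps of the minimum.

```r
set.seed(1)
n <- 40
M <- 25
y <- rnorm(n, mean = 10, sd = 2)
y.sd <- sd(y)

# Mock prediction path: the fit improves with m, then drifts (overfits).
shrink <- 1 - 0.7^(1:M)
drift  <- 0.02 * pmax(0, (1:M) - 12)
noise  <- rnorm(n)
pred.path <- sapply(1:M, function(m) {
  mean(y) + shrink[m] * (y - mean(y)) + drift[m] * noise * y.sd
})

# Standardized RMSE at each iteration (one column per iteration).
rmse.path <- sqrt(colMeans((pred.path - y)^2)) / y.sd

# Assumed tolerance rule: earliest iteration within eps of the minimum.
eps <- 1e-5
m.opt <- min(which(rmse.path - min(rmse.path) < eps))
```

A larger eps favors earlier stopping, because more iterations count as tied with the minimum.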

Value

An object of class c("boostmtree", "predict", ...) with components:

base.learner

Stored tree learners from the fitted object.

boost.obj

The fitted training object with large internal fitting components removed.

df.time.design

Number of columns in the time-design matrices.

err.rate

Standardized test-set error summaries. For single-response fits this is typically a matrix with columns "l1" and "l2"; for multi-subproblem fits it is stored by subproblem. NULL when prediction responses y are not supplied.

family

The fitted response family.

gamma

Stored terminal-node coefficient summaries used for prediction.

id

Long-format subject identifier corresponding to the supplied prediction data.

id.unique

Unique subject identifiers in subject order.

k

Number of terminal nodes requested during fitting.

m.opt

Selected stopping iteration for each boosted subproblem.

membership

Predicted terminal-node memberships for each subproblem and boosting iteration.

mu

Predicted mean trajectories at the time points requested. If x, tm, and id are supplied, then mu is evaluated at the supplied subject-specific times. If only x is supplied, then each new subject is predicted on the fitted training time grid, so mu is already a full profile on that grid. If partial = TRUE, then mu is evaluated on the user-supplied common grid tm. For continuous and binary families this is a subject-level list of predicted trajectories. For nominal and ordinal families this is indexed first by boosted subproblem and then by subject.

muhat

Predicted full profiles reconstructed on time.grid, where time.grid is the fitted training time grid used by the model.

n

Number of subjects in the prediction data.

n.q

Number of boosted subproblems.

ni

Number of observations or requested time points per subject.

nu

Boosting step size used by the fitted model.

nu.vec

Expanded step-size vector on the time-basis scale.

partial

Logical; whether prediction was requested with a common user-supplied time grid.

prob.class

Predicted class probabilities on the original response scale for non-continuous families; NULL for the continuous family.

prob.hat.class

Class probabilities over time for non-continuous families; NULL for the continuous family or when use.cv.flag = TRUE.

q.set

Threshold levels (ordinal) or non-reference levels (binary or nominal) defining the boosted subproblems.

q.total

Total number of response levels for non-continuous families.

rmse

Standardized test-set RMSE evaluated at m.opt when prediction responses y are supplied; otherwise NULL.

time

A list of observed or requested times for each subject.

time.design

Subject-specific time-design matrices used for prediction.

time.grid

The common grid used for prediction; typically the training time grid.

time.unique

Sorted unique times appearing in time.

use.cv.flag

Logical; whether out-of-bag coefficient estimates were used.

x

Prediction covariates with one row per subject.

x.var.names

Covariate names expected by the fitted model.

y

Observed prediction-set responses split by subject when supplied; otherwise NULL.

y.levels

Observed response levels from the training fit for non-continuous families. NA for the continuous family.

y.mean

Overall response mean used for standardization.

y.org

Prediction-set responses encoded at the boosted-subproblem level when y is supplied; otherwise NULL. For continuous and binary families this is a subject-level list. For nominal and ordinal families it is indexed first by boosted subproblem and then by subject.

y.reference

Reference response level used by the nominal family; NULL otherwise.

y.sd

Overall response standard deviation used for standardization.
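
For plotting or tabulating, the subject-level lists in mu and time can be stacked into one long data frame. The sketch below uses mock lists in place of a real prediction object and assumes, for the continuous family, that each element of mu is a numeric vector aligned with the matching element of time; inspect the returned object with str() before relying on that layout.

```r
# Mock stand-ins for pred.obj$mu and pred.obj$time (assumed layout).
mu.list   <- list(s1 = c(1.0, 1.4, 1.9),
                  s2 = c(0.8, 1.1),
                  s3 = c(2.0, 2.2, 2.1, 2.5))
time.list <- list(s1 = c(0, 1, 2),
                  s2 = c(0, 1.5),
                  s3 = c(0, 1, 2, 3))

# Each subject must have as many predictions as time points.
stopifnot(identical(lengths(mu.list), lengths(time.list)))

# Stack into long format: one row per subject-time pair.
pred.long <- data.frame(
  id = rep(names(mu.list), times = lengths(mu.list)),
  tm = unlist(time.list, use.names = FALSE),
  mu = unlist(mu.list, use.names = FALSE)
)
```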

Author(s)

Amol Pande, Udaya B. Kogalur and Hemant Ishwaran

References

Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189–1232.

Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data. Machine Learning, 106(2):277–305.

Pande A., Ishwaran H., Blackstone E.H., Rajeswaran J., Gillinov M. (2022). Application of gradient boosting in evaluating surgical ablation for atrial fibrillation. SN Computer Science, 3:466.

Pande A., Ishwaran H., Blackstone E.H. (2022). Boosting for multivariate longitudinal responses. SN Computer Science, 3:186.

See Also

boostmtree, partial.plot.boostmtree, plot.boostmtree, print.boostmtree, simLong, spirometry, vimp.boostmtree

Examples

## -------------------------------------------------------------
## Continuous longitudinal prediction on a held-out test set.
## -------------------------------------------------------------
set.seed(31)
sim.obj <- simLong(n = 20, n.test = 10, n.time = 4, model = 1,
                   family = "continuous")
dta <- sim.obj$data.list
trn <- sim.obj$train.index

fit <- boostmtree(
  x = dta$features[trn, , drop = FALSE],
  tm = dta$time[trn],
  id = dta$id[trn],
  y = dta$y[trn],
  family = "continuous",
  M = 10,
  verbose = FALSE
)

pred.obj <- predict(
  fit,
  x = dta$features[-trn, , drop = FALSE],
  tm = dta$time[-trn],
  id = dta$id[-trn],
  y = dta$y[-trn]
)

print(pred.obj)

## -------------------------------------------------------------
## Predict full profiles for new subjects on the training time grid.
## -------------------------------------------------------------
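## Use the first row of three distinct training subjects, so that each
## row of new.subjects really represents a different subject.
trn.x <- dta$features[trn, , drop = FALSE]
new.subjects <- trn.x[!duplicated(dta$id[trn]), , drop = FALSE][1:3, , drop = FALSE]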
pred.obj <- predict(fit, x = new.subjects)
str(pred.obj$mu[[1]], max.level = 1)

## -------------------------------------------------------------
## Predict on a user-supplied common time grid.
## -------------------------------------------------------------
grid.time <- seq(min(dta$time[trn]), max(dta$time[trn]), length.out = 25)
pred.grid <- predict(
  fit,
  x = new.subjects,
  tm = grid.time,
  partial = TRUE
)

str(pred.grid$mu[[1]], max.level = 1)


## -------------------------------------------------------------
## Binary longitudinal prediction.
## -------------------------------------------------------------
set.seed(44)
sim.bin <- simLong(n = 25, n.test = 10, n.time = 4, model = 2,
                   family = "binary")
dta.bin <- sim.bin$data.list
trn.bin <- sim.bin$train.index

fit.bin <- boostmtree(
  x = dta.bin$features[trn.bin, , drop = FALSE],
  tm = dta.bin$time[trn.bin],
  id = dta.bin$id[trn.bin],
  y = dta.bin$y[trn.bin],
  family = "binary",
  M = 10,
  verbose = FALSE
)

pred.bin <- predict(
  fit.bin,
  x = dta.bin$features[-trn.bin, , drop = FALSE],
  tm = dta.bin$time[-trn.bin],
  id = dta.bin$id[-trn.bin],
  y = dta.bin$y[-trn.bin]
)

print(pred.bin)


boostmtree documentation built on April 10, 2026, 9:10 a.m.