predict.boostmtree: Predict longitudinal trajectories from a fitted boostmtree...
In boostmtree: Boosted Multivariate Trees for Longitudinal Data

predict.boostmtree

R Documentation

Predict longitudinal trajectories from a fitted boostmtree model

Description

Generate fitted or predicted trajectories from a boostmtree fit. The method can return predictions for the original training subjects, for a new longitudinal test set, or for new subjects evaluated on a common time grid.

Usage

## S3 method for class 'boostmtree'
predict(
  object,
  x,
  tm,
  id,
  y,
  M = NULL,
  eps = 1e-5,
  use.cv.flag = FALSE,
  partial = FALSE,
  ...
)

Arguments

`object`	A fitted object returned by `boostmtree`.
`x`	New covariate values. Supply either one row per subject or a long-format data frame with one row per observation. If omitted, the training subjects stored in `object` are used.
`tm`	Observation times for the prediction data. When `x`, `tm`, and `id` are supplied together, predictions are made at the observed times in the new data. When `x` is supplied without `tm` and `id`, each row of `x` is treated as a new subject and predictions are returned on the training time grid.
`id`	Subject identifier for long-format prediction data.
`y`	Observed responses for the prediction data. When supplied with long-format prediction data, the method computes standardized test-set error summaries and, if `M = NULL`, chooses the stopping iteration by minimizing the test-set error path.
`M`	Optional fixed boosting iteration used for prediction. If `NULL` and `y` is supplied, the stopping iteration is selected by minimizing the test-set error path. If `NULL` and `y` is not supplied, the stored `m.opt` from the fitted object is used when available; otherwise the full fitted path is used.
`eps`	Tolerance used when selecting `m.opt` from the test-set error path.
`use.cv.flag`	Logical; should the prediction use the stored out-of-bag coefficient estimates from a model fit with `cv.flag = TRUE`? This option is intended for predictions on the original training subjects and is ignored for new prediction data.
`partial`	Logical; if `TRUE`, interpret `x` as one row per subject and `tm` as a common time grid at which each subject should be evaluated.
`...`	Currently ignored. Included for S3 compatibility.

Details

For longitudinal prediction on a new data set, supply x, tm, and id in long format, with the same subject-level covariates used in the training fit. The function collapses the covariates to one row per subject and then uses the stored terminal-node coefficient path to reconstruct the predicted trajectory.
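The collapse from long format to one row per subject can be pictured with a small base-R sketch. This is an illustration only, not the package's internal code: the data frame `long`, its columns `age` and `trt`, and the helper names `x.subject` and `time.by.subject` are hypothetical, and the sketch assumes covariates are constant within subject.

```r
# Hypothetical long-format prediction data: 3 subjects, unequal visit counts.
long <- data.frame(
  id  = c(1, 1, 1, 2, 2, 3),
  tm  = c(0, 1, 2, 0, 1.5, 0.5),
  age = c(61, 61, 61, 47, 47, 70),
  trt = c(1, 1, 1, 0, 0, 1)
)

# Keep the first row seen for each subject; covariates are assumed
# constant within subject, so any row of that subject would do.
x.subject <- long[!duplicated(long$id), c("age", "trt"), drop = FALSE]
rownames(x.subject) <- unique(long$id)

# Split the observation times by subject (one vector of times per subject).
time.by.subject <- split(long$tm, long$id)
```

Here `x.subject` has one row per subject, and `time.by.subject` keeps each subject's own observation times, which is the shape the description above implies for subject-level prediction.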

There are three common use cases.

First, if x is omitted, the method returns fitted trajectories for the training subjects. This is useful for inspecting the stored fit or for producing training-set plots.

Second, if x is supplied without tm and id, each row of x is treated as a new subject and predictions are returned on the training time grid stored in object$time.unique. This is convenient when the user wants an entire fitted profile for new subjects.

Third, if partial = TRUE, x must contain one row per subject and tm supplies a common prediction grid. This returns fitted profiles on that user-specified grid. The method reuses the time basis fitted on the training data, so predictions are most reliable when the supplied grid stays within, or close to, the observed training-time range; grid points far outside that range are extrapolations.
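Before using the third mode, it can help to check a candidate grid against the training times. The following is a minimal base-R sketch under stated assumptions: `train.time` is a hypothetical stand-in for the training times (for a real fit, object$time.unique would play this role), and dropping out-of-range points is one possible choice, not something the method does for you.

```r
# Hypothetical training times (stand-in for the times used in the fit).
train.time <- c(0, 0.5, 1, 2, 3, 4.5, 6)

# Candidate common grid; the last points run past the training range.
grid.time <- seq(0, 8, by = 1)

# Flag grid points outside the observed training-time range.
rng <- range(train.time)
outside <- grid.time < rng[1] | grid.time > rng[2]

# Option shown here: drop the extrapolated points before predicting.
grid.safe <- grid.time[!outside]
```

The vector `grid.safe` could then be passed as tm together with partial = TRUE. Whether to drop, keep, or merely warn about out-of-range points is a judgment call that depends on how far beyond the data the grid extends.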

For binary and nominal families, prediction proceeds through one-vs-reference submodels; for ordinal families, prediction proceeds through cumulative submodels followed by a monotonicity correction across thresholds. The returned mu component stores the predicted mean path for each boosted subproblem, and prob.class holds the corresponding class probabilities on the original response scale.
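To make the ordinal case concrete, here is a small base-R sketch of how cumulative predictions can be turned into class probabilities. It is an illustration under assumptions, not the package's implementation: the documentation does not state which monotonicity correction is applied, so a running maximum is assumed here, and the names `cum.raw`, `cum.mono`, and `p.level` are hypothetical.

```r
# Hypothetical cumulative predictions P(Y <= q) for one subject at one time,
# one value per threshold subproblem (4 ordered levels -> 3 thresholds).
cum.raw <- c(0.30, 0.25, 0.80)   # second value violates monotonicity

# Assumed correction: running maximum, so P(Y <= q) never decreases in q.
cum.mono <- cummax(cum.raw)

# Difference the padded cumulative curve to get per-level probabilities.
p.level <- diff(c(0, cum.mono, 1))
names(p.level) <- paste0("level", 1:4)
```

After the correction the per-level probabilities are non-negative and sum to one, which is the property a correction of this kind is meant to guarantee. Without it, differencing `cum.raw` directly would give a negative probability for the second level.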

When y is supplied for the prediction data, the method computes standardized RMSE at each iteration along the prediction path. If M = NULL, the method selects m.opt by minimizing the test-set RMSE, applying the tolerance eps with the same rule used in model fitting.
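The selection of a stopping iteration from an error path can be sketched in base R as follows. This is a hedged illustration, not the package's code: it assumes that "standardized" means the RMSE is divided by the response standard deviation, and that the tolerance rule picks the earliest iteration whose error is within eps of the minimum. The objects `shrink`, `mu.path`, and `rmse.path` are hypothetical.

```r
# Hypothetical test responses for five observations.
y <- c(2.0, 3.5, 1.0, 4.0, 2.5)

# Hypothetical predictions at M = 6 iterations: each column moves from the
# overall mean toward y by a factor `shrink`, then drifts back slightly.
shrink <- c(0, 0.4, 0.7, 0.9, 0.9, 0.85)
mu.path <- sapply(shrink, function(s) mean(y) + s * (y - mean(y)))  # 5 x 6

# Standardized RMSE per iteration (assumed scaling: divide by sd of y).
rmse.path <- apply(mu.path, 2, function(mu) sqrt(mean((y - mu)^2)) / sd(y))

# Assumed tolerance rule: earliest iteration within eps of the minimum.
eps <- 1e-5
m.opt <- min(which(rmse.path - min(rmse.path) < eps))
```

In this toy path iterations 4 and 5 tie for the minimum, and the rule returns 4, the earlier of the two. Preferring the earliest near-optimal iteration favors the smaller model when later iterations bring no measurable gain.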

Value

An object of class c("boostmtree", "predict", ...) with components:

base.learner: Stored tree learners from the fitted object.
boost.obj: The fitted training object with large internal fitting components removed.
df.time.design: Number of columns in the time-design matrices.
err.rate: Standardized test-set error summaries. For single-response fits this is typically a matrix with columns "l1" and "l2"; for multi-subproblem fits it is stored by subproblem. NULL when prediction responses y are not supplied.
family: The fitted response family.
gamma: Stored terminal-node coefficient summaries used for prediction.
id: Long-format subject identifier corresponding to the supplied prediction data.
id.unique: Unique subject identifiers in subject order.
k: Number of terminal nodes requested during fitting.
m.opt: Selected stopping iteration for each boosted subproblem.
membership: Predicted terminal-node memberships for each subproblem and boosting iteration.
mu: Predicted mean trajectories at the time points requested. If x, tm, and id are supplied, then mu is evaluated at the supplied subject-specific times. If only x is supplied, then each new subject is predicted on the fitted training time grid, so mu is already a full profile on that grid. If partial = TRUE, then mu is evaluated on the user-supplied common grid tm. For continuous and binary families this is a subject-level list of predicted trajectories. For nominal and ordinal families this is indexed first by boosted subproblem and then by subject.
muhat: Predicted full profiles reconstructed on time.grid, where time.grid is the fitted training time grid used by the model.
n: Number of subjects in the prediction data.
n.q: Number of boosted subproblems.
ni: Number of observations or requested time points per subject.
nu: Boosting step size used by the fitted model.
nu.vec: Expanded step-size vector on the time-basis scale.
partial: Logical; whether prediction was requested with a common user-supplied time grid.
prob.class: Predicted class probabilities on the original response scale for non-continuous families; NULL for the continuous family.
prob.hat.class: Class probabilities over time for non-continuous families; NULL for the continuous family or when use.cv.flag = TRUE.
q.set: Threshold levels (ordinal) or non-reference levels (binary or nominal) defining the boosted subproblems.
q.total: Total number of response levels for non-continuous families.
rmse: Standardized test-set RMSE evaluated at m.opt when prediction responses y are supplied; otherwise NULL.
time: A list of observed or requested times for each subject.
time.design: Subject-specific time-design matrices used for prediction.
time.grid: The common grid used for prediction; typically the training time grid.
time.unique: Sorted unique times appearing in time.
use.cv.flag: Logical; whether out-of-bag coefficient estimates were used.
x: Prediction covariates with one row per subject.
x.var.names: Covariate names expected by the fitted model.
y: Observed prediction-set responses split by subject when supplied; otherwise NULL.
y.levels: Observed response levels from the training fit for non-continuous families. NA for the continuous family.
y.mean: Overall response mean used for standardization.
y.org: Prediction-set responses encoded at the boosted-subproblem level when y is supplied; otherwise NULL. For continuous and binary families this is a subject-level list. For nominal and ordinal families it is indexed first by boosted subproblem and then by subject.
y.reference: Reference response level used by the nominal family; NULL otherwise.
y.sd: Overall response standard deviation used for standardization.

Author(s)

Amol Pande, Udaya B. Kogalur and Hemant Ishwaran

References

Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189–1232.

Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data. Machine Learning, 106(2):277–305.

Pande A., Ishwaran H., Blackstone E.H., Rajeswaran J., Gillinov M. (2022). Application of gradient boosting in evaluating surgical ablation for atrial fibrillation. SN Computer Science, 3:466.

Pande A., Ishwaran H., Blackstone E.H. (2022). Boosting for multivariate longitudinal responses. SN Computer Science, 3:186.

Examples

## -------------------------------------------------------------
## Continuous longitudinal prediction on a held-out test set.
## -------------------------------------------------------------
set.seed(31)
sim.obj <- simLong(n = 20, n.test = 10, n.time = 4, model = 1,
                   family = "continuous")
dta <- sim.obj$data.list
trn <- sim.obj$train.index

fit <- boostmtree(
  x = dta$features[trn, , drop = FALSE],
  tm = dta$time[trn],
  id = dta$id[trn],
  y = dta$y[trn],
  family = "continuous",
  M = 10,
  verbose = FALSE
)

pred.obj <- predict(
  fit,
  x = dta$features[-trn, , drop = FALSE],
  tm = dta$time[-trn],
  id = dta$id[-trn],
  y = dta$y[-trn]
)

print(pred.obj)

## -------------------------------------------------------------
## Predict full profiles for new subjects on the training time grid.
## -------------------------------------------------------------
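## Reuse three training rows as stand-in new subjects; without tm and id,
## each row of x is treated as a separate subject.
new.subjects <- dta$features[trn, , drop = FALSE][1:3, ]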
pred.obj <- predict(fit, x = new.subjects)
str(pred.obj$mu[[1]], max.level = 1)

## -------------------------------------------------------------
## Predict on a user-supplied common time grid.
## -------------------------------------------------------------
grid.time <- seq(min(dta$time[trn]), max(dta$time[trn]), length.out = 25)
pred.grid <- predict(
  fit,
  x = new.subjects,
  tm = grid.time,
  partial = TRUE
)

str(pred.grid$mu[[1]], max.level = 1)


## -------------------------------------------------------------
## Binary longitudinal prediction.
## -------------------------------------------------------------
set.seed(44)
sim.bin <- simLong(n = 25, n.test = 10, n.time = 4, model = 2,
                   family = "binary")
dta.bin <- sim.bin$data.list
trn.bin <- sim.bin$train.index

fit.bin <- boostmtree(
  x = dta.bin$features[trn.bin, , drop = FALSE],
  tm = dta.bin$time[trn.bin],
  id = dta.bin$id[trn.bin],
  y = dta.bin$y[trn.bin],
  family = "binary",
  M = 10,
  verbose = FALSE
)

pred.bin <- predict(
  fit.bin,
  x = dta.bin$features[-trn.bin, , drop = FALSE],
  tm = dta.bin$time[-trn.bin],
  id = dta.bin$id[-trn.bin],
  y = dta.bin$y[-trn.bin]
)

print(pred.bin)

boostmtree documentation built on April 10, 2026, 9:10 a.m.
