View source: R/predict.boostmtree.R
| predict.boostmtree | R Documentation |
Generate fitted or predicted trajectories from a boostmtree fit.
The method can return predictions for the original training subjects,
for a new longitudinal test set, or for new subjects evaluated on a
common time grid.
## S3 method for class 'boostmtree'
predict(
  object,
  x,
  tm,
  id,
  y,
  M = NULL,
  eps = 1e-5,
  use.cv.flag = FALSE,
  partial = FALSE,
  ...
)
object |
A fitted object returned by |
x |
New covariate values. Supply either one row per subject or a
long-format data frame with one row per observation. If omitted, the
training subjects stored in |
tm |
Observation times for the prediction data. When |
id |
Subject identifier for long-format prediction data. |
y |
Observed responses for the prediction data. When supplied with
long-format prediction data, the method computes standardized test-set
error summaries and, if |
M |
Optional fixed boosting iteration used for prediction. If
|
eps |
Tolerance used when selecting |
use.cv.flag |
Logical; should the prediction use the stored
out-of-bag coefficient estimates from a model fit with
|
partial |
Logical; if |
... |
Currently ignored. Included for S3 compatibility. |
For longitudinal prediction on a new data set, supply x, tm,
and id in long format, with the same subject-level covariates used in
the training fit. The function collapses the covariates to one row per subject
and then uses the stored terminal-node coefficient path to reconstruct the
predicted trajectory.
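The collapse from long format to one row per subject can be pictured with a small base R sketch. The data below are made up, and the approach (keep the first row seen for each subject) is an assumption for illustration only; it is not taken from the package source and presumes covariates are constant within subject.

```r
# Hypothetical long-format prediction data: 3 subjects, 2 to 3 visits each.
id <- c(1, 1, 1, 2, 2, 3, 3, 3)
tm <- c(0, 1, 2, 0, 2, 0, 1, 3)
x  <- data.frame(
  age = c(50, 50, 50, 63, 63, 41, 41, 41),
  trt = c(1, 1, 1, 0, 0, 1, 1, 1)
)

# Keep the first row seen for each subject. This assumes the covariates
# are subject-level, i.e. constant across a subject's rows.
first.row <- !duplicated(id)
x.subject <- x[first.row, , drop = FALSE]
rownames(x.subject) <- id[first.row]

# Split the observation times by subject, preserving subject order.
tm.subject <- split(tm, factor(id, levels = unique(id)))

print(x.subject)
str(tm.subject)
```

Here x.subject plays the role of the collapsed covariate matrix and tm.subject holds the subject-specific times at which a trajectory would be evaluated.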
There are three common use cases.
First, if x is omitted, the method returns fitted trajectories for the
training subjects. This is useful for inspecting the stored fit or for
producing training-set plots.
Second, if x is supplied without tm and id, each row of
x is treated as a new subject and predictions are returned on the
training time grid stored in object$time.unique. This is convenient
when the user wants an entire fitted profile for new subjects.
Third, if partial = TRUE, x must contain one row per subject and
tm supplies a common prediction grid. This returns fitted profiles on
that user-specified grid. Because the method reuses the time basis fitted on
the training data, predictions are reliable only when the supplied grid stays
within or close to the observed training-time range.
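Before calling the method with partial = TRUE, it may help to check a candidate grid against the training-time range. The following is a minimal base R sketch; train.time is a hypothetical stand-in for the training times (for a real fit, the text above refers to object$time.unique), and trimming the grid is only one possible response to out-of-range points.

```r
# Hypothetical training times and a candidate prediction grid.
train.time <- c(0, 0.5, 1, 2, 4)
grid.time  <- seq(-1, 5, length.out = 13)

# Flag grid points that fall inside the observed training-time range.
rng    <- range(train.time)
inside <- grid.time >= rng[1] & grid.time <= rng[2]

if (!all(inside)) {
  message(sum(!inside), " grid point(s) lie outside the training-time range")
}

# One conservative choice: keep only the points inside the range.
grid.safe <- grid.time[inside]
print(grid.safe)
```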
For binary and nominal families, prediction proceeds through one-vs-reference
submodels; for ordinal families, prediction proceeds through cumulative
submodels followed by a monotonicity correction across thresholds. The returned
mu component stores the predicted mean path for each boosted subproblem,
and prob.class converts these to class probabilities for the original
response scale.
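For the ordinal case, the step from cumulative submodels to class probabilities can be sketched in base R. The numbers are made up, and the running-maximum correction shown here is just one simple way to enforce monotonicity across thresholds; the correction used inside the package may differ.

```r
# Hypothetical raw cumulative probabilities P(Y <= q) from 3 threshold
# submodels (4 response levels) at 2 time points.
# Rows are time points, columns are thresholds.
cum.raw <- rbind(
  c(0.20, 0.15, 0.70),  # threshold 2 dips below threshold 1
  c(0.10, 0.40, 0.90)
)

# Enforce non-decreasing cumulative probabilities across thresholds
# with a running maximum (an assumed, illustrative correction).
cum.fix <- t(apply(cum.raw, 1, cummax))

# Difference the corrected cumulative probabilities, padded with 0 and 1,
# to obtain one probability per response level.
class.prob <- t(apply(cum.fix, 1, function(p) diff(c(0, p, 1))))

print(cum.fix)
print(class.prob)
```

After the correction every row of class.prob is non-negative and sums to one, which is what makes the result usable as a probability distribution over the response levels.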
When y is supplied for the prediction data, the method computes
standardized RMSE along the prediction path. If M = NULL, the
method selects m.opt by minimizing the test-set RMSE, using the
same tolerance rule as in model fitting.
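The idea of picking an iteration from a standardized RMSE path can be illustrated with a toy base R example. Everything here is assumed for illustration: the data are invented, standardization is taken to mean division by the standard deviation of the responses, and the tolerance rule is taken to be "the earliest iteration whose RMSE is within eps of the minimum". The package's exact rule may differ.

```r
# Hypothetical responses and predictions from 4 boosting iterations
# (one column per iteration).
y <- c(1.0, 2.0, 3.0, 4.0)
mu.path <- cbind(
  c(2.5, 2.5, 2.5, 2.5),
  c(1.8, 2.2, 2.8, 3.2),
  c(1.2, 2.1, 2.9, 3.8),
  c(1.2, 2.1, 2.9, 3.8) + 1e-7  # practically identical to iteration 3
)

# Standardized RMSE per iteration: RMSE divided by the response SD
# (an assumed form of standardization).
y.sd      <- sd(y)
rmse.path <- apply(mu.path, 2, function(mu) sqrt(mean((y - mu)^2)) / y.sd)

# Assumed tolerance rule: earliest iteration within eps of the minimum.
eps   <- 1e-5
m.opt <- min(which(rmse.path - min(rmse.path) < eps))

print(round(rmse.path, 4))
print(m.opt)
```

With a tolerance of this kind, iterations that improve the error by a negligible amount do not push the selected iteration later, so the earlier of two near-equivalent iterations is preferred.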
An object of class c("boostmtree", "predict", ...) with components:
Stored tree learners from the fitted object.
The fitted training object with large internal fitting components removed.
Number of columns in the time-design matrices.
Standardized test-set error summaries. For single-response
fits this is typically a matrix with columns "l1" and "l2";
for multi-subproblem fits it is stored by subproblem. NULL when
prediction responses y are not supplied.
The fitted response family.
Stored terminal-node coefficient summaries used for prediction.
Long-format subject identifier corresponding to the supplied prediction data.
Unique subject identifiers in subject order.
Number of terminal nodes requested during fitting.
Selected stopping iteration for each boosted subproblem.
Predicted terminal-node memberships for each subproblem and boosting iteration.
Predicted mean trajectories at the time points requested.
If x, tm, and id are supplied, then mu
is evaluated at the supplied subject-specific times. If only
x is supplied, then each new subject is predicted on the
fitted training time grid, so mu is already a full profile
on that grid. If partial = TRUE, then mu is evaluated
on the user-supplied common grid tm. For continuous and
binary families this is a subject-level list of predicted
trajectories. For nominal and ordinal families this is indexed
first by boosted subproblem and then by subject.
Predicted full profiles reconstructed on
time.grid, where time.grid is the fitted training time
grid used by the model.
Number of subjects in the prediction data.
Number of boosted subproblems.
Number of observations or requested time points per subject.
Boosting step size used by the fitted model.
Expanded step-size vector on the time-basis scale.
Logical; whether prediction was requested with a common user-supplied time grid.
Predicted class probabilities on the original response
scale for non-continuous families; NULL for the continuous family.
Class probabilities over time for non-continuous
families; NULL for the continuous family or when
use.cv.flag = TRUE.
Threshold levels (ordinal) or non-reference levels (binary or nominal) defining the boosted subproblems.
Total number of response levels for non-continuous families.
Standardized test-set RMSE evaluated at m.opt when
prediction responses y are supplied; otherwise NULL.
A list of observed or requested times for each subject.
Subject-specific time-design matrices used for prediction.
The common grid used for prediction; typically the training time grid.
Sorted unique times appearing in time.
Logical; whether out-of-bag coefficient estimates were used.
Prediction covariates with one row per subject.
Covariate names expected by the fitted model.
Observed prediction-set responses split by subject when
supplied; otherwise NULL.
Observed response levels from the training fit for
non-continuous families. NA for the continuous family.
Overall response mean used for standardization.
Prediction-set responses encoded at the boosted-subproblem
level when y is supplied; otherwise NULL. For continuous and
binary families this is a subject-level list. For nominal and ordinal
families it is indexed first by boosted subproblem and then by subject.
Reference response level used by the nominal family;
NULL otherwise.
Overall response standard deviation used for standardization.
Amol Pande, Udaya B. Kogalur and Hemant Ishwaran
Friedman J.H. (2001). Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189–1232.
Pande A., Li L., Rajeswaran J., Ehrlinger J., Kogalur U.B., Blackstone E.H., Ishwaran H. (2017). Boosted multivariate trees for longitudinal data. Machine Learning, 106(2):277–305.
Pande A., Ishwaran H., Blackstone E.H., Rajeswaran J., Gillinov M. (2022). Application of gradient boosting in evaluating surgical ablation for atrial fibrillation. SN Computer Science, 3:466.
Pande A., Ishwaran H., Blackstone E.H. (2022). Boosting for multivariate longitudinal responses. SN Computer Science, 3:186.
boostmtree,
partial.plot.boostmtree,
plot.boostmtree,
print.boostmtree,
simLong,
spirometry,
vimp.boostmtree
## -------------------------------------------------------------
## Continuous longitudinal prediction on a held-out test set.
## -------------------------------------------------------------
set.seed(31)
sim.obj <- simLong(n = 20, n.test = 10, n.time = 4, model = 1,
                   family = "continuous")
dta <- sim.obj$data.list
trn <- sim.obj$train.index
fit <- boostmtree(
  x = dta$features[trn, , drop = FALSE],
  tm = dta$time[trn],
  id = dta$id[trn],
  y = dta$y[trn],
  family = "continuous",
  M = 10,
  verbose = FALSE
)
pred.obj <- predict(
  fit,
  x = dta$features[-trn, , drop = FALSE],
  tm = dta$time[-trn],
  id = dta$id[-trn],
  y = dta$y[-trn]
)
print(pred.obj)
## -------------------------------------------------------------
## Predict full profiles for new subjects on the training time grid.
## -------------------------------------------------------------
new.subjects <- dta$features[trn, , drop = FALSE][1:3, ]
pred.obj <- predict(fit, x = new.subjects)
str(pred.obj$mu[[1]], max.level = 1)
## -------------------------------------------------------------
## Predict on a user-supplied common time grid.
## -------------------------------------------------------------
grid.time <- seq(min(dta$time[trn]), max(dta$time[trn]), length.out = 25)
pred.grid <- predict(
  fit,
  x = new.subjects,
  tm = grid.time,
  partial = TRUE
)
str(pred.grid$mu[[1]], max.level = 1)
## -------------------------------------------------------------
## Binary longitudinal prediction.
## -------------------------------------------------------------
set.seed(44)
sim.bin <- simLong(n = 25, n.test = 10, n.time = 4, model = 2,
                   family = "binary")
dta.bin <- sim.bin$data.list
trn.bin <- sim.bin$train.index
fit.bin <- boostmtree(
  x = dta.bin$features[trn.bin, , drop = FALSE],
  tm = dta.bin$time[trn.bin],
  id = dta.bin$id[trn.bin],
  y = dta.bin$y[trn.bin],
  family = "binary",
  M = 10,
  verbose = FALSE
)
pred.bin <- predict(
  fit.bin,
  x = dta.bin$features[-trn.bin, , drop = FALSE],
  tm = dta.bin$time[-trn.bin],
  id = dta.bin$id[-trn.bin],
  y = dta.bin$y[-trn.bin]
)
print(pred.bin)