orsf_pd_oob | R Documentation |
Compute partial dependence for an ORSF model. Partial dependence (PD) shows the expected prediction from a model as a function of a single predictor or multiple predictors. The expectation is marginalized over the values of all other predictors, giving something like a multivariable adjusted estimate of the model's prediction. You can compute partial dependence three ways using a random forest:
using in-bag predictions for the training data
using out-of-bag predictions for the training data
using predictions for a new set of data
See examples for more details
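The marginalization described above can be sketched by hand in base R. This is only an illustration of the idea, not how aorsf computes it: `lm()` on `mtcars` stands in for an ORSF model, and the helper name `partial_dependence` is hypothetical.

```r
# Hand-rolled partial dependence for one predictor.
# Assumption: lm() is a stand-in model; any model with a predict() method works.
partial_dependence <- function(model, data, variable, values,
                               probs = c(0.025, 0.5, 0.975)) {
  rows <- lapply(values, function(v) {
    data_v <- data
    data_v[[variable]] <- v  # set the predictor to v in every row
    pred <- predict(model, newdata = data_v)
    q <- quantile(pred, probs = probs, names = FALSE)
    # summarize predictions over the observed values of all other predictors
    data.frame(value = v, mean = mean(pred),
               lwr = q[1], medn = q[2], upr = q[3])
  })
  do.call(rbind, rows)
}

fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
pd <- partial_dependence(fit_lm, mtcars, "wt", values = c(2, 3, 4))
pd
```

Each row of `pd` summarizes the predictions obtained when every observation is assigned the same value of `wt` while keeping its own value of `hp`.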
orsf_pd_oob(
object,
pred_spec,
pred_horizon = NULL,
pred_type = "risk",
expand_grid = TRUE,
prob_values = c(0.025, 0.5, 0.975),
prob_labels = c("lwr", "medn", "upr"),
boundary_checks = TRUE,
n_thread = 1,
...
)
orsf_pd_inb(
object,
pred_spec,
pred_horizon = NULL,
pred_type = "risk",
expand_grid = TRUE,
prob_values = c(0.025, 0.5, 0.975),
prob_labels = c("lwr", "medn", "upr"),
boundary_checks = TRUE,
n_thread = 1,
...
)
orsf_pd_new(
object,
pred_spec,
new_data,
pred_horizon = NULL,
pred_type = "risk",
na_action = "fail",
expand_grid = TRUE,
prob_values = c(0.025, 0.5, 0.975),
prob_labels = c("lwr", "medn", "upr"),
boundary_checks = TRUE,
n_thread = 1,
...
)
object |
(orsf_fit) a trained oblique random survival forest (see orsf). |
pred_spec |
(named list or data.frame).
|
pred_horizon |
(double) a value or vector indicating the time(s)
that predictions will be calibrated to. E.g., if you were predicting
risk of incident heart failure within the next 10 years, then
|
pred_type |
(character) the type of predictions to compute. Valid options are
|
expand_grid |
(logical) if |
prob_values |
(numeric) a vector of values between 0 and 1,
indicating what quantiles will be used to summarize the partial
dependence values at each set of inputs. |
prob_labels |
(character) a vector of labels with the same length
as |
boundary_checks |
(logical) if |
n_thread |
(integer) number of threads to use while computing predictions. Default is one thread. To use the maximum number of threads that your system provides for concurrent execution, set |
... |
Further arguments passed to or from other methods (not currently used). |
new_data |
a data.frame, tibble, or data.table to compute predictions in. |
na_action |
(character) what should happen when
|
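As a rough illustration of how several of these arguments combine, the sketch below requests out-of-bag partial dependence for two predictors at two prediction horizons and relabels the quantile summaries. The predictor values, horizons, and labels are arbitrary choices for illustration, and the exact layout of the result may differ between aorsf versions.

```r
library(aorsf)

set.seed(329730)

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id)

pd <- orsf_pd_oob(
  fit,
  # one entry per predictor; names must match columns in the training data
  pred_spec    = list(bili = c(1, 3, 5), sex = c("m", "f")),
  # two horizons (in days): roughly 2 and 5 years
  pred_horizon = c(365.25 * 2, 365.25 * 5),
  pred_type    = "risk",
  # summarize with quartiles instead of the default 2.5%, 50%, 97.5%
  prob_values  = c(0.25, 0.50, 0.75),
  prob_labels  = c("q25", "q50", "q75")
)

pd
```

With `expand_grid = TRUE` (the default), the result should have one row per combination of horizon, `bili` value, and `sex` level.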
Partial dependence has a number of known limitations and assumptions that users should be aware of (see Hooker, 2021). In particular, partial dependence is less intuitive when more than two predictors are examined jointly, and it assumes that the feature(s) for which partial dependence is computed are not correlated with other features, which is often untrue in practice. Accumulated local effect plots can be used when feature independence is not a valid assumption.
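The independence assumption can be made concrete with a small simulation in base R. This is a sketch under stated assumptions: `x1` and `x2` are simulated, nearly collinear predictors that are not part of any aorsf data set. It shows that fixing `x1` at a single value for every row creates predictor combinations that never occur in the observed data, so the model is asked to extrapolate.

```r
set.seed(1)

n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # x2 is almost a copy of x1

# Partial dependence at x1 = 2 pairs that value with every observed x2.
x1_fixed <- 2

gap_observed <- abs(x1 - x2)        # gaps between x1 and x2 in the real data
gap_pd_grid  <- abs(x1_fixed - x2)  # gaps in the rows the model must predict for

# largest gap that actually occurs in the data
max(gap_observed)

# share of partial dependence rows that fall outside the observed range of gaps
mean(gap_pd_grid > max(gap_observed))
```

A large share in the last line means most of the predictions averaged at `x1 = 2` come from regions of the predictor space where the model saw no training data.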
a data.table containing partial dependence values for the specified variable(s) at the specified prediction horizon(s).
Begin by fitting an ORSF ensemble:
library(aorsf)

set.seed(329730)

index_train <- sample(nrow(pbc_orsf), 150)

pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf_train,
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)
You can compute partial dependence and ICE three ways with aorsf:
using in-bag predictions for the training data
pd_train <- orsf_pd_inb(fit, pred_spec = list(bili = 1:5))
pd_train
##    pred_horizon bili     mean      lwr      medn      upr
## 1:      1826.25    1 101.9466 10.53944  51.65470 387.5041
## 2:      1826.25    2 118.2382 16.90238  65.95072 400.6156
## 3:      1826.25    3 138.9013 27.34446  91.45440 408.3768
## 4:      1826.25    4 163.9056 46.18300 121.82058 417.1405
## 5:      1826.25    5 181.6854 62.99029 140.65257 418.7087
using out-of-bag predictions for the training data
pd_train <- orsf_pd_oob(fit, pred_spec = list(bili = 1:5))
pd_train
##    pred_horizon bili     mean       lwr     medn      upr
## 1:      1826.25    1 37.66151  3.607834 20.75476 137.8659
## 2:      1826.25    2 43.56664  6.655814 25.94357 143.4564
## 3:      1826.25    3 51.04030 10.102343 33.73989 146.7145
## 4:      1826.25    4 60.28418 17.159042 43.95543 148.7279
## 5:      1826.25    5 66.74464 22.974170 53.00352 149.4068
using predictions for a new set of data
pd_test <- orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = list(bili = 1:5))
pd_test
##    pred_horizon bili     mean      lwr     medn      upr
## 1:      1826.25    1 121.9552 10.86471  88.9915 402.0936
## 2:      1826.25    2 137.8436 19.81224 107.7018 411.1170
## 3:      1826.25    3 159.1454 31.76190 134.2937 418.7824
## 4:      1826.25    4 184.4209 52.09751 162.6736 427.0102
## 5:      1826.25    5 202.1922 69.21315 179.9189 428.3857
In-bag partial dependence indicates relationships that the model has learned during training. This is helpful if your goal is to interpret the model.
Out-of-bag partial dependence also indicates relationships that the model has learned during training, but using the out-of-bag data simulates application of the model to new data. This is helpful if you want to test your model’s reliability or fairness in new data but you don’t have access to a large testing set.
New data partial dependence shows how the model predicts outcomes for observations it has not seen. This is helpful if you want to test your model’s reliability or fairness.
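To compare the three approaches side by side, one option is to stack their results into a single table. The sketch below refits the forest from the example above so that it runs on its own. The `source` column and the `stack_pd` helper are hypothetical names introduced here for illustration, not part of aorsf.

```r
library(aorsf)

set.seed(329730)

index_train <- sample(nrow(pbc_orsf), 150)

pbc_orsf_train <- pbc_orsf[index_train, ]
pbc_orsf_test <- pbc_orsf[-index_train, ]

fit <- orsf(data = pbc_orsf_train,
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 365.25 * 5)

spec <- list(bili = 1:5)

# hypothetical helper: label a partial dependence result with its data source
stack_pd <- function(pd, source) {
  cbind(source = source, as.data.frame(pd))
}

pd_all <- rbind(
  stack_pd(orsf_pd_inb(fit, pred_spec = spec), "in-bag"),
  stack_pd(orsf_pd_oob(fit, pred_spec = spec), "out-of-bag"),
  stack_pd(orsf_pd_new(fit, new_data = pbc_orsf_test, pred_spec = spec),
           "new data")
)

pd_all
```

Reading down the `mean` column within each value of `bili` shows how much the estimated relationship depends on which data were used to compute it.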
Giles Hooker, Lucas Mentch, Siyu Zhou. Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance. arXiv e-prints 2021 Oct; arXiv-1905. URL: https://doi.org/10.48550/arXiv.1905.03151