View source: R/gg_partial_rfsrc.R
| gg_partial_rfsrc | R Documentation |
A partial dependence curve marginalizes the forest's prediction over all other predictors: for each evaluation point of the target variable, the forest scores every training observation with that value substituted in, then averages the result. What you get is the average effect of the target variable after "integrating out" the rest – a curve that would be flat if the variable carried no signal.
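The substitute-and-average recipe described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: `partial_dependence()` is a hypothetical helper, not part of this package, and a linear model stands in for the forest so the example runs without randomForestSRC.

```r
# Hypothetical helper: brute-force partial dependence for any model
# that has a predict() method accepting `newdata`.
partial_dependence <- function(model, data, xvar, grid) {
  vapply(grid, function(value) {
    data_sub <- data
    data_sub[[xvar]] <- value                 # substitute the evaluation point in every row
    mean(predict(model, newdata = data_sub))  # average over the other predictors
  }, numeric(1))
}

# Stand-in model (assumption): a linear fit on the complete airquality rows
aq <- na.omit(airquality)
fit <- lm(Ozone ~ Wind + Temp, data = aq)

wind_grid <- seq(min(aq$Wind), max(aq$Wind), length.out = 5)
pd <- partial_dependence(fit, aq, "Wind", wind_grid)
data.frame(Wind = wind_grid, yhat = pd)
```

Because the stand-in model is linear, the resulting curve is a straight line whose slope equals the fitted `Wind` coefficient; a forest would generally give a non-linear curve.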
gg_partial_rfsrc(
  rf_model,
  xvar.names = NULL,
  xvar2.name = NULL,
  newx = NULL,
  partial.time = NULL,
  partial.type = c("surv", "chf", "mort"),
  cat_limit = 10,
  n_eval = 25
)
rf_model |
A fitted |
xvar.names |
Character vector of predictor names for which partial
dependence should be computed. Must be a subset of |
xvar2.name |
Optional single character name of a grouping variable in
|
newx |
Optional |
partial.time |
Numeric vector of desired time points for survival
forests (ignored for regression/classification). Values are automatically
snapped to the nearest entry in |
partial.type |
Character; type of predicted value for survival
forests, passed through to |
cat_limit |
Variables with fewer than |
n_eval |
Number of evaluation points for continuous variables. Instead of passing all observed values (which can be slow, especially for survival forests), continuous predictors are evaluated on a quantile grid of this many points. Categorical variables always use all unique levels. Defaults to 25. |
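To make the `n_eval` quantile grid concrete, here is a rough base R sketch. `quantile_grid()` is a hypothetical helper written for illustration; the package's internal grid construction may differ in detail (for example in the quantile type or in how ties are handled).

```r
# Hypothetical helper, not the package's internal code: evaluate a
# continuous predictor at n_eval evenly spaced quantiles instead of
# at every observed value.
quantile_grid <- function(x, n_eval = 25) {
  probs <- seq(0, 1, length.out = n_eval)
  # unique() drops repeated quantiles when x has many tied values
  unique(unname(quantile(x, probs = probs, na.rm = TRUE)))
}

wind <- airquality$Wind
grid <- quantile_grid(wind, n_eval = 25)

length(unique(wind))  # number of distinct observed values
length(grid)          # at most n_eval evaluation points
```

The grid always spans the observed range, since the 0 and 1 quantiles are the minimum and maximum, and it places more points where the data are dense.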
This function builds those curves for one or more predictors by calling
partial.rfsrc and then tidy-stacking the
results into separate data frames for continuous and categorical variables.
Unlike gg_partial (which wraps plot.variable), you
pass the fitted rfsrc object directly – no intermediate
plot.variable step.
For survival forests, the marginalized quantity depends on
partial.type: survival probability ("surv"), cumulative
hazard function ("chf"), or expected mortality ("mort").
You can request the curve at one or more time horizons via
partial.time; the resulting data have a time column so the
plot layers them as separate coloured lines.
A named list with two elements:
A data.frame with columns x (numeric),
yhat, name (variable name), and optionally grp
(the level of xvar2.name) and time (survival forests
only) for all continuous predictors.
A data.frame with the same columns but
x kept as character, for low-cardinality predictors.
partial.time
partial.rfsrc expects every value in
partial.time to be an exact member of the model's
time.interest vector, the unique observed event times stored in the
fitted object. Pass an arbitrary time, even a plausible one such as
c(1, 3) for a study measured in years, and you get a C-level
prediction error from inside partial.rfsrc.
gg_partial_rfsrc takes care of this: every element of
partial.time is silently snapped to its nearest
time.interest value before the call. To target a specific
follow-up horizon, find the closest grid point yourself and pass it
explicitly:
ti <- rf_model$time.interest
t1 <- ti[which.min(abs(ti - 1))]  # nearest to 1 year
pd <- gg_partial_rfsrc(rf_model, xvar.names = "x", partial.time = t1)
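The same nearest-value lookup extends to several horizons at once. The sketch below assumes a made-up `time_interest` vector in place of a fitted model's `time.interest`, and `snap_to_grid()` is a hypothetical helper, not the function's internal implementation.

```r
# Hypothetical helper: map each requested time to the nearest grid value.
# On an exact tie, which.min() keeps the first (earlier) grid value.
snap_to_grid <- function(times, grid) {
  vapply(times, function(t) grid[which.min(abs(grid - t))], numeric(1))
}

# Made-up event times standing in for rf_model$time.interest
time_interest <- c(0.2, 0.9, 1.4, 2.8, 3.3, 5.0)

snapped <- snap_to_grid(c(1, 3), time_interest)
snapped
```

Checking the snapped values before plotting shows how far each requested horizon moved, which is useful when event times are sparse.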
partial.rfsrc does not handle
logical predictor columns correctly in survival forests
(randomForestSRC <= 3.5.1). If your training data contains binary 0/1
columns, convert them to factor rather than logical
before fitting the model.
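A minimal sketch of that conversion, using a made-up training frame (the column names are assumptions chosen for illustration). Only predictor columns are converted; the event indicator stays numeric.

```r
# Made-up survival data: `treated` is logical, `smoker` is a binary 0/1 predictor
train <- data.frame(
  time    = c(5, 8, 12, 3),
  status  = c(1, 0, 1, 1),                # event indicator: leave numeric
  treated = c(TRUE, FALSE, TRUE, FALSE),
  smoker  = c(0, 1, 1, 0)
)

# Convert the binary predictors to factors before fitting the forest
binary_cols <- c("treated", "smoker")
train[binary_cols] <- lapply(train[binary_cols], factor)

str(train)
```

Listing the columns by name, rather than converting every two-valued column automatically, avoids turning the `status` indicator into a factor by accident.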
gg_partial, partial.rfsrc,
get.partial.plot.data
## ------------------------------------------------------------
##
## regression
##
## ------------------------------------------------------------
airq.obj <- randomForestSRC::rfsrc(Ozone ~ ., data = airquality)
## partial effect for wind
prt_dta <- gg_partial_rfsrc(airq.obj,
                            xvar.names = c("Wind"))