partials: Compute univariate partial dependence for each resample

View source: R/partials.R

partials R Documentation

Compute univariate partial dependence for each resample

Description

Compute the partial dependence functions (i.e. marginal effects) of the model fitted on each resample.

Usage

partials(object, expl, ...)

Arguments

object

an object output by xgb_fit(), which contains a model column.

expl

a vector of names of the explanatory variables for which to compute the partial dependence.

...

passed to pdp::partial(). Arguments of particular relevance are:

  • grid.resolution: an integer giving the number of equally spaced points, along each continuous variable, at which to compute the partial dependence.

  • quantiles=TRUE and probs (a vector of probabilities with values in [0,1]), to compute the partial dependence at those quantiles of the continuous explanatory variables.
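To illustrate the difference between the two options, here is a minimal base R sketch of the kind of target values each one yields for a continuous variable. It only mirrors the idea described above (equally spaced points versus quantiles); it is not the internal code of pdp::partial(), and the names x, grid_values and quantile_values are made up for this example.

```r
# Illustration only: target values for a continuous variable (here mtcars$hp)
x <- mtcars$hp

# equally spaced points over the range of x, in the spirit of grid.resolution=5
grid_values <- seq(min(x), max(x), length.out = 5)

# values at given quantiles of x, in the spirit of quantiles=TRUE, probs=0:4/4
quantile_values <- quantile(x, probs = 0:4/4, names = FALSE)

grid_values
quantile_values
```

Both vectors span the observed range of the variable, but the quantile-based values are denser where the data are denser.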

Details

For each variable in expl, target values are chosen: for continuous variables, they are typically picked along a grid or at quantiles (see the arguments passed via ...); for categorical variables, all levels are used. For each target value of each target explanatory variable:

  1. the training data is modified so that the target variable is made constant, equal to its target value, everywhere; all other explanatory variables remain unchanged.

  2. the model predictions are computed for this new data set.

  3. the predicted values are averaged; this gives yhat: the average prediction of the model for this value of the target variable.
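The three steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the code used by partials() or pdp::partial(): the helper manual_partial() is hypothetical, and a linear model fitted with lm() stands in for the xgboost model so that the sketch runs without extra packages.

```r
# Hypothetical helper (not part of joml): partial dependence for one variable
manual_partial <- function(model, data, variable, values) {
  yhat <- vapply(values, function(v) {
    d <- data
    # 1. make the target variable constant, equal to the target value
    d[[variable]] <- rep(v, nrow(d))
    # 2. predict on the modified data, 3. average the predictions
    mean(predict(model, newdata = d))
  }, numeric(1))
  data.frame(variable = variable, value = values, yhat = yhat)
}

# Stand-in model (assumption): a linear model instead of an xgboost model
fit <- lm(mpg ~ cyl + hp + qsec, data = mtcars)

# target values: 10 equally spaced points along hp
grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 10)
pd <- manual_partial(fit, mtcars, "hp", grid)
pd
```

With a linear model, yhat is a straight line in the target variable, with a slope equal to the fitted coefficient; with a tree-based model the curve is generally step-like.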

Value

The input object with a new column called partial containing a data.frame with columns:

  • variable: the variable for which the partial dependence is computed;

  • value: the value of the variable at which the model marginal effects are computed;

  • yhat: the average prediction of the model for this value.

See Also

Other partial dependence plots functions: plot_partials(), summarise_partials()

Examples

# fit a model on 5 bootstraps
m <- resample_boot(mtcars, 5) %>%
  xgb_fit(resp="mpg", expl=c("cyl", "hp", "qsec"),
    eta=0.1, max_depth=4, nrounds=20)
# assess variable importance
importance(m) %>% summarise_importance()

# compute the partial dependence on the two most relevant variables
m <- partials(m, expl=c("hp", "cyl"))
# and plot them for each resample
plot_partials(m, fns=NULL)
# do the same with a finer grid
m <- partials(m, expl=c("hp", "cyl"), grid.resolution=50)
plot_partials(m, fns=NULL)
# or along quantiles
m <- partials(m, expl=c("hp", "cyl"), quantiles=TRUE, probs=0:20/20)
plot_partials(m, fns=NULL)

# compute mean+/-sd among resamples
summarise_partials(m)
plot_partials(m)
# do the same with median+/-mad
summarise_partials(m, fns=list(location=median, spread=mad))
plot_partials(m, fns=list(location=median, spread=mad))

jiho/joml documentation built on Dec. 6, 2023, 5:50 a.m.