generatePartialDependenceData: Generate partial dependence.
In guillermozbta/s2: Machine Learning in R

View source: R/generatePartialDependence.R

Estimate how the learned prediction function is affected by one or more features. For a learned function f(x) where x is partitioned into x_s and x_c, the partial dependence of f on x_s can be summarized by averaging over x_c and setting x_s to a range of values of interest, estimating E_(x_c)(f(x_s, x_c)). The conditional expectation of f at observation i is estimated similarly. Additionally, partial derivatives of the marginalized function w.r.t. the features can be computed.
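The averaging described above can be sketched in base R. This is an illustration only, not the package's implementation: `partial_dependence` is a hypothetical helper, and an `lm` fit on the built-in `mtcars` data stands in for a trained learner.

```r
# Stand-in for a learned prediction function f(x).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: estimate E_(x_c)(f(x_s, x_c)) on a grid of x_s values.
partial_dependence <- function(model, data, feature, grid, fun = mean) {
  sapply(grid, function(v) {
    newdata <- data
    newdata[[feature]] <- v                 # set x_s to the grid value in every row
    fun(predict(model, newdata = newdata))  # aggregate over the observed x_c
  })
}

# Evenly spaced grid between the empirical minimum and maximum of the feature.
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10L)
pd <- partial_dependence(fit, mtcars, "wt", grid)
data.frame(wt = grid, mpg = pd)
```

Because the stand-in model is additive and linear, the resulting curve is a straight line whose slope equals the fitted coefficient of `wt`; a nonlinear learner would give a curved profile.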

generatePartialDependenceData(obj, input, features, interaction = FALSE,
  derivative = FALSE, individual = FALSE, center = NULL, fun = mean,
  bounds = c(qnorm(0.025), qnorm(0.975)), resample = "none", fmin, fmax,
  gridsize = 10L, range = NULL, ...)

`obj`	[`WrappedModel`] Result of `train`.
`input`	[`data.frame` | `Task`] Input data.
`features`	[`character`] A vector of feature names contained in the training data. If not specified all features in the `input` will be used.
`interaction`	[`logical(1)`] Whether the `features` should be interacted or not. If `TRUE` then the Cartesian product of the prediction grid for each feature is taken, and the partial dependence at each unique combination of values of the features is estimated. Note that if the length of `features` is greater than two, `plotPartialDependence` and `plotPartialDependenceGGVIS` cannot be used. If `FALSE` each feature is considered separately. In this case `features` can be much longer than two. Default is `FALSE`.
`derivative`	[`logical(1)`] Whether or not the partial derivative of the learned function with respect to the features should be estimated. If `TRUE` `interaction` must be `FALSE`. The partial derivative of individual observations may be estimated. Note that computation time increases as the learned prediction function is evaluated at `gridsize` points * the number of points required to estimate the partial derivative. Additional arguments may be passed to `grad` (for regression or survival tasks) or `jacobian` (for classification tasks). Note that functions which are not smooth may result in estimated derivatives of 0 (for points where the function does not change within +/- epsilon) or estimates trending towards +/- infinity (at discontinuities). Default is `FALSE`.
`individual`	[`logical(1)`] Whether to compute the individual conditional expectation curves rather than the aggregated curve, i.e., rather than aggregating (using `fun`) the partial dependences of `features`, retain the partial dependences of all observations in `input` across all values of the `features`. The algorithm is developed in Goldstein, Kapelner, Bleich, and Pitkin (2015). Default is `FALSE`.
`center`	[`list`] A named list containing fixed values of the `features`. The individual partial dependence computed at these values is subtracted from each individual partial dependence across the prediction grid created for the `features`, which centers the individual partial dependence lines and makes them more interpretable. This argument is ignored if `individual != TRUE`. Default is `NULL`.
`fun`	[`function`] For regression, a function that accepts a numeric vector and returns either a single number, such as a measure of location like the mean, or three numbers giving a lower bound, a measure of location, and an upper bound. If three numbers are returned they must be in this order. For classification with `predict.type = "prob"` the function must accept a numeric matrix with the number of columns equal to the number of class levels of the target. For classification with `predict.type = "response"` (the default) the function must accept a character vector and output a numeric vector with length equal to the number of classes in the target feature. Two variables, `data` and `newdata`, are made available to `fun` internally via a wrapper. `data` is the training data from `input`, and `newdata` contains a single point from the prediction grid for `features` along with the training data for features not in `features`. This allows the computation of weights based on comparisons of the prediction grid to the training data. The default is the mean, unless `obj` is classification with `predict.type = "response"`, in which case the default is the proportion of observations predicted to be in each class.
`bounds`	[`numeric(2)`] The values (lower, upper) by which the estimated standard error is multiplied to estimate the bounds of a confidence region for a partial dependence. Ignored if `predict.type != "se"` for the learner. Default is the 2.5% and 97.5% quantiles (-1.96, 1.96) of the Gaussian distribution.
`resample`	[`character(1)`] Defines how the prediction grid for each feature is created. If `"bootstrap"` then values are sampled with replacement from the training data. If `"subsample"` then values are sampled without replacement from the training data. If `"none"` an evenly spaced grid between either the empirical minimum and maximum, or the minimum and maximum defined by `fmin` and `fmax`, is created. Default is `"none"`.
`fmin`	[`numeric`] The minimum value that each element of `features` can take. This argument is only applicable if `resample = "none"` and when the empirical minimum is higher than the theoretical minimum for a given feature. This only applies to numeric features, and an `NA` should be inserted into the vector if the corresponding feature is a factor. Default is the empirical minimum of each numeric feature and `NA` for factor features.
`fmax`	[`numeric`] The maximum value that each element of `features` can take. This argument is only applicable if `resample = "none"` and when the empirical maximum is lower than the theoretical maximum for a given feature. This only applies to numeric features, and an `NA` should be inserted into the vector if the corresponding feature is a factor. Default is the empirical maximum of each numeric feature and `NA` for factor features.
`gridsize`	[`integer(1)`] The length of the prediction grid created for each feature. If `resample = "bootstrap"` or `resample = "subsample"` then this defines the number of (possibly non-unique) values resampled. If `resample = "none"` it defines the length of the evenly spaced grid created. Default is `10L`.
`range`	[`list`] The range of feature values over which the partial dependence should be computed, passed as a list of numeric values. Default is `NULL`.
`...`	Additional arguments to be passed to `predict`.
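The interplay of `resample`, `gridsize`, `fmin`, and `fmax` for a single numeric feature can be sketched as follows. `make_grid` is a hypothetical helper written for illustration; it mirrors the behaviour described in the argument list but is not the package's internal code.

```r
# Hypothetical helper: build the prediction grid for one numeric feature.
make_grid <- function(x, gridsize = 10L, resample = "none",
                      fmin = min(x), fmax = max(x)) {
  switch(resample,
    # evenly spaced grid between fmin and fmax (empirical range by default)
    none = seq(fmin, fmax, length.out = gridsize),
    # observed values sampled with replacement
    bootstrap = sample(x, gridsize, replace = TRUE),
    # observed values sampled without replacement
    subsample = sample(x, gridsize, replace = FALSE),
    stop("unknown resample mode: ", resample)
  )
}

set.seed(42)
x <- mtcars$wt
g_none <- make_grid(x)                          # spans the empirical range
g_boot <- make_grid(x, resample = "bootstrap")  # values drawn from the data
g_sub  <- make_grid(x, resample = "subsample")  # values drawn from the data
g_wide <- make_grid(x, fmin = 0, fmax = 6)      # theoretical range wider than observed
```

Resampled grids contain only observed values, so predictions are made where the data actually lie, whereas the evenly spaced grid can be widened with `fmin` and `fmax` when the feature's theoretical range exceeds the observed one.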

[`PartialDependenceData`]. A named list, which contains the partial dependence, input data, target, features, task description, and other arguments controlling the type of partial dependences made.

Object members:

`data`	[`data.frame`] Has columns for the prediction: one column for regression and survival analysis, and a column for class and the predicted probability for classification, as well as a column for each element of `features`. If `individual = TRUE` then there is an additional column `idx` which gives the index of the `data` that each prediction corresponds to.
`task.desc`	[`TaskDesc`] Task description.
`target`	Target feature for regression, target feature levels for classification, survival and event indicator for survival.
`features`	[`character`] Features argument input.
`interaction`	[`logical(1)`] Whether or not the features were interacted (i.e. conditioning).
`derivative`	[`logical(1)`] Whether or not the partial derivative was estimated.
`individual`	[`logical(1)`] Whether the individual curves were retained (`TRUE`) rather than aggregated into a single partial dependence (`FALSE`).
`center`	[`logical(1)`] If `individual == TRUE` whether the partial dependence at the values of the features specified was subtracted from the individual partial dependences. Only displayed if `individual == TRUE`.
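What `individual` and `center` record can be illustrated with a minimal base R sketch. It assumes an `lm` fit on `mtcars` as a stand-in for a trained learner and uses the smallest observed `wt` as the centering value; both choices are illustrative assumptions, not the package's internals.

```r
# Stand-in model with an interaction so that individual curves differ in slope.
fit <- lm(mpg ~ wt * hp, data = mtcars)
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10L)

# Individual conditional expectation: one row per observation,
# one column per grid value of the feature.
ice <- sapply(grid, function(v) {
  newdata <- mtcars
  newdata$wt <- v
  predict(fit, newdata = newdata)
})

# Centering: subtract each observation's prediction at a fixed reference value.
ref <- mtcars
ref$wt <- min(mtcars$wt)
ice_centered <- ice - predict(fit, newdata = ref)

# Aggregating the individual curves with the mean gives the partial dependence.
pd <- colMeans(ice)
```

After centering, every curve starts at zero at the reference value, which makes differences in shape between observations easier to compare.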

Goldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. “Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation.” Journal of Computational and Graphical Statistics. Vol. 24, No. 1 (2015): 44-65.

Friedman, Jerome. “Greedy Function Approximation: A Gradient Boosting Machine.” The Annals of Statistics. Vol. 29. No. 5 (2001): 1189-1232.

Other partial_dependence: plotPartialDependenceGGVIS, plotPartialDependence

Other generate_plot_data: generateCalibrationData, generateCritDifferencesData, generateFeatureImportanceData, generateFilterValuesData, generateFunctionalANOVAData, generateLearningCurveData, generateThreshVsPerfData, getFilterValues, plotFilterValues

lrn = makeLearner("regr.svm")
fit = train(lrn, bh.task)
pd = generatePartialDependenceData(fit, bh.task, "lstat")
plotPartialDependence(pd, data = getTaskData(bh.task))

lrn = makeLearner("classif.rpart", predict.type = "prob")
fit = train(lrn, iris.task)
pd = generatePartialDependenceData(fit, iris.task, "Petal.Width")
plotPartialDependence(pd, data = getTaskData(iris.task))

# simulated example with weights computed via the joint distribution
# in practice empirical weights could be constructed by estimating the joint
# density from the training data (the data arg to fun) and computing the probability
# of the prediction grid under this estimated density (the newdata arg) or
# by using something like data depth or outlier classification to weight the
# unusualness of points in arg newdata.
sigma = matrix(c(1, .5, .5, 1), 2, 2)
C = chol(sigma)
X = replicate(2, rnorm(100)) %*% C
alpha = runif(2, -1, 1)
y = X %*% alpha
df = data.frame(y, X)
tsk = makeRegrTask(data = df, target = "y")
fit = train("regr.svm", tsk)

w.fun = function(x, newdata) {
 # compute multivariate normal density given sigma
 sigma = matrix(c(1, .5, .5, 1), 2, 2)
 dec = chol(sigma)
 tmp = backsolve(dec, t(newdata), transpose = TRUE)
 rss = colSums(tmp^2)
 logretval = -sum(log(diag(dec))) - 0.5 * ncol(newdata) * log(2 * pi) - 0.5 * rss
 w = exp(logretval)
 # weight prediction grid given probability of grid points under the joint
 # density
 sum(w * x) / sum(w)
}

generatePartialDependenceData(fit, tsk, "X1", fun = w.fun)