Compute partial dependence functions (i.e., marginal effects) for various model fitting objects.
partial(object, ...)
## Default S3 method:
partial(object, pred.var, pred.grid, pred.fun = NULL,
grid.resolution = NULL, quantiles = FALSE, probs = 1:9/10,
trim.outliers = FALSE, type = c("auto", "regression", "classification"),
which.class = 1L, prob = FALSE, recursive = TRUE, plot = FALSE,
smooth = FALSE, rug = FALSE, chull = FALSE, train, cats = NULL,
check.class = TRUE, progress = "none", parallel = FALSE,
paropts = NULL, ...)

object 
A fitted model object of appropriate class (e.g., "gbm", "lm",
"randomForest", "train", etc.).
... 
Additional optional arguments to be passed onto predict.
pred.var 
Character string giving the names of the predictor variables of interest. For reasons of computation/interpretation, this should include no more than three variables. 
pred.grid 
Data frame containing the joint values of interest for the
variables listed in pred.var.
pred.fun 
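Optional prediction function that requires two arguments:
object and newdata. It must return either a single prediction or a vector of
predictions (one per row of newdata). Default is NULL.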
grid.resolution 
Integer giving the number of equally spaced points to
use (only used for the continuous variables listed in pred.var when
pred.grid is not supplied).
quantiles 
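Logical indicating whether or not to use the sample
quantiles of the numeric predictors listed in pred.var. If quantiles = TRUE
and grid.resolution = NULL, the sample quantiles (see probs) are used to
generate the grid of joint values for which the partial dependence is computed.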
probs 
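Numeric vector of probabilities with values in [0,1]. (Values up
to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
Default is 1:9/10, which corresponds to the deciles of the predictor
variables. Only used when quantiles = TRUE.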
trim.outliers 
Logical indicating whether or not to trim off outliers
from the numeric predictors (using the simple boxplot method) before
creating the grid of joint values for which the partial dependence is
computed. Default is FALSE.
type 
Character string specifying the type of supervised learning.
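Current options are "auto", "regression", or "classification". If
type = "auto", partial tries to extract the necessary information from object.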
which.class 
Integer specifying which column of the matrix of predicted
probabilities to use as the "focus" class. Default is to use the first
class. Only used for classification problems (i.e., when
type = "classification").
prob 
Logical indicating whether or not partial dependence for
classification problems should be returned on the probability scale, rather
than the centered logit. If FALSE, the partial dependence function is on a
scale similar to the logit. Default is FALSE.
recursive 
Logical indicating whether or not to use the weighted tree
traversal method described in Friedman (2001). This only applies to objects
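that inherit from class "gbm". Default is TRUE, which is much faster than the
exact brute-force approach used for all other models.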
plot 
Logical indicating whether to return a data frame containing the
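partial dependence values (FALSE) or to plot the partial dependence function
directly (TRUE). Default is FALSE. See plotPartial for plotting details.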
smooth 
Logical indicating whether or not to overlay a LOESS smooth.
Default is FALSE.
rug 
Logical indicating whether or not to include rug marks on the
predictor axes. Only used when plot = TRUE. Default is FALSE.
chull 
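Logical indicating whether or not to restrict the first
two variables in pred.var to lie within the convex hull of their training
values; this affects pred.grid. Default is FALSE.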
train 
An optional data frame containing the original training
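data. This may be required depending on the class of object. For objects
that do not store a copy of the original training data, this argument is
required.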
cats 
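Character string indicating which columns of train should be treated as
categorical variables. Only used when train inherits from class "matrix" or
"dgCMatrix".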
check.class 
Logical indicating whether or not to make sure each column
in pred.grid has the correct class, levels, etc. Default is TRUE.
progress 
Character string giving the name of the progress bar to use.
See create_progress_bar (from the plyr package) for details. Default is
"none".
parallel 
Logical indicating whether or not to run partial in parallel using a
backend provided by the foreach package. Default is FALSE.
paropts 
List containing additional options passed onto foreach when
parallel = TRUE.
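To make grid.resolution, quantiles, probs, and trim.outliers more concrete, here is a base R sketch of how a grid for a single numeric predictor might be built. make_pred_grid is a hypothetical helper, not a pdp function; its fallback when grid.resolution is NULL (one point per unique value, capped at 51) is an assumption of this sketch, and pdp's own grid logic may differ in details.

```r
# Hypothetical helper (not part of pdp): build grid values for ONE numeric
# predictor, mimicking the roles of grid.resolution, quantiles, probs and
# trim.outliers described above.
make_pred_grid <- function(x, grid.resolution = NULL, quantiles = FALSE,
                           probs = 1:9/10, trim.outliers = FALSE) {
  x <- x[!is.na(x)]
  if (trim.outliers) {
    # Simple boxplot rule: drop the points flagged by boxplot.stats()
    x <- x[!x %in% boxplot.stats(x)$out]
  }
  if (quantiles) {
    # Sample quantiles at the requested probabilities (deciles by default)
    return(unname(quantile(x, probs = probs)))
  }
  if (is.null(grid.resolution)) {
    # Assumed fallback: one point per unique value, capped at 51
    grid.resolution <- min(length(unique(x)), 51)
  }
  # Equally spaced points over the (possibly trimmed) range of x
  seq(min(x), max(x), length.out = grid.resolution)
}

x <- c(1:10, 100)  # 100 is an outlier under the boxplot rule
make_pred_grid(x, grid.resolution = 5)
make_pred_grid(x, grid.resolution = 10, trim.outliers = TRUE)
make_pred_grid(1:11, quantiles = TRUE, probs = c(0, 0.5, 1))
```

Trimming before building an equally spaced grid keeps a single extreme value from stretching the grid over a region with almost no training data.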
If plot = FALSE (the default), partial returns a data frame with the
additional class "partial" that is specially recognized by the plotPartial
function. If plot = TRUE, then partial returns a "trellis" object (see
lattice for details) with an additional attribute, "partial.data",
containing the data displayed in the plot.
In some cases it is difficult for partial to extract the original training
data from object. In these cases an error message is displayed requesting
the user to supply the training data via the train argument in the call to
partial. In most cases where partial can extract the required training data
from object, it is taken from the same environment in which partial is
called. Therefore, it is important to not change the training data used to
construct object before calling partial. This problem is completely avoided
when the training data are passed to the train argument in the call to
partial.
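Why the training data matter becomes clear from the brute-force computation described by Friedman (2001): for each grid value, overwrite the predictor of interest in every training row, predict, and average. The sketch below is a minimal base R illustration, assuming a single predictor and using lm on mtcars as a stand-in for any fitted model; pd_brute and its pred_fun argument are hypothetical names (pred_fun mirrors the two-argument object/newdata form of pred.fun), and this is not pdp's actual implementation.

```r
# Hypothetical brute-force partial dependence for ONE predictor.
# A sketch of the idea in Friedman (2001), not pdp's implementation.
pd_brute <- function(object, train, pred.var, grid,
                     pred_fun = function(object, newdata) {
                       mean(predict(object, newdata))  # average prediction
                     }) {
  yhat <- vapply(grid, function(value) {
    newdata <- train
    newdata[[pred.var]] <- value  # same grid value in every training row
    pred_fun(object, newdata)     # must return a single number here
  }, numeric(1))
  setNames(data.frame(grid, yhat), c(pred.var, "yhat"))
}

# A linear model stands in for any fitted model object
fit <- lm(mpg ~ wt + hp, data = mtcars)
pd <- pd_brute(fit, train = mtcars, pred.var = "wt", grid = c(2, 3, 4))
print(pd)
```

Because the function receives train explicitly, it never has to look the data up in the calling environment. For an additive linear model like this one, the partial dependence on wt is a straight line whose slope equals coef(fit)["wt"], which makes a convenient sanity check.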
It is recommended to call partial with plot = FALSE and store the results;
this allows for more flexible plotting, and the user will not have to waste
time calling partial again if the default plot is not sufficient.
It is possible to retrieve the last printed "trellis" object, such as those
produced by plotPartial, using trellis.last.object().
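A minimal sketch of retrieving and modifying the last printed plot, using only the lattice package that ships with R. The xyplot of mtcars is a stand-in for a plot produced by plotPartial or by partial(..., plot = TRUE); the title text is arbitrary.

```r
library(lattice)

# Stand-in for a plot produced by plotPartial() or partial(..., plot = TRUE)
p <- xyplot(mpg ~ wt, data = mtcars)
print(p)  # printing a trellis object records it as the last printed object

# Retrieve the last printed trellis object and modify it with update()
last <- trellis.last.object()
last <- update(last, main = "Retrieved with trellis.last.object()")
print(last)
```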
If the prediction function given to pred.fun returns a prediction for each
observation in newdata, then the result will be a PDP for each observation.
These are called individual conditional expectation (ICE) curves; see
Goldstein et al. (2015) and ice for details.
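The relationship between ICE curves and the PDP can be sketched in base R: keep one prediction per training row instead of averaging, and the column means of the resulting matrix recover the partial dependence curve. ice_brute is a hypothetical helper (not pdp code), and lm with an interaction on mtcars is an arbitrary stand-in model chosen so the individual curves differ in shape. The centering step is the same "subtract each curve's first value" idea used in the cICE example below.

```r
# Hypothetical ICE sketch: instead of averaging, keep one prediction per
# training row, giving one curve per observation.
ice_brute <- function(object, train, pred.var, grid) {
  preds <- lapply(grid, function(value) {
    newdata <- train
    newdata[[pred.var]] <- value
    unname(predict(object, newdata))  # one prediction per training row
  })
  do.call(cbind, preds)  # rows = observations, columns = grid values
}

# An interaction term makes the individual curves differ in shape
fit <- lm(mpg ~ wt * hp, data = mtcars)
grid <- c(2, 3, 4)
ice <- ice_brute(fit, train = mtcars, pred.var = "wt", grid = grid)

pd_curve <- colMeans(ice)        # averaging the ICE curves gives the PDP
cice <- sweep(ice, 1, ice[, 1])  # centered ICE: every curve starts at 0
```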
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29: 1189-1232, 2001.
A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1): 44-65, 2015.
## Not run:
#
# Regression example (requires randomForest package to run)
#
# Fit a random forest to the Boston housing data
library(pdp) # provides partial(), plotPartial(), and the boston data
library(randomForest)
data(boston, package = "pdp") # load the Boston housing data
set.seed(101) # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)
# Using randomForest's partialPlot function
partialPlot(boston.rf, pred.data = boston, x.var = "lstat")
# Using pdp's partial function
head(partial(boston.rf, pred.var = "lstat")) # returns a data frame
partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE)
# The partial function allows for multiple predictors
partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40,
plot = TRUE, chull = TRUE, progress = "text")
# The plotPartial function offers more flexible plotting
pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40)
plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE,
colorkey = FALSE, screen = list(z = 20, x = 60))
# The autoplot function can be used to produce graphics based on ggplot2
library(ggplot2)
autoplot(pd, contour = TRUE,
legend.title = "Partial\ndependence")
#
# Individual conditional expectation (ICE) curves
#
# Use partial to obtain ICE curves
pred.ice <- function(object, newdata) predict(object, newdata)
rm.ice <- partial(boston.rf, pred.var = "rm", pred.fun = pred.ice)
plotPartial(rm.ice, rug = TRUE, train = boston, alpha = 0.2)
autoplot(rm.ice, center = FALSE, alpha = 0.2, rug = TRUE, train = boston)
#
# Centered ICE curves (cICE curves) (requires dplyr, ggplot2, and gridExtra to run)
#
# Postprocess rm.ice to obtain cICE curves
library(dplyr) # for group_by and mutate functions
rm.cice <- rm.ice %>%
group_by(yhat.id) %>% # perform next operation within each yhat.id
mutate(yhat.centered = yhat - first(yhat)) # so each curve starts at yhat = 0
# ICE curves with their average
library(ggplot2)
library(gridExtra) # for grid.arrange
# fun and linewidth assume ggplot2 >= 3.4.0
p1 <- ggplot(rm.ice, aes(rm, yhat)) +
geom_line(aes(group = yhat.id), alpha = 0.2) +
stat_summary(fun = mean, geom = "line", col = "red", linewidth = 1)
# cICE curves with their average
p2 <- ggplot(rm.cice, aes(rm, yhat.centered)) +
geom_line(aes(group = yhat.id), alpha = 0.2) +
stat_summary(fun = mean, geom = "line", col = "red", linewidth = 1)
grid.arrange(p1, p2, ncol = 2)
# Or just use autoplot (the default is to center the curves first)
autoplot(rm.ice, alpha = 0.2, rug = TRUE, train = boston)
#
# Classification example (requires randomForest package to run)
#
# Fit a random forest to the Pima Indians diabetes data
data(pima, package = "pdp") # load the Pima Indians diabetes data
set.seed(102) # for reproducibility
pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit)
# Partial dependence of diabetes test result (neg/pos) on glucose and age
partial(pima.rf, pred.var = c("glucose", "age"), plot = TRUE, chull = TRUE,
progress = "text")
# Partial dependence of positive diabetes test result on glucose, plotted on
# the probability scale, rather than the centered logit
pfun <- function(object, newdata) {
mean(predict(object, newdata, type = "prob")[, "pos"], na.rm = TRUE)
}
partial(pima.rf, pred.var = "glucose", pred.fun = pfun,
plot = TRUE, chull = TRUE, progress = "text")
## End(Not run)
