Compute partial dependence functions (i.e., marginal effects) for various model fitting objects.
partial(object, ...)
## Default S3 method:
partial(object, pred.var, pred.grid, pred.fun = NULL,
grid.resolution = NULL, type = c("auto", "regression", "classification"),
which.class = 1L, plot = FALSE, smooth = FALSE, rug = FALSE,
chull = FALSE, train, check.class = TRUE, progress = "none",
parallel = FALSE, paropts = NULL, ...)

object
A fitted model object of appropriate class (e.g., "gbm", "lm", "randomForest", etc.).

...
Additional optional arguments to be passed onto predict.

pred.var
Character string giving the names of the predictor variables of interest. For reasons of computation/interpretation, this should include no more than three variables.

pred.grid
Data frame containing the joint values of interest for the variables listed in pred.var.

pred.fun
Optional prediction function that requires two arguments: object and newdata.

grid.resolution
Integer giving the number of equally spaced points to use (only used for the continuous variables listed in pred.var when pred.grid is not supplied).

type
Character string specifying the type of supervised learning. Current options are "auto", "regression", and "classification".

which.class
Integer specifying which column of the matrix of predicted probabilities to use as the "focus" class. Default is to use the first class. Only used for classification problems (i.e., when type = "classification").

plot
Logical indicating whether to return a data frame containing the partial dependence values (FALSE) or to plot the partial dependence function directly (TRUE). Default is FALSE.

smooth
Logical indicating whether or not to overlay a LOESS smooth. Default is FALSE. Only used when plot = TRUE.

rug
Logical indicating whether or not to include rug marks on the predictor axes. Only used when plot = TRUE.

chull
Logical indicating whether or not to restrict the first two variables in pred.var to lie within the convex hull of their training values. Default is FALSE.

train
An optional data frame containing the original training data. This may be required depending on the class of object.

check.class
Logical indicating whether or not to make sure each column in pred.grid has the correct class, levels, etc.

progress
Character string giving the name of the progress bar to use. See plyr::create_progress_bar for details. Default is "none".

parallel
Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.

paropts
List containing additional options to be passed onto foreach when parallel = TRUE.
If plot = FALSE (the default), partial returns a data frame with the additional class "partial" that is specially recognized by the plotPartial function. If plot = TRUE, then partial returns a "trellis" object (see lattice for details) with an additional attribute, "partial.data", containing the data displayed in the plot.

In some cases it is difficult for partial to extract the original training data from object. In these cases an error message is displayed requesting that the user supply the training data via the train argument in the call to partial. In most cases where partial can extract the required training data from object, it is taken from the same environment in which partial was called. It is therefore important not to change the training data used to construct object before calling partial. This problem is avoided entirely when the training data are passed to the train argument in the call to partial.
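The recommendation above can be sketched as follows; a minimal example, assuming the pdp and randomForest packages are installed and using the boston data shipped with pdp:

```r
library(pdp)           # provides partial() and the boston data
library(randomForest)  # for fitting the model

data(boston)  # Boston housing data from pdp
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Passing the training data explicitly via train makes partial()
# independent of the calling environment, so later changes to any
# object named boston cannot silently alter the result
pd <- partial(boston.rf, pred.var = "lstat", train = boston)
head(pd)  # data frame of class "partial"
```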
It is possible to retrieve the last printed "trellis" object, such as those produced by plotPartial, using trellis.last.object().
It is possible for partial to run much faster if object inherits from class "gbm". In particular, if object inherits from class "gbm" and pred.grid is not specified, then partial makes an internal call to gbm::plot.gbm in order to exploit gbm's implementation of the weighted tree traversal method described in Friedman (2001).
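The fast path described above can be sketched as follows; a minimal example, assuming the gbm and pdp packages are installed (the tuning parameters shown are arbitrary choices for illustration):

```r
library(gbm)  # for fitting the boosted model
library(pdp)  # provides partial() and the boston data

data(boston)
set.seed(101)  # for reproducibility
boston.gbm <- gbm(cmedv ~ ., data = boston, distribution = "gaussian",
                  n.trees = 500, interaction.depth = 3, shrinkage = 0.1)

# Because pred.grid is not supplied, partial() can delegate to
# gbm::plot.gbm and use the weighted tree traversal shortcut;
# n.trees is forwarded through ... to the prediction routine
partial(boston.gbm, pred.var = "lstat", n.trees = 500, plot = TRUE)
```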
If the prediction function given to pred.fun returns a prediction for each observation in newdata, then the result will be a PDP for each observation. These are called individual conditional expectation (ICE) curves; see Goldstein et al. (2015) and ice for details.
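The ICE behavior described above can be sketched as follows; a minimal example, assuming the pdp and randomForest packages are installed:

```r
library(pdp)
library(randomForest)

data(boston)
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Returning one prediction per row of newdata (rather than a single
# aggregated value) makes partial() produce one curve per training
# observation, i.e., ICE curves
ice.fun <- function(object, newdata) {
  predict(object, newdata)
}
ice <- partial(boston.rf, pred.var = "lstat", pred.fun = ice.fun)
head(ice)  # includes an identifier distinguishing the individual curves
```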
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): 1189-1232, 2001.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1): 44-65, 2015.
## Not run:
#
# Regression example (requires randomForest package to run)
#
# Fit a random forest to the boston housing data
library(randomForest)
data(boston)  # load the Boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)
# Using randomForest's partialPlot function
partialPlot(boston.rf, pred.data = boston, x.var = "lstat")
# Using pdp's partial function
head(partial(boston.rf, pred.var = "lstat")) # returns a data frame
partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE)
# The partial function allows for multiple predictors
partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40,
plot = TRUE, chull = TRUE, progress = "text")
# The plotPartial function offers more flexible plotting
pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40)
plotPartial(pd) # the default
plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE,
colorkey = FALSE, screen = list(z = 20, x = 60))
#
# Classification example (requires randomForest package to run)
#
# Fit a random forest to the Pima Indians diabetes data
data(pima)  # load the Pima Indians diabetes data
set.seed(102)  # for reproducibility
pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit)
# Partial dependence of diabetes test result (neg/pos) on glucose
partial(pima.rf, pred.var = c("glucose", "age"), plot = TRUE, chull = TRUE,
progress = "text")
# Partial dependence of positive diabetes test result on glucose, plotted on
# the probability scale, rather than the centered logit
pfun <- function(object, newdata) {
  mean(predict(object, newdata, type = "prob")[, "pos"], na.rm = TRUE)
}
partial(pima.rf, pred.var = "glucose", pred.fun = pfun,
plot = TRUE, chull = TRUE, progress = "text")
#
# Interface with caret (requires caret package to run)
#
# Load required packages
library(caret) # for model training/tuning
# Set up for 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)
# Tune a support vector machine (SVM) with a radial basis function kernel on
# the Pima Indians diabetes data
set.seed(103)  # for reproducibility
pima.svm <- train(diabetes ~ ., data = pima, method = "svmRadial",
prob.model = TRUE, na.action = na.omit, trControl = ctrl,
tuneLength = 10)
# Partial dependence of diabetes test result (neg/pos) on glucose
partial(pima.svm, pred.var = "glucose", plot = TRUE, rug = TRUE)
## End(Not run)
