Cached plot.variable objects for examples, diagnostics and vignettes.
Data sets storing plot.variable objects corresponding to training data according to the following naming convention:
partial_Boston_surf - from a randomForestSRC forest for the Boston housing data set (MASS package).
partial_pbc_surf - from a randomForestSRC forest for the pbc data set (randomForestSRC package).
partial_pbc_time - from a randomForestSRC forest for the pbc data set (randomForestSRC package).
A list of plot.variable objects.
Constructing partial plot data with the randomForestSRC::plot.variable function is computationally expensive. We cache plot.variable objects to shorten the run times of the ggRandomForests examples, diagnostics and vignettes (see cache_rfsrc_datasets to rebuild a complete set of these data sets).
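The caching idea can be sketched in base R as follows. This is only an illustration of the compute-once, reuse-later pattern; the helper cached_compute, the temporary file and the slow_step stand-in are hypothetical names and are not part of ggRandomForests, which ships its cached objects as package data sets.

```r
# Hypothetical helper (not part of ggRandomForests): compute once, then reuse.
cached_compute <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))    # cache hit: read the stored result
  }
  result <- compute()        # cache miss: run the expensive step
  saveRDS(result, path)      # store it for later calls
  result
}

cache_file <- tempfile(fileext = ".rds")
n_calls <- 0
slow_step <- function() {
  n_calls <<- n_calls + 1    # count how often the expensive step runs
  cumsum(1:10)               # stand-in for a costly plot.variable() call
}

first  <- cached_compute(cache_file, slow_step)  # computes and stores
second <- cached_compute(cache_file, slow_step)  # reads from the cache
```

The second call returns the stored object without running slow_step again, which is the saving the cached data sets provide for examples and vignettes.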
For each data set listed, we build an rfsrc forest (see rfsrc_data), then calculate the partial plot data with the plot.variable function, setting partial=TRUE. Each data set is built with the cache_rfsrc_datasets function, using the randomForestSRC version listed in the ggRandomForests DESCRIPTION file.
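To make the partial plot calculation concrete, here is a minimal base-R sketch of partial dependence. It assumes a linear model on the built-in mtcars data as a stand-in for the forest, and the helper partial_dependence is a hypothetical name; it is not how randomForestSRC implements plot.variable, only the general idea of fixing one covariate at a grid value and averaging the predictions over the training data.

```r
# Stand-in model: a linear fit on the built-in mtcars data (an assumption for
# illustration; the package uses a randomForestSRC forest instead).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: average prediction with `xvar` forced to each grid value.
partial_dependence <- function(model, data, xvar, grid) {
  vapply(grid, function(value) {
    data[[xvar]] <- value                  # modifies a local copy only
    mean(predict(model, newdata = data))   # average over all observations
  }, numeric(1))
}

# Evaluate at five quantiles of the covariate of interest.
wt_grid <- quantile(mtcars$wt, probs = seq(0.1, 0.9, by = 0.2), names = FALSE)
pd_wt <- partial_dependence(fit, mtcars, "wt", wt_grid)
```

Every grid point requires a full pass of predictions over the data, which is why the same calculation on a forest with many covariates and grid points becomes expensive.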
partial_Boston - The Boston housing values in suburbs of Boston from the MASS package. Build a regression random forest for predicting medv (median home values) on 13 covariates and 506 observations.
partial_pbc - The pbc data from the Mayo Clinic trial in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. 312 cases participated in the randomized trial and contain largely complete data. Data from the randomForestSRC package. Build a survival random forest for time-to-event death data with 17 covariates and 312 observations (remaining 106 observations are held out).
#——————— randomForestSRC ———————
Ishwaran H. and Kogalur U.B. (2014). Random Forests for Survival, Regression and Classification (RF-SRC), R package version 1.5.5.
Ishwaran H. and Kogalur U.B. (2007). Random survival forests for R. R News 7(2), 25-31.
Ishwaran H., Kogalur U.B., Blackstone E.H. and Lauer M.S. (2008). Random survival forests. Ann. Appl. Statist. 2(3), 841-860.
#——————— Boston data set ———————
Belsley, D.A., E. Kuh, and R.E. Welsch. 1980. Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. New York: Wiley.
Harrison, D., and D.L. Rubinfeld. 1978. "Hedonic Prices and the Demand for Clean Air." J. Environ. Economics and Management 5: 81-102.
#——————— pbc data set ———————
Fleming T.R. and Harrington D.P. (1991). Counting Processes and Survival Analysis. New York: Wiley.
T Therneau and P Grambsch (2000), Modeling Survival Data: Extending the Cox Model, Springer-Verlag, New York. ISBN: 0-387-98784-3.
Boston
pbc
plot.variable
rfsrc_data
cache_rfsrc_datasets
gg_partial
plot.gg_partial
## Not run:
library(randomForestSRC)  # provides plot.variable()
library(ggRandomForests)  # provides gg_partial() and quantile_pts()
#---------------------------------------------------------------------
# MASS::Boston data - regression random forest
#---------------------------------------------------------------------
# load the rfsrc object from the cached data
data(rfsrc_Boston, package="ggRandomForests")
# The plot.variable call
partial_Boston <- plot.variable(rfsrc_Boston,
                                partial = TRUE, show.plots = FALSE)
# plot the forest partial plots
gg_dta <- gg_partial(partial_Boston)
plot(gg_dta, panel=TRUE)
#---------------------------------------------------------------------
# randomForestSRC::pbc data - survival random forest
#---------------------------------------------------------------------
# load the rfsrc object from the cached data
data(rfsrc_pbc, package="ggRandomForests")
# Restrict the times of interest to 5 years or less.
time_pts <- rfsrc_pbc$time.interest[which(rfsrc_pbc$time.interest <= 5)]
# Find 50 points in time, evenly spaced along the distribution of
# event times, for a series of partial dependence curves
time_cts <- quantile_pts(time_pts, groups = 50)
# Generate the partial dependence data: one plot.variable object per time point
system.time(partial_pbc_time <- lapply(time_cts, function(ct) {
  plot.variable(rfsrc_pbc, xvar = "bili", time = ct,
                npts = 50, show.plots = FALSE,
                partial = TRUE, surv.type = "surv")
}))
# user system elapsed
# 2561.313 81.446 2641.707
# Find the quantile points to create 50 cut points
alb_partial_pts <- quantile_pts(rfsrc_pbc$xvar$albumin, groups = 50)
system.time(partial_pbc_surf <- lapply(alb_partial_pts, function(ct) {
  # Fix albumin at the cut point; this changes only the function's local copy
  rfsrc_pbc$xvar$albumin <- ct
  plot.variable(rfsrc_pbc, xvar = "bili", time = 1,
                npts = 50, show.plots = FALSE,
                partial = TRUE, surv.type = "surv")
}))
# user system elapsed
# 2547.482 91.978 2671.870
## End(Not run)
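The examples above pick evaluation points along the distribution of event times or covariate values. A rough base-R sketch of that idea follows; even_quantile_points is a hypothetical stand-in and is not the source of quantile_pts, and the event times are simulated rather than taken from the pbc data.

```r
# Hypothetical stand-in for choosing up to `groups` evaluation points along
# the empirical distribution of x (not the package's quantile_pts() source).
even_quantile_points <- function(x, groups) {
  probs <- seq(0, 1, length.out = groups)
  unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
}

set.seed(42)
event_times <- rexp(200, rate = 0.5)   # simulated, right-skewed "event times"

# Keep times of 5 or less, then pick points spaced evenly in probability.
pts <- even_quantile_points(event_times[event_times <= 5], groups = 10)
```

Spacing the points by quantile rather than by value places more evaluation points where the data are dense, so each partial dependence curve is estimated where observations actually occur.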