nano_pdp: Create PDP
In Nanoputian628/nano: Data Visualisation and Model Selection

nano_pdp

R Documentation

Create PDP

Description

Creates partial dependence plots (PDPs) from h2o models stored in nano objects.
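The following base-R sketch illustrates the idea behind a partial dependence curve by computing one directly. For each grid value, the chosen variable is set to that value in every row, the model predicts, and the predictions are averaged. This is not nano's implementation, and it rests on several assumptions. It uses an `lm` fit on the built-in `mtcars` data in place of an h2o model. The function name `partial_dependence` and its `grid` argument are hypothetical. Only the `-1` convention for `row_index` follows the Arguments section below.

```r
# Hypothetical helper illustrating partial dependence; NOT part of nano.
# Assumes `model` has a predict() method that accepts `newdata`.
partial_dependence <- function(model, data, var, grid = NULL, row_index = -1) {
  # Same convention as nano's row_index: -1 means "use every row"
  use_all <- length(row_index) == 1 && row_index == -1
  if (!use_all) data <- data[row_index, , drop = FALSE]
  # Default grid: the sorted unique observed values of `var`
  if (is.null(grid)) grid <- sort(unique(data[[var]]))
  pd <- vapply(grid, function(v) {
    modified <- data
    modified[[var]] <- v                        # fix `var` at v in every row
    mean(predict(model, newdata = modified))    # average over the rows
  }, numeric(1))
  data.frame(value = grid, mean_prediction = pd)
}

# Usage with a linear model on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)
pdp_wt <- partial_dependence(fit, mtcars, "wt", grid = c(2, 3, 4))
print(pdp_wt)
```

For a linear model the resulting curve is a straight line whose slope equals the `wt` coefficient, so the sketch is easy to check. For tree ensembles such as the random forests in the Examples section, the curve is generally non-linear.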

Usage

nano_pdp(
  nano,
  model_no = NA,
  vars,
  row_index = -1,
  plot = TRUE,
  save = FALSE,
  subdir = NA,
  file_type = "html"
)

Arguments

`nano`	a nano object containing the h2o models.
`model_no`	a numeric vector giving the positions, in the nano object's list of models, of the models for which PDPs should be calculated. If not supplied, the last model is used by default.
`vars`	a character vector of the variables for which PDPs should be created.
`row_index`	a numeric vector of dataset row numbers to be used to calculate the PDPs. To use the entire dataset, set to -1 (the default).
`plot`	a logical specifying whether the PDPs should be plotted.
`save`	a logical specifying whether the plot should be saved in the working directory.
`subdir`	subdirectory of the working directory in which the plot should be saved.
`file_type`	file type in which the plots should be saved. Can take values `html`, `jpeg`, `png`, `pdf`.

Details

The function first checks whether the PDPs of the specified models have already been calculated by looking them up in the list nano$pdp. Any PDPs that have not yet been calculated are then computed and stored in the corresponding slot of nano$pdp.
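The following sketch illustrates the look-up-then-fill behaviour. It is a hypothetical illustration, not nano's code. It assumes the cache is a plain list indexed by model position, and that a PDP counts as missing exactly when its slot is NULL. `fill_pdp_cache` and `compute_pdp` are made-up names, and the real structure of nano$pdp may differ.

```r
# Hypothetical sketch of "compute only if missing"; NOT nano's implementation.
# Assumes obj$pdp is a list indexed by model position (an assumption).
fill_pdp_cache <- function(obj, model_no, compute_pdp) {
  for (i in model_no) {
    # Compute only when slot i does not exist yet or is empty (NULL)
    if (length(obj$pdp) < i || is.null(obj$pdp[[i]])) {
      obj$pdp[[i]] <- compute_pdp(i)
    }
  }
  obj
}

# Stand-in for the expensive PDP calculation; counts how often it runs
calls <- 0
compute_pdp <- function(i) {
  calls <<- calls + 1
  paste("PDP for model", i)
}

obj <- list(pdp = list())
obj <- fill_pdp_cache(obj, 1:2, compute_pdp)  # computes models 1 and 2
obj <- fill_pdp_cache(obj, 1:2, compute_pdp)  # both slots filled, nothing recomputed
print(calls)  # 2
```

The second call finds both slots filled, so the stand-in calculation runs only twice in total.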

If plot = TRUE, a plot of the PDPs will also be returned. The plot can be saved in a subfolder of the working directory by using the save and subdir arguments.

Value

A nano object with the PDPs of the specified models calculated. Also returns a plot if plot = TRUE, and saves each of the plots if save = TRUE.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 train <- as.h2o(property_prices)
 
 # set the response and predictors
 response <- "sale_price"
 var <- setdiff(colnames(property_prices), response)
 
 # build grids
 grid_1 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 1:2),
                    nfolds          = 3,
                    seed            = 628)

 grid_2 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 3:4),
                    nfolds          = 3,
                    seed            = 628)

 
 # data is a list with a single dataset, since both grids use the same underlying data;
 # model is not supplied, so the best model from each grid is taken
 obj <- create_nano(grid = list(grid_1, grid_2),
                    data = list(property_prices))
 
 # calculate PDP and save plots in working directory
 obj <- nano_pdp(nano = obj, model_no = 1:2, vars = c("lot_size", "income"),
                 plot = TRUE, save = TRUE)
 
 }

## End(Not run)

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.