nano_multi_pdp: Calculates PDP for multiple models
In Nanoputian628/nano: Data Visualisation and Model Selection

nano_multi_pdp

R Documentation

Calculates PDP for multiple models

Description

Calculates partial dependence plots (PDPs) for multiple h2o models.

Usage

nano_multi_pdp(models, data, vars, row_index = -1)

Arguments

`models`	a list of h2o models.
`data`	a list of datasets.
`vars`	a character vector of the variables for which to create PDPs.
`row_index`	a numeric vector of dataset row numbers used to calculate the PDPs. To use the entire dataset, set to -1.

Details

Creates a list of data.tables, one per model. Each data.table holds the PDP values calculated from that model, with the values for every variable in vars combined into a single table.

To create PDPs, it is recommended to use the nano_pdp function instead. nano_pdp is a wrapper for a series of functions that create PDPs: it can create PDPs directly from a nano object, for both single and multiple models, and has the option to return plots of the PDPs.
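As a rough illustration of the output shape described in this section (one table per model, with the rows for every variable stacked together), the sketch below computes partial dependence by hand in base R. It is not the implementation of nano_multi_pdp: the helpers `pdp_one_var` and `pdp_multi`, the `n_grid` argument, and the column names are hypothetical, it uses `lm` models on the built-in `mtcars` data instead of h2o models, and it returns plain data.frames rather than data.tables.

```r
# Hypothetical helper (not part of nano): partial dependence of one variable.
# For each grid value, the variable is fixed at that value in every row and
# the model's predictions are averaged.
pdp_one_var <- function(model, data, var, n_grid = 5) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_grid)
  mean_pred <- vapply(grid, function(value) {
    modified <- data
    modified[[var]] <- value
    mean(predict(model, newdata = modified))
  }, numeric(1))
  data.frame(variable = var, value = grid, mean_prediction = mean_pred)
}

# Hypothetical helper: one combined table per model, with the per-variable
# results stacked by rows.
pdp_multi <- function(models, data, vars) {
  lapply(models, function(model) {
    do.call(rbind, lapply(vars, function(v) pdp_one_var(model, data, v)))
  })
}

# Two simple linear models stand in for the h2o models (assumption).
models <- list(
  lm(mpg ~ wt + hp, data = mtcars),
  lm(mpg ~ wt + hp + disp, data = mtcars)
)

pdps <- pdp_multi(models, mtcars, vars = c("wt", "hp"))
str(pdps[[1]])
```

Each element of `pdps` corresponds to one model, and the `variable` column identifies which rows belong to which variable, mirroring the "combined into a single table" layout described above.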

Value

a list of data.tables containing the calculated PDPs, one per model. Each data.table combines the outputs for every variable in vars.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 train <- as.h2o(property_prices)
 
 # set the response and predictors
 response <- "sale_price"
 var <- setdiff(colnames(property_prices), response)
 
 # build grids
 grid_1 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 1:2),
                    nfolds          = 3,
                    seed            = 628)

 grid_2 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 3:4),
                    nfolds          = 3,
                    seed            = 628)
 
 model_1 <- h2o.getModel(grid_1@model_ids[[1]])
 model_2 <- h2o.getModel(grid_2@model_ids[[1]])
 
 # calculate pdp
 nano_multi_pdp(models = list(model_1, model_2), 
                data   = list(property_prices), 
                vars   = c("lot_size", "income"))
 
 }

## End(Not run)

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.