nano_residuals: Plot actual vs predicted from fitted model

nano_residuals R Documentation

Plot actual vs predicted from fitted model

Description

Predicts on data using a fitted model and plots the actual vs predicted values.

Usage

nano_residuals(
  nano,
  data = NA,
  model_no = NA,
  train_test = "data_id",
  group = NA,
  size = NA,
  save = TRUE
)

Arguments

nano

a nano object containing the fitted models.

data

a list of datasets. If the underlying dataset is the same for each model, a list with a single element can be supplied.

model_no

the positions, in the list of models in the nano object, of the models for which the actual vs predicted plot should be produced. If not entered, the last model is taken by default.

train_test

a character. The name of the variable in data which identifies the training, testing and holdout splits (optional). This variable can only take the values "train", "test" and "holdout".

group

a character variable in data which the plot is to be grouped by.

size

a character variable in data which determines the size of each point in the plot. The size parameter is only used if the group parameter has been specified.

save

a logical specifying whether to save the output to the nano object (save = TRUE) or to return it as a separate object (save = FALSE).

Details

The function checks whether the data contains the train_test column. If it does, the actual vs predicted is calculated separately for each split specified in the train_test column. Otherwise, the actual vs predicted is calculated on the whole dataset. To produce the plot on a subset of the data (e.g. to see how the model performs on a specific part of the data), use the data argument to supply the data subsetted in the desired manner. If the data argument is not used, the function defaults to the data used to train the model.
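
The per-split behaviour described above can be illustrated with a minimal base R sketch. It is not the nano implementation: the lm model, the mtcars data, the helper actual_vs_predicted and the object names are all assumptions made for illustration. Only the column name "data_id" (the default of train_test) and the values "train", "test" and "holdout" come from this page.

```r
# Illustrative only: a base R stand-in for the per-split logic,
# not the code used inside nano_residuals().

# Hypothetical data: mtcars with an added split column named after the
# default value of train_test ("data_id").
dat <- mtcars
dat$data_id <- rep(c("train", "test", "holdout"), times = c(20, 6, 6))

# Fit a simple model on the training rows only (lm is an assumption here).
fit <- lm(mpg ~ wt + hp, data = dat[dat$data_id == "train", ])

# Hypothetical helper: actual and predicted response for a block of rows.
actual_vs_predicted <- function(d) {
  data.frame(actual    = d$mpg,
             predicted = unname(predict(fit, newdata = d)))
}

train_test <- "data_id"
if (train_test %in% names(dat)) {
  # split column present: one table per split
  res <- lapply(split(dat, dat[[train_test]]), actual_vs_predicted)
} else {
  # no split column: a single table on the whole dataset
  res <- list(all = actual_vs_predicted(dat))
}

# One actual vs predicted panel per split, with a 45-degree reference line.
op <- par(mfrow = c(1, length(res)))
for (nm in names(res)) {
  plot(res[[nm]]$actual, res[[nm]]$predicted,
       main = nm, xlab = "actual", ylab = "predicted")
  abline(0, 1, lty = 2)
}
par(op)
```

Dropping the data_id column from dat before running the sketch exercises the other branch and gives a single table on the whole dataset.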

Value

If save = TRUE, returns the nano object with the actual vs predicted added for the specified models. If save = FALSE, returns a list with the actual vs predicted for the specified models.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 train <- as.h2o(property_prices)
 
 # set the response and predictors
 response <- "sale_price"
 var <- setdiff(colnames(property_prices), response)
 
 # build grids
 grid_1 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 1:2),
                    nfolds          = 3,
                    seed            = 628)

 grid_2 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 3:4),
                    nfolds          = 3,
                    seed            = 628)

 
 # since model is not entered, will take best model from grids
 obj <- create_nano(grid = list(grid_1, grid_2),
                    data = list(property_prices)) # since underlying dataset is the same
 
 # plot actual vs predicted for both models
 obj <- nano_residuals(nano = obj, model_no = 1:2, save = TRUE)
 }

## End(Not run)

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.