
nano_scoring    R Documentation

Score data from fitted model and compare with response by percentiles

Description

Predicts on data using a fitted model and compares the mean prediction with the mean response by the supplied percentiles.

Usage

nano_scoring(
  nano,
  data = NA,
  model_no = NA,
  percentiles,
  train_test = "data_id",
  save = TRUE
)

Arguments

nano

a nano object containing the fitted models.

data

a list of datasets. If the underlying dataset is the same for each model, a list with a single element can be supplied.

model_no

the positions, in the list of models in the nano object, of the models to be scored. If not entered, the last model is taken by default.

percentiles

a numeric vector of percentiles by which the mean prediction is compared with the mean response, e.g. seq(0, 1, 0.02).

train_test

a character. Name of the variable in data which contains the split into training, testing and holdout datasets (optional). The variable can only take the values "train", "test" and "holdout".

save

a logical specifying whether to save the output to the nano object (save = TRUE) or to return it as a separate object (save = FALSE).
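
As an illustration of the kind of split column the train_test argument refers to, the sketch below adds a column named data_id (the default value of train_test) to a data frame. The data frame df, its columns x and y, and the 70/20/10 proportions are all hypothetical and chosen only for this example.

```r
set.seed(628)

# Hypothetical data frame standing in for the modelling data.
df <- data.frame(x = rnorm(200), y = rnorm(200))

# Assign each row to one of the three permitted split values.
# The 70/20/10 proportions are an assumption made for illustration.
df$data_id <- sample(c("train", "test", "holdout"),
                     size    = nrow(df),
                     replace = TRUE,
                     prob    = c(0.7, 0.2, 0.1))

# Check that the column only holds the permitted values.
stopifnot(all(df$data_id %in% c("train", "test", "holdout")))

# Number of rows in each split.
table(df$data_id)
```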

Details

The function checks whether the data contains the train_test column. If it does, scoring is done separately for each split specified in the train_test column. Otherwise, scoring is done on the total data. To perform scoring on a subset of the data (e.g. to see the performance of the model on a specific part of the data), use the data argument to supply the data subsetted in the desired manner. If the data argument is not used, the function defaults to the data used to train the model.
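
The logic described above can be sketched in base R. This is only an illustration of the idea and not the package's implementation: the helper score_by_percentile, the simulated data and the choice to form buckets from percentiles of the prediction are all assumptions made for this example.

```r
# Illustrative sketch only: not the nano package's implementation.
set.seed(628)

# Simulated data: a prediction, a noisy response and a split column.
n <- 1000
dat <- data.frame(
  pred    = runif(n, 100, 500),
  data_id = sample(c("train", "test", "holdout"), n, replace = TRUE)
)
dat$response <- dat$pred + rnorm(n, sd = 20)

# Hypothetical helper: bucket rows by percentiles of the prediction and
# compare the mean prediction with the mean response in each bucket.
# Assumes percentiles span 0 to 1 so that every row falls in a bucket.
score_by_percentile <- function(pred, response, percentiles) {
  breaks <- unique(quantile(pred, probs = percentiles, names = FALSE))
  bucket <- cut(pred, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bucket        = levels(bucket),
    n             = as.vector(table(bucket)),
    mean_pred     = as.vector(tapply(pred, bucket, mean)),
    mean_response = as.vector(tapply(response, bucket, mean))
  )
}

percentiles <- seq(0, 1, 0.1)

# Score each split separately when the split column is present,
# otherwise score the data as a whole.
if ("data_id" %in% names(dat)) {
  scores <- lapply(split(dat, dat$data_id), function(d) {
    score_by_percentile(d$pred, d$response, percentiles)
  })
} else {
  scores <- list(all = score_by_percentile(dat$pred, dat$response, percentiles))
}

# Mean prediction against mean response for the test split.
scores$test
```

In a table like this, buckets where mean_pred and mean_response diverge point to ranges of the prediction where the model is poorly calibrated.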

Value

If save = TRUE, returns the nano object with the specified models scored. If save = FALSE, returns a list with the specified models scored.

Examples

## Not run: 
if(interactive()){
 library(h2o)
 library(nano)
 
 h2o.init()
 
 # import dataset
 data(property_prices)
 train <- as.h2o(property_prices)
 
 # set the response and predictors
 response <- "sale_price"
 var <- setdiff(colnames(property_prices), response)
 
 # build grids
 grid_1 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 1:2),
                    nfolds          = 3,
                    seed            = 628)

 grid_2 <- h2o.grid(x               = var,
                    y               = response,
                    training_frame  = train,
                    algorithm       = "randomForest",
                    hyper_params    = list(ntrees = 3:4),
                    nfolds          = 3,
                    seed            = 628)

 
 # the underlying dataset is the same for both grids, so a single-element list is enough;
 # since model is not entered, the best model is taken from each grid
 obj <- create_nano(grid = list(grid_1, grid_2),
                    data = list(property_prices))
 
 # score on both models
 obj <- nano_scoring(nano = obj, model_no = 1:2, percentiles = seq(0, 1, 0.02), save = TRUE)
 }

## End(Not run)

Nanoputian628/nano documentation built on Oct. 30, 2023, 3:28 p.m.