gen_test: Construct ML models for use as an ML test

View source: R/gen_test.R

Description

Returns a list of model objects, along with their accuracies and ROC coordinates computed on the test set.

Usage

gen_test(
  train_set,
  val_set,
  use_case = "cv",
  pvalues = c(1e-04, 0.001, 0.01, 0.05, 0.1),
  model_params = list(list(method = "ranger", cv_folds = 5, tune_grid =
    expand.grid(mtry = c(3, 9, 27), splitrule = "gini", min.node.size = c(2, 4, 8))),
    list(method = "xgbTree", cv_folds = 5, tune_grid = expand.grid(nrounds = 300, eta =
    c(0.01, 0.03, 0.1, 0.3, 0.5), gamma = 0, colsample_bytree = c(0.8, 1),
    min_child_weight = 1, subsample = c(0.8, 1), max_depth = c(4, 6))))
)

Arguments

train_set

A training set produced by the gen_features function.

val_set

A validation set produced by the gen_features function, used to calibrate the decision threshold.

use_case

A string indicating if cross validation should be applied ('cv') or full sample should be used to train ('full'). If 'cv' is specified, then cv_folds should be specified in the model_params argument.

pvalues

A vector of alpha levels (p-values) used to identify the optimal decision threshold. (Default = c(0.0001, 0.001, 0.01, 0.05, 0.1))

model_params

A list of model specifications, one inner list per model; multiple methods are allowed. Each specification requires a 'method' tag naming the algorithm, 'cv_folds' giving the number of folds for cross validation, and 'tune_grid', a data frame of hyperparameters to search over.
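
As an illustration of the structure described above, the following is a minimal sketch that builds a two-method model_params list using base R only. The grid values are copied from the defaults in the Usage section; the names rf_grid, xgb_grid and grid_sizes are hypothetical helpers introduced for this sketch and are not part of the package. Nothing is trained here; the code only assembles the list and counts the hyperparameter combinations each method would search.

```r
# Sketch only: uses base R; no hypML function is called here.

# Random forest grid (values taken from the Usage defaults)
rf_grid <- expand.grid(mtry = c(3, 9, 27),
                       splitrule = "gini",
                       min.node.size = c(2, 4, 8))

# Gradient boosting grid (values taken from the Usage defaults)
xgb_grid <- expand.grid(nrounds = 300,
                        eta = c(0.01, 0.03, 0.1, 0.3, 0.5),
                        gamma = 0,
                        colsample_bytree = c(0.8, 1),
                        min_child_weight = 1,
                        subsample = c(0.8, 1),
                        max_depth = c(4, 6))

# One inner list per model, each with method, cv_folds and tune_grid
model_params <- list(
  list(method = "ranger",  cv_folds = 5, tune_grid = rf_grid),
  list(method = "xgbTree", cv_folds = 5, tune_grid = xgb_grid)
)

# Number of candidate hyperparameter combinations per method
grid_sizes <- vapply(model_params, function(p) nrow(p$tune_grid), integer(1))
names(grid_sizes) <- vapply(model_params, function(p) p$method, character(1))
print(grid_sizes)
```

Because expand.grid forms every combination of the supplied values, grid size grows multiplicatively with each added value, which is worth checking before running cross validation over a large grid.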

Value

A list object

Author(s)

Gary Cornwall and Jeffrey Chen

Examples

## Not run: 
# Set splits
N <- 3000
train_index <- 1:2000
val_index <- 2001:N

# Set DGP parameters for a single scenario
dgp_params <- list(list(dgp = "dgp_enders3", sd = 1, gamma = 1),
                   list(dgp = "dgp_enders2", sd = 1, alpha0 = 1, gamma = 1),
                   list(dgp = "dgp_enders1", sd = 1, alpha0 = 1, alpha2 = 0.005, gamma = 1))

# Simulate the time series used for the training and validation sets
ts_data <- gen_bank(iter = N,
                    sample_prob = 0.50,
                    t = c(5, 50),
                    freq = 12,
                    nur_ur = c(0.90000, 0.99999),
                    run_par = TRUE,
                    dgp_params = dgp_params)

# Construct the feature set for each split
train_feat <- gen_features(ts_data[train_index])
val_feat <- gen_features(ts_data[val_index])

# Set algorithm parameters -- five-fold cross validation
model_params <- list(list(method = "ranger",
                          cv_folds = 5,
                          tune_grid = expand.grid(mtry = c(3, 9, 27),
                                                  splitrule = "gini",
                                                  min.node.size = c(2, 4, 16))))

# Train model
custom_set <- gen_test(train_set = train_feat,
                       val_set = val_feat,
                       model_params = model_params)

## End(Not run)

DataScienceForPublicPolicy/hypML documentation built on Dec. 17, 2021, 4:06 p.m.