JuiceBoxCV_GridSearch: JuiceBoxCV_GridSearch

Description Usage Arguments Value Examples

View source: R/hello.R

Description

Use grid search in conjunction with cross validation to find optimal parameters for your pipline. For each parameter, a cross validation procedure utilizing numFolds folds and numRepeats repetitions will be executed. The parameter yielding largest CV score will be returned (i.e. maximization). In order to search for parameters that minimize the objective, simply negate the value return in the pipeline function.

Usage

1
2
JuiceBoxCV_GridSearch(X_train, Y_train, numFolds, numRepeats, goParallel,
  numCores, seedNum, verbose_p, fitGrid, fn)

Arguments

X_train

Training Data (excludes the response/target we wish to predict ) that will be fed into the pipeline function.

Y_train

Training Response/Target - The response/target that will be fed into the pipeline function.

numFolds

Integer indicating the number of folds to use in the cross validation procedure.

numRepeats

Integer indicating the number of times to repeat cross validation with numFolds.

goParallel

String indicating how to parallelize the procedure. If "parGS", then the parallelization will happen across on the grid search across the parameters. If "parCV", then the parallelization will happen across the numFolds and numRepeats cross validation procedure.

numCores

Integer indicating the number of cores to use.

seedNum

Integer indicating the seed number. Using the same seed will generate the same folds.

verbose_p

Boolean indicating if grid search details should be printed out the screen.

fitGrid

Dataframe dictating the combinations to try. See examples for details.

fn

The pipeline function. The pipeline function must take parameters training data, training response, validation data, validation response. See examples for details.

Value

Optimal parameters found from grid search

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
library(JuiceBox)
library(Metrics)
library(xgboost)

# Toy data set for classification
irisAllMat <- iris
irisTrainMat <- irisAllMat[,c(1:4)]
irisTrainResponse <- irisAllMat[,c(ncol(irisAllMat))]
irisTrainResponse <- factor(ifelse(irisTrainResponse == "setosa", "Yes", "No"))

 # Toy data set for regression
mtcarsMat <- mtcars
mtcarsResponse <-mtcarsMat[,1]
mtcarsMat <- mtcarsMat[,c(2:ncol(mtcarsMat))]

# Pipelines
xgbPipeline_regression <- function(X_train, Y_train, X_test, Y_test, params)
{
  Y_train <- as.numeric(Y_train) - 1
  xgbFit <- xgboost(data = as.matrix(X_train), label = Y_train, max.depth = params[1],
                    eta = params[2], nround = params[3], objective = "reg:linear", verbose = 1)
  predictions <- predict(xgbFit, newdata = as.matrix(X_test))
  actual <- as.numeric(Y_test) - 1
  rmseValue <- rmse(predictions, actual)
  return(-rmseValue)
}

xgbPipeline_classification <- function(X_train, Y_train, X_test, Y_test, params)
{
  Y_train <- as.numeric(Y_train) - 1
  xgbFit <- xgboost(data = as.matrix(X_train), label = Y_train, max.depth = params[1],
                    eta = params[2], nround = params[3], objective = "binary:logistic", verbose = 1)
  predictions <- predict(xgbFit, newdata = as.matrix(X_test))
  actual <- as.numeric(Y_test) - 1
  logLossValue <- logLoss(actual, predictions)
  return(-logLossValue)
}

# Testing JuiceBoxCV_GridSearch
JuiceBoxCV_GridSearch(X_train = irisTrainMat, Y_train = irisTrainResponse, numFolds = 2,
                      numRepeats = 2, goParallel = "none", numCores = 8,
                      seedNum = 101, verbose_p = FALSE,
                      fitGrid = expand.grid(c(1:4), c(1:3), c(1:5)),
                      fn = xgbPipeline_classification)

JuiceBoxCV_GridSearch(X_train = mtcarsMat, Y_train = mtcarsResponse, numFolds = 2,
                      numRepeats = 2, goParallel = "none", numCores = 8,
                      seedNum = 101, verbose_p = FALSE,
                      fitGrid = expand.grid(c(1:4), c(1:3), c(1:5)),
                      fn = xgbPipeline_regression)

sjoshistrats/JuiceBox documentation built on May 30, 2019, 12:05 a.m.