grid_search: Performs a grid search for a dataset

View source: R/grid_search.R

Description

grid_search does a grid search for a dataset with a specified model.

Usage

grid_search(
  x,
  y,
  grid,
  cross = 10,
  data_name,
  method,
  start = NULL,
  end = NULL,
  path = "."
)

Arguments

x

Matrix or data frame containing the predictor (independent) variables.

y

Vector of responses. Can either be a factor or a numeric vector.

grid

A data frame that contains the grid that will be filled in by the function. The grid can be one generated by the function create_grid.R or one created by the user. The Details section explains how to make a grid data frame.

cross

Number of folds for n-fold cross-validation.

data_name

Name of the dataset. Used to name the output file and as an identifier within the output dataset.

method

Model to be fit. Choices are "ada" for adaboost, "en" for elastic net, "gbm" for gradient boosting machines, and "svm" for support vector machines.

start

Start location of the grid to be processed in the run. The size of the grids created using create_grid.R are:

end

Stop location of the grid to be processed in the run.

path

Path to the directory that contains the grid result output files.
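The start and end arguments allow a large grid to be processed in several runs. The sketch below shows one way to compute the ranges. It assumes, without confirmation from the source, that start and end are inclusive row positions in the grid; make_chunks is a hypothetical helper and not part of the package, and the commented call uses placeholder data and a placeholder data_name.

```r
# Hypothetical helper (not part of the package): split rows 1..n_rows
# into consecutive chunks of at most chunk_size rows.
make_chunks <- function(n_rows, chunk_size) {
  starts <- seq(1, n_rows, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, n_rows)
  data.frame(start = starts, end = ends)
}

# A grid with 25 rows processed 10 rows at a time
chunks <- make_chunks(n_rows = 25, chunk_size = 10)
print(chunks)

# Each chunk could then be passed to grid_search, for example
# (x, y, and grid are placeholders; assumes start/end are row indices):
# for (i in seq_len(nrow(chunks))) {
#   grid_search(x, y, grid, data_name = "mydata", method = "svm",
#               start = chunks$start[i], end = chunks$end[i], path = ".")
# }
```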

Details

The function create_grid.R can be used to create the grid, or you can supply your own grid for the function to fill in. Each grid must be a data.frame whose tuning-parameter columns are filled in with the values to be tested. The columns for Accuracy, AUC, MSE, MAE, and Time can hold any value because those values will be replaced, but filling them with NA is recommended so that grid rows for which a model could not be computed are easy to identify. The grids must meet the following specifications:

SVM: The column names for the data.frame for the classification model must be Cost, Gamma, Accuracy, AUC, and Time. The column names for the regression model are Cost, Gamma, Epsilon, MSE, MAE, and Time. The columns for Cost, Gamma, and Epsilon must be numeric vectors. The values in the Cost and Gamma columns are exponents: a grid value of x corresponds to an actual cost or gamma of 2^x.
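As an illustration, a user-built SVM classification grid could be assembled with base R as follows. This is a minimal sketch; the exponent ranges are arbitrary values chosen for the example, not defaults from create_grid.R.

```r
# Tuning columns hold exponents: a Cost of -2 means an actual cost of 2^-2.
# The ranges below are illustrative only.
svm_grid <- expand.grid(
  Cost = seq(-2, 2, by = 1),
  Gamma = seq(-3, 0, by = 1)
)

# Result columns are filled with NA so that rows that could not be
# computed are easy to spot afterwards.
svm_grid$Accuracy <- NA_real_
svm_grid$AUC <- NA_real_
svm_grid$Time <- NA_real_

# Actual values implied by the exponents
actual_cost <- 2^svm_grid$Cost
actual_gamma <- 2^svm_grid$Gamma

head(svm_grid)
```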

GBM: The column names for the data.frame for the classification model must be NumTrees, MinNode, Shrinkage, IntDepth, Accuracy, AUC, Time. The column names for the regression models are NumTrees, MinNode, Shrinkage, IntDepth, MSE, MAE, Time.

Elastic Net: The column names for the data.frame for the classification model must be Alpha, logLambda, Accuracy, AUC, Time. The column names for the regression models are Alpha, logLambda, MSE, MAE, Time. Note that the value in the logLambda variable is the natural log of lambda.
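For example, an elastic net regression grid might be built as in the sketch below. The Alpha and logLambda ranges are arbitrary choices for illustration, and the last line only shows how the stored log values relate to lambda itself.

```r
# Illustrative ranges only; logLambda stores the natural log of lambda.
en_grid <- expand.grid(
  Alpha = seq(0, 1, by = 0.25),
  logLambda = seq(-6, 2, by = 2)
)

# Regression result columns, filled with NA until the search replaces them
en_grid$MSE <- NA_real_
en_grid$MAE <- NA_real_
en_grid$Time <- NA_real_

# Lambda values implied by the grid
lambda <- exp(en_grid$logLambda)
```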

Adaboost: The adaboost model can only be computed for classification models. The column names for the grid are Nu, Iter, Maxdepth, Accuracy, AUC, and Time.
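The column requirements above can be checked before starting a long run. The following is a sketch of such a check; check_grid_columns and required_columns are hypothetical names that are not part of the package, and the check assumes column order does not matter, which the source does not state.

```r
# Required column names per method and model type, as listed in Details
required_columns <- list(
  svm = list(
    classification = c("Cost", "Gamma", "Accuracy", "AUC", "Time"),
    regression = c("Cost", "Gamma", "Epsilon", "MSE", "MAE", "Time")
  ),
  gbm = list(
    classification = c("NumTrees", "MinNode", "Shrinkage", "IntDepth",
                       "Accuracy", "AUC", "Time"),
    regression = c("NumTrees", "MinNode", "Shrinkage", "IntDepth",
                   "MSE", "MAE", "Time")
  ),
  en = list(
    classification = c("Alpha", "logLambda", "Accuracy", "AUC", "Time"),
    regression = c("Alpha", "logLambda", "MSE", "MAE", "Time")
  ),
  ada = list(
    classification = c("Nu", "Iter", "Maxdepth", "Accuracy", "AUC", "Time")
  )
)

# Hypothetical helper: TRUE if grid is a data.frame with exactly the
# expected column names (in any order, by assumption).
check_grid_columns <- function(grid, method,
                               type = c("classification", "regression")) {
  type <- match.arg(type)
  expected <- required_columns[[method]][[type]]
  if (is.null(expected)) {
    stop("No grid specification for method '", method, "' and type '", type, "'")
  }
  is.data.frame(grid) && setequal(names(grid), expected)
}

# Example: an adaboost grid with illustrative tuning values
ada_grid <- expand.grid(Nu = c(0.01, 0.1), Iter = c(50, 100), Maxdepth = c(1, 3))
ada_grid$Accuracy <- NA_real_
ada_grid$AUC <- NA_real_
ada_grid$Time <- NA_real_

check_grid_columns(ada_grid, "ada")
```

Requesting a regression specification for "ada" stops with an error, which mirrors the restriction that adaboost is available for classification only.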

Value

Returns a dataset that has all of the outputs from the grid search. Typically, a folder will contain the tests for one dataset. The dataset contains the following variables:

See Also

binned_stats, average_metric


jillbo1000/EZtuneTest documentation built on Oct. 5, 2021, 4:16 p.m.