View source: R/get_grid_data.R
get_grid_data creates a data.frame that has the datasets in the first column and the best error rate obtained in the grid search in the second column.
get_grid_data(
  path = ".",
  pattern = NULL,
  dataset = "Data",
  method = NULL,
  model_type = NULL
)
path
    File path to folder with the files that hold the grid results.
pattern
    An optional character vector that can be used to select a subset of files in the folder.
dataset
    Name of the dataset for which the grid search was done.
method
    A character string that specifies the method used to create the grid. Choices are "svm", "gbm", "en", and "ada". This is added to the datasets to minimize ambiguity in downstream analysis.
model_type
    Character string of either "binary" or "regression" that specifies the type of model. This is needed because some of the earlier grid searches had inconsistent loss scales.
Returns a list of data.frames that can be used to plot the grid surface or to find the best error rates. Some of the data.frames in the list may contain more observations than their stated percentage or count implies, because the grid results contain a substantial number of ties. The data.frames in the list are:
data
    Complete dataset of grid results.
dat20loss
    Dataset containing only the best 20% in terms of loss (classification error, AUC, MSE, or MAE).
dat10loss
    Dataset containing only the best 10% in terms of loss.
dat5loss
    Dataset containing only the best 5% in terms of loss.
dat1loss
    Dataset containing only the best 1% in terms of loss.
dat20time
    Dataset containing only the best 20% in terms of computation time.
dat10time
    Dataset containing only the best 10% in terms of computation time.
dat5time
    Dataset containing only the best 5% in terms of computation time.
dat1time
    Dataset containing only the best 1% in terms of computation time.
top20loss
    Twenty grid locations with the best loss.
top20time
    Twenty grid locations with the best computation times.
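The note about ties can be illustrated with a small sketch. The data frame `grid` and the cutoff rule below are invented for illustration; the documentation does not say how get_grid_data selects its subsets, so this shows only one plausible rule, not the package's implementation.

```r
# Toy grid results: 10 grid locations, four of which share the lowest loss.
# Column names mirror the documented variables; all values are made up.
grid <- data.frame(
  Cost = 1:10,
  Loss = c(0.10, 0.10, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40),
  Time = c(5, 3, 8, 2, 9, 1, 7, 4, 6, 10)
)

# Assumed rule: find the loss at the 20% position of the sorted losses,
# then keep every row at or below that value.
n_target <- ceiling(0.20 * nrow(grid))
cutoff   <- sort(grid$Loss)[n_target]
best20   <- grid[grid$Loss <= cutoff, ]

n_target       # rows a strict 20% would imply
nrow(best20)   # rows actually kept once ties are included
```

Under this rule every row tied at the cutoff is retained, so the subset is larger than 20% of the grid.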
Each of the datasets has the following variables:
Data
    Name of the dataset used to create the grid.
Method
    Method used for EZtune. Should be "svm", "gbm", "en", or "ada". It appears exactly as it was entered in the method argument.
Tuning_Variables
    These fields contain the tuning variables for the method. For svm they are Cost and Gamma (the Gamma column holds log2(Gamma)), plus Epsilon for regression models; for gbm they are NumTrees, MinNode, Shrinkage, and IntDepth; for en they are Alpha and logLambda; for ada they are Nu, Iter, and MaxDepth.
Loss
    The loss measure used in the grid search. It is typically classification error or MSE, but it can also be AUC or MAE. It is computed using 10-fold cross validation.
LossUCL
    A measure of stability for the Loss measure. The test losses from the folds of the 10-fold cross validation were used to compute a 95% upper confidence limit for the loss. If it differs substantially from Loss, the results for the model with those tuning parameters are unstable.
Time
    Computation time in seconds.
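How LossUCL relates to the fold losses can be sketched as follows. The documentation states only that the fold losses were used to compute a 95% upper confidence limit; the one-sided t-based formula and the fold values below are assumptions made for illustration and may differ from what the package does.

```r
# Invented test losses from the 10 folds for a single grid location.
fold_loss <- c(0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.14, 0.13)

n    <- length(fold_loss)
loss <- mean(fold_loss)

# Assumed formula: one-sided 95% upper limit from the t distribution,
# mean + t(0.95, n - 1) * standard error of the mean.
loss_ucl <- loss + qt(0.95, df = n - 1) * sd(fold_loss) / sqrt(n)

c(Loss = loss, LossUCL = loss_ucl)
```

A wide gap between the two values would come from fold losses that vary a lot, which is the instability the LossUCL column is meant to flag.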
grid_search, eztune_table, get_best_grid, grid_plot