tune: Tune parameters of a model-building method
In awqx/qsarr: Useful Functions for Building a QSAR in R


`tune()` evaluates model performance across combinations of candidate parameter values. The available methods are the same as for `eval_model()`.
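As a rough illustration of what "combinations of candidate parameter values" means, the base-R sketch below enumerates a full grid with `expand.grid()`. It assumes that `tune()` tests every combination of the vectors passed through `...` (a full factorial grid); this page does not confirm that. The parameter names are borrowed from the "rf" section below and the values are arbitrary.

```r
# Hypothetical candidate values, as they might be passed through `...`
candidates <- list(
  mtry     = c(2, 4, 8),
  nodesize = c(1, 5)
)

# Assumption: every combination is evaluated (full factorial grid).
# expand.grid() varies the first parameter fastest.
grid <- expand.grid(candidates, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
print(grid)
nrow(grid)  # 3 * 2 = 6 combinations
```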

Passing `method = "mars"` or `method = "earth"` tunes a MARS model using `earth::earth()`.

Passing `method = "glm"` or `method = "glmnet"` tunes a GLM using `glmnet::glmnet()`.

Passing `method = "rf"` tunes a random forest using `randomForest::randomForest()`.

For all SVM methods, the model is tuned with `e1071::svm()`, and the SVM type is assumed to be "eps-regression". The response variable passed to the function must therefore be numeric. The full list of tunable parameters is in the documentation for `e1071::svm()` (see `?e1071::svm`). The methods "svm_linear", "svm_polynomial", "svm_radial", and "svm_sigmoid" are kept separate because each SVM kernel accepts a different set of parameters to tune.
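For context, "eps-regression" is support vector regression that ignores residuals smaller than `epsilon` (the `e1071::svm()` default is 0.1). The small base-R function below computes that epsilon-insensitive loss so the role of `epsilon` is concrete; the function name is hypothetical and it is not part of qsarr or e1071.

```r
# Epsilon-insensitive loss used by eps-regression:
# residuals inside the +/- epsilon tube cost nothing,
# larger residuals cost their distance beyond the tube edge.
eps_insensitive_loss <- function(y, y_hat, epsilon = 0.1) {
  pmax(0, abs(y - y_hat) - epsilon)
}

y     <- c(1.00, 2.00, 3.00)
y_hat <- c(1.05, 2.50, 2.00)
eps_insensitive_loss(y, y_hat, epsilon = 0.1)  # 0.0 0.4 0.9
```

Raising `epsilon` widens the tube, so fewer points contribute to the loss and the fitted model typically has fewer support vectors.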

tune(method, ...)

# To call MARS (methods are identical)
tune(method = "earth", df, resp, nfold = 10, nrep = 1, ...)
tune(method = "mars", df, resp, nfold = 10, nrep = 1, ...)

# To call GLM (methods are identical)
tune(method = "glm", df, resp, nfold = 10, nrep = 1, ...)
tune(method = "glmnet", df, resp, nfold = 10, nrep = 1, ...)

tune(method = "rf", df, resp, nfold = 10, nrep = 1, ...)

tune(method = "svm_linear", df, resp, nfold = 10, nrep = 1, ...)

tune(method = "svm_polynomial", df, resp, nfold = 10, nrep = 1, ...)

tune(method = "svm_radial", df, resp, nfold = 10, nrep = 1, ...)

tune(method = "svm_sigmoid", df, resp, nfold = 10, nrep = 1, ...)

`method`	The model-building method: one of `"earth"`, `"mars"`, `"glm"`, `"glmnet"`, `"rf"`, `"svm_linear"`, `"svm_polynomial"`, `"svm_radial"`, or `"svm_sigmoid"`.
`...`	Additional arguments passed to the model-building function. Typically these are vectors of candidate values for the parameters to test.
`df`	The data frame to train on.
`resp`	The name of the column containing the response variable.
`nfold`	The number of folds to use in evaluation. Default is `10`.
`nrep`	The number of repetitions to use in evaluation. Default is `1`.
`ignore_col`	Columns to ignore during model-building. Default is `NA`.
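`nfold` and `nrep` describe repeated k-fold cross-validation. The base-R sketch below shows one common way to assign folds; the helper `make_folds` is hypothetical, and qsarr's actual splitting may differ (for example in how it seeds or balances folds).

```r
# Hypothetical helper: assign each of n rows to one of `nfold` folds,
# drawing a fresh random assignment for each of `nrep` repetitions.
make_folds <- function(n, nfold = 10, nrep = 1) {
  lapply(seq_len(nrep), function(r) {
    # Balanced fold labels 1..nfold, shuffled across the n rows
    sample(rep_len(seq_len(nfold), n))
  })
}

set.seed(1)
folds <- make_folds(n = 25, nfold = 5, nrep = 2)

# Within one repetition every row is held out exactly once;
# over all repetitions the model is fit nfold * nrep = 10 times.
table(folds[[1]])
```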

Calling `print()` on a "tune" object reports the model type and the model performance.

Calling `predict()` on a "tune" object runs prediction with the `predict()` method for the class of the model stored in the object.
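This `print()`/`predict()` behaviour is ordinary S3 dispatch. The sketch below imitates it with a hypothetical class `tune_sketch` (deliberately not named "tune", so it cannot mask qsarr's own methods) wrapping an `lm()` fit. The component names mirror the Value section; the method bodies are assumptions, not qsarr's code.

```r
# Hypothetical constructor: wrap a fitted model in an S3 object
new_tune_sketch <- function(model, nfold, nrep) {
  structure(
    list(model = model, nfold_tested = nfold, nrep_tested = nrep),
    class = "tune_sketch"
  )
}

# print(): report the stored model's class and the tuning settings
print.tune_sketch <- function(x, ...) {
  cat("Tuned model of class:", class(x$model)[1], "\n")
  cat("Folds:", x$nfold_tested, " Repetitions:", x$nrep_tested, "\n")
  invisible(x)
}

# predict(): hand off to predict() for the stored model's own class
predict.tune_sketch <- function(object, newdata, ...) {
  predict(object$model, newdata = newdata, ...)
}

fit <- lm(mpg ~ wt, data = mtcars)
obj <- new_tune_sketch(fit, nfold = 10, nrep = 1)
print(obj)
preds <- predict(obj, newdata = mtcars[1:3, ])
```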

There are many parameters that can be tuned for "earth"; the most useful are likely fast.k, fast.beta, newvar.penalty, penalty, minspan, and degree. If time allows, earth can also perform more thorough variable selection using different pruning methods and cross-validation.
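Because each extra parameter multiplies the number of fits, a quick calculation helps decide how many of these to tune at once. The base-R sketch below counts the combinations in the "earth" example on this page and, assuming every combination is cross-validated `nfold * nrep` times (with `nfold = 10` and `nrep = 10` as in that example), the total number of model fits.

```r
# Candidate values copied from the "earth" example on this page
earth_grid <- list(
  fast.k         = c(0, 5, 10, 20),
  fast.beta      = c(0, 1),
  newvar.penalty = c(0, 0.01, 0.1, 0.2, 0.25),
  penalty        = c(2, 3, 4),
  minspan        = c(0, 1, 4, 10),
  degree         = c(1, 2, 3)
)

n_combos <- prod(lengths(earth_grid))  # 4 * 2 * 5 * 3 * 4 * 3 = 1440
n_fits   <- n_combos * 10 * 10         # assumption: nfold = 10, nrep = 10 per combination
c(combinations = n_combos, fits = n_fits)
```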

There are many parameters that can be tuned for the GLM models; the most useful are likely alpha, nlambda, dfmax, pmax, and family.

Setting alpha = 1 uses the lasso penalty, and alpha = 0 uses the ridge penalty; values in between give an elastic-net mix of the two.
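In glmnet the penalty on coefficients beta at strength lambda is lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))). The base-R function below simply evaluates that expression so the two extremes can be checked by hand; `enet_penalty` is a hypothetical illustration, not a glmnet function.

```r
# Elastic-net penalty as parameterised by glmnet:
# lambda * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(0.5, -2, 1)
enet_penalty(beta, lambda = 1, alpha = 1)    # lasso: sum(abs(beta)) = 3.5
enet_penalty(beta, lambda = 1, alpha = 0)    # ridge: sum(beta^2) / 2 = 2.625
enet_penalty(beta, lambda = 1, alpha = 0.5)  # mix: 1.3125 + 1.75 = 3.0625
```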

Possible parameters to tune "rf" are mtry, replace, sampsize, nodesize, and maxnodes.
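When choosing candidate values, it can help to bracket randomForest's documented defaults for regression: `mtry = max(floor(p / 3), 1)` for p predictors, `nodesize = 5`, and `sampsize = n` when sampling with replacement (otherwise `ceiling(0.632 * n)`). The hypothetical helper below just computes those defaults; the candidate `mtry` grid built around them is an arbitrary example.

```r
# Hypothetical helper: randomForest's documented regression defaults
rf_regression_defaults <- function(n, p, replace = TRUE) {
  list(
    mtry     = max(floor(p / 3), 1),
    nodesize = 5,
    sampsize = if (replace) n else ceiling(0.632 * n)
  )
}

p <- 30
d <- rf_regression_defaults(n = 100, p = p)

# Example candidates at half, one and two times the default mtry (10 here),
# kept within the valid range 1..p
mtry_candidates <- unique(pmin(p, pmax(1, round(d$mtry * c(0.5, 1, 2)))))
mtry_candidates  # 5 10 20
```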

Possible parameters to tune "svm_linear" include cost, tolerance, and epsilon.

Possible parameters to tune "svm_polynomial" include degree, gamma, coef0, cost, tolerance, and epsilon.

Possible parameters to tune "svm_radial" include gamma, cost, tolerance, and epsilon.

Possible parameters to tune "svm_sigmoid" include gamma, coef0, cost, tolerance, and epsilon.
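The parameter lists differ because each kernel has a different formula. The base-R functions below transcribe the kernels as given in `?e1071::svm`: linear u'v, polynomial (gamma u'v + coef0)^degree, radial exp(-gamma |u - v|^2), and sigmoid tanh(gamma u'v + coef0). `cost`, `tolerance`, and `epsilon` are not kernel parameters, which is why they appear for every SVM method. These functions are for illustration only.

```r
# Kernels as documented for e1071::svm(); u and v are numeric vectors
k_linear     <- function(u, v) sum(u * v)
k_polynomial <- function(u, v, gamma, coef0 = 0, degree = 3) (gamma * sum(u * v) + coef0)^degree
k_radial     <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
k_sigmoid    <- function(u, v, gamma, coef0 = 0) tanh(gamma * sum(u * v) + coef0)

u <- c(1, 2)
v <- c(3, 1)
k_linear(u, v)                                        # 1*3 + 2*1 = 5
k_polynomial(u, v, gamma = 1, coef0 = 1, degree = 2)  # (5 + 1)^2 = 36
k_radial(u, u, gamma = 0.5)                           # identical points give 1
```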

An object of the S3 class "tune": a list containing the model fit with the best-performing parameters, together with details of the tuning run:

$model: the final model with the tuned parameters
$param_tested: a list of the parameters used in the tuning process
$nfold_tested: the number of folds in each iteration of tuning
$nrep_tested: the number of repetitions in each iteration of tuning
$pred_name: the predictors from the data set
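Putting the pieces together, the base-R sketch below shows one plausible way a tuner can arrive at "the model with the best performing parameters": score each candidate by repeated k-fold RMSE, keep the lowest, and refit on all the data. It tunes the polynomial degree of an `lm()` fit on `mtcars` purely for illustration; the helper `cv_rmse`, the RMSE criterion, and the refit step are assumptions, not qsarr's documented behaviour.

```r
# Hypothetical helper: repeated k-fold RMSE for one polynomial degree
cv_rmse <- function(df, degree, nfold = 5, nrep = 2) {
  errs <- numeric(0)
  for (r in seq_len(nrep)) {
    fold <- sample(rep_len(seq_len(nfold), nrow(df)))
    for (k in seq_len(nfold)) {
      train <- df[fold != k, ]
      test  <- df[fold == k, ]
      fit   <- lm(mpg ~ poly(wt, degree, raw = TRUE), data = train)
      errs  <- c(errs, test$mpg - predict(fit, newdata = test))
    }
  }
  sqrt(mean(errs^2))
}

set.seed(42)
degrees <- c(1, 2, 3)
scores  <- sapply(degrees, function(d) cv_rmse(mtcars, d))

# Keep the best-scoring degree and refit on the full data set
best_degree <- degrees[which.min(scores)]
final_model <- lm(mpg ~ poly(wt, best_degree, raw = TRUE), data = mtcars)

# Bundle results with names mirroring the components listed above
result <- list(model = final_model, param_tested = list(degree = degrees),
               nfold_tested = 5, nrep_tested = 2)
```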

# Using "mars" or "earth" as the method
tune(
  method = "earth", df = your_data, resp = "y",
  nfold = 10, nrep = 10,
  fast.k = c(0, 5, 10, 20),
  fast.beta = c(0, 1),
  newvar.penalty = c(0, 0.01, 0.1, 0.2, 0.25),
  penalty = c(2, 3, 4),
  minspan = c(0, 1, 4, 10),
  degree = c(1, 2, 3)
)
# Using "glm" or "glmnet" as the method
tune(
  method = "glmnet", df = your_data, resp = "y",
  nfold = 10, nrep = 10,
  alpha = seq(0, 1, by = 0.2),
  nlambda = c(20, 50, 100, 200),
  dfmax = c(10, 50, ncol(your_data) - 1),
  pmax = c(10, 50, 100, ncol(your_data) - 1)
)
# Using tune and "rf" (randomForest) as the method
tune(
  method = "rf", df = your_data, resp = "y",
  nfold = 10, nrep = 10,
  mtry = c(2, 4, 8, 14),
  replace = c(TRUE, FALSE),
  sampsize = c(10, 20, 30)
)
# Using "svm_linear" as the method
tune(
  method = "svm_linear", df = your_data, resp = "y",
  nfold = 10, nrep = 10,
  cost = c(0.1, 0.25, 0.5, 1),
  epsilon = c(0, 0.05, 0.1, 0.5, 1)
)