CCI.pretuner | R Documentation |
The CCI.pretuner function performs a grid search over the parameters of a machine learning model supported by CCI.test, for use in a conditional independence test. The tuner uses the caret package for tuning.
CCI.pretuner(
  formula,
  data,
  method = "rf",
  metric = "RMSE",
  validation_method = "cv",
  folds = 4,
  training_share = 0.7,
  tune_length = 4,
  random_grid = TRUE,
  samples = 35,
  poly = TRUE,
  degree = 3,
  interaction = TRUE,
  verboseIter = FALSE,
  include_explanatory = FALSE,
  verbose = FALSE,
  parallel = FALSE,
  mtry = 1:10,
  nrounds = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000),
  eta = seq(0.01, 0.3, by = 0.05),
  max_depth = 2:6,
  gamma = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample = 1,
  sigma = seq(0.1, 2, by = 0.3),
  C = seq(0.1, 2, by = 0.5),
  ...
)
formula |
Model formula specifying the relationship between dependent and independent variables. |
data |
A data frame containing the variables specified in the formula. |
method |
Character. Specifies the machine learning method to use. Supported methods are random forest ("rf"), extreme gradient boosting ("xgboost"), and support vector machine ("svm"). Default is "rf". |
metric |
Character. The performance metric to optimize during tuning. Default is "RMSE". |
validation_method |
Character. Specifies the resampling method. Default is "cv". |
folds |
Integer. The number of folds for cross-validation during the tuning process. Default is 4. |
training_share |
Numeric. For leave-group-out cross-validation: the share of the data used for training. Default is 0.7. |
tune_length |
Integer. The number of parameter combinations to try during the tuning process. Default is 4. |
random_grid |
Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE. |
samples |
Integer. The number of random samples to take from the grid. Default is 35. |
poly |
Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE. |
degree |
Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3. |
interaction |
Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE. |
verboseIter |
Logical. If TRUE, the function will print the tuning process. Default is FALSE. |
include_explanatory |
Logical. If TRUE, given the condition Y || X | Z, the function will include the explanatory variable X in the model for Y. Default is FALSE. |
verbose |
Logical. If TRUE, the function will print the tuning process. Default is FALSE. |
parallel |
Logical. If TRUE, the function will use parallel processing. Default is FALSE. |
mtry |
Integer. The number of variables randomly sampled as candidates at each split for random forest. Default is 1:10. |
nrounds |
Integer. The number of rounds (trees) for methods such as xgboost and random forest. Default is c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000). |
eta |
Numeric. The learning rate for xgboost. Default is seq(0.01, 0.3, by = 0.05). |
max_depth |
Integer. The maximum depth of the tree for xgboost. Default is 2:6. |
gamma |
Numeric. The minimum loss reduction required to make a further partition on a leaf node for xgboost. Default is c(0, 1, 2, 3). |
colsample_bytree |
Numeric. The subsample ratio of columns when constructing each tree for xgboost. Default is c(0.8, 0.9, 1). |
min_child_weight |
Integer. The minimum sum of instance weight (hessian) needed in a child for xgboost. Default is c(1, 3). |
subsample |
Numeric. The subsample ratio of the training instances for xgboost. Default is 1. |
sigma |
Numeric. The standard deviation of the Gaussian kernel for Gaussian Process Regression. Default is seq(0.1, 2, by = 0.3). |
C |
Numeric. The regularization parameter for Support Vector Machine. Default is seq(0.1, 2, by = 0.5). |
... |
Additional arguments to pass to the |
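
To illustrate why random_grid and samples matter, the following base R sketch builds the full grid implied by the xgboost defaults in the usage above and then draws a random subset of it. It only illustrates the general idea of a random grid search. It is an assumption, not a description of how CCI.pretuner constructs or samples its grid internally, and the names full_grid and sampled_grid are hypothetical.

```r
# Candidate values copied from the xgboost defaults shown in the usage.
full_grid <- expand.grid(
  nrounds          = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000),
  eta              = seq(0.01, 0.3, by = 0.05),
  max_depth        = 2:6,
  gamma            = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample        = 1
)

# A full grid search would fit one model per row and resample.
nrow(full_grid)  # 7200

# A random grid search evaluates only a subset of rows.
# 35 matches the default of the samples argument.
set.seed(123)
samples <- 35
sampled_grid <- full_grid[sample(nrow(full_grid), samples), ]
nrow(sampled_grid)  # 35
```

With 4 cross-validation folds, the full grid would require 7200 * 4 model fits, while the sampled grid requires 35 * 4.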
A list containing:
best_param: A data frame with the best parameters.
tuning_result: A data frame with all tested parameter combinations and their performance metrics.
warnings: A character vector of warnings issued during tuning.
CCI.test, perm.test, print.summary.CCI, plot.CCI, QQplot
set.seed(123)
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), y = rnorm(100))
# Tune random forest parameters
result <- CCI.pretuner(formula = y ~ x1 | x2 + x3,
                       data = data,
                       samples = 5,
                       folds = 3,
                       method = "rf")
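
The formula y ~ x1 | x2 + x3 in the example follows the Y || X | Z notation used for include_explanatory: y is the response, x1 the explanatory variable, and x2 and x3 the conditioning variables. The base R sketch below shows one way to split such a formula into its parts and how the model for Y might differ with include_explanatory. This is an assumed reading of the argument descriptions, not the package's actual parsing code, and the variable names are hypothetical.

```r
# Formula from the example: is y independent of x1 given x2 and x3?
f <- y ~ x1 | x2 + x3

# `|` binds tighter than `~` and looser than `+`,
# so the right-hand side is the call `|`(x1, x2 + x3).
rhs <- f[[3]]
stopifnot(identical(rhs[[1]], as.name("|")))

response     <- all.vars(f[[2]])    # Y: "y"
explanatory  <- all.vars(rhs[[2]])  # X: "x1"
conditioning <- all.vars(rhs[[3]])  # Z: "x2" "x3"

# Assumed model for Y when include_explanatory = FALSE: Z only.
model_without_x <- reformulate(conditioning, response = response)

# Assumed model for Y when include_explanatory = TRUE: X and Z.
model_with_x <- reformulate(c(explanatory, conditioning), response = response)

print(model_without_x)  # y ~ x2 + x3
print(model_with_x)     # y ~ x1 + x2 + x3
```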