CCI.pretuner: CCI tuner function for CCI test

View source: R/CCI.pretuner.R

CCI.pretuner R Documentation

CCI tuner function for CCI test

Description

The CCI.pretuner function performs a grid search over the parameters of a machine learning model supported by CCI.test, for use in a conditional independence test. The tuner uses the caret package for tuning.

Usage

CCI.pretuner(
  formula,
  data,
  method = "rf",
  metric = "RMSE",
  validation_method = "cv",
  folds = 4,
  training_share = 0.7,
  tune_length = 4,
  random_grid = TRUE,
  samples = 35,
  poly = TRUE,
  degree = 3,
  interaction = TRUE,
  verboseIter = FALSE,
  include_explanatory = FALSE,
  verbose = FALSE,
  parallel = FALSE,
  mtry = 1:10,
  nrounds = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000),
  eta = seq(0.01, 0.3, by = 0.05),
  max_depth = 2:6,
  gamma = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample = 1,
  sigma = seq(0.1, 2, by = 0.3),
  C = seq(0.1, 2, by = 0.5),
  ...
)

Arguments

formula

Model formula specifying the relationship between dependent and independent variables.

data

A data frame containing the variables specified in the formula.

method

Character. Specifies the machine learning method to use. Supported methods are random forest ("rf"), extreme gradient boosting ("xgboost"), and support vector machine ("svm"). Default is "rf".

metric

Character. The performance metric to optimize during tuning. Default is "RMSE".

validation_method

Character. Specifies the resampling method. Default is "cv".

folds

Integer. The number of folds for cross-validation during the tuning process. Default is 4.

training_share

Numeric. The share of the data used for training in leave-group-out cross-validation. Default is 0.7.

tune_length

Integer. The number of parameter combinations to try during the tuning process. Default is 4.

random_grid

Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.

samples

Integer. The number of random samples to take from the grid. Default is 35.

poly

Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.

degree

Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.

interaction

Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.

verboseIter

Logical. If TRUE, the function will print the tuning process. Default is FALSE.

include_explanatory

Logical. If TRUE, given the condition Y || X | Z, the function will include the explanatory variable X in the model for Y. Default is FALSE.

verbose

Logical. If TRUE, the function will print the tuning process. Default is FALSE.

parallel

Logical. If TRUE, the function will use parallel processing. Default is FALSE.

mtry

Integer. The number of variables randomly sampled as candidates at each split for random forest. Default is 1:10.

nrounds

Integer. The number of rounds (trees) for methods such as xgboost and random forest. Default is c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000).

eta

Numeric. The learning rate for xgboost. Default is seq(0.01, 0.3, by = 0.05).

max_depth

Integer. The maximum depth of the tree for xgboost. Default is 2:6.

gamma

Numeric. The minimum loss reduction required to make a further partition on a leaf node for xgboost. Default is c(0, 1, 2, 3).

colsample_bytree

Numeric. The subsample ratio of columns when constructing each tree for xgboost. Default is c(0.8, 0.9, 1).

min_child_weight

Integer. The minimum sum of instance weight (hessian) needed in a child for xgboost. Default is c(1, 3).

subsample

Numeric. The subsample ratio of the training instances for xgboost. Default is 1.

sigma

Numeric. The standard deviation of the Gaussian kernel for Gaussian Process Regression. Default is seq(0.1, 2, by = 0.3).

C

Numeric. The regularization parameter for Support Vector Machine. Default is seq(0.1, 2, by = 0.5).

...

Additional arguments to pass to the CCI.tuner function.
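To make the interplay of random_grid and samples concrete, the following is a minimal base-R sketch of drawing a random subset from a full parameter grid. It assumes the xgboost default values shown under Usage; the names full_grid and random_grid_rows are hypothetical, and this is an illustration of the general idea rather than the package's internal implementation.

```r
# Hypothetical illustration: build the full xgboost grid from the default
# argument values listed under Usage, then draw a random subset of rows.
set.seed(123)

full_grid <- expand.grid(
  nrounds          = c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000),
  eta              = seq(0.01, 0.3, by = 0.05),
  max_depth        = 2:6,
  gamma            = c(0, 1, 2, 3),
  colsample_bytree = c(0.8, 0.9, 1),
  min_child_weight = c(1, 3),
  subsample        = 1
)

# Number of random rows to keep, mirroring the default of `samples`
samples <- 35
random_grid_rows <- full_grid[sample(nrow(full_grid), samples), ]

nrow(full_grid)        # 7200 combinations in the full grid
nrow(random_grid_rows) # 35 combinations after random sampling
```

With these defaults the full grid holds 7200 combinations, which is why sampling a few dozen rows is much cheaper than an exhaustive search.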

Value

A list containing:

  • best_param: A data frame with the best parameters.

  • tuning_result: A data frame with all tested parameter combinations and their performance metrics.

  • warnings: A character vector of warnings issued during tuning.

See Also

CCI.test, perm.test, print.summary.CCI, plot.CCI, QQplot

Examples

set.seed(123)
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100), y = rnorm(100))
# Tune random forest parameters
result <- CCI.pretuner(formula = y ~ x1 | x2 + x3,
                       data = data,
                       samples = 5,
                       folds = 3,
                       method = "rf")
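
The returned list can then be inspected through the three elements described under Value. The sketch below repeats the call so that it is self-contained; it assumes the CCI package is installed and that the list elements are named exactly as documented above, and it is skipped otherwise.

```r
# Assumption: the CCI package is installed; otherwise nothing is run.
if (requireNamespace("CCI", quietly = TRUE)) {
  library(CCI)

  set.seed(123)
  data <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                     x3 = rnorm(100), y = rnorm(100))

  result <- CCI.pretuner(formula = y ~ x1 | x2 + x3,
                         data = data,
                         samples = 5,
                         folds = 3,
                         method = "rf")

  # Best parameter combination found during tuning
  print(result$best_param)

  # First rows of all tested combinations and their performance metrics
  print(head(result$tuning_result))

  # Any warnings collected during tuning
  if (length(result$warnings) > 0) print(result$warnings)
}
```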

CCI documentation built on Aug. 29, 2025, 5:17 p.m.