In CCI: Computational Test for Conditional Independence

CCI.test

R Documentation

Computational test for conditional independence based on ML and Monte Carlo Cross Validation

Description

The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model provided by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally plots the null distribution together with the observed test statistic. CCI.test serves as a wrapper around the perm.test function.
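The general idea described above (a test statistic from a held-out split, a permutation-based null distribution, and an empirical p-value) can be sketched in base R. This is a conceptual illustration only and not the package's implementation: the helper names `mc_rmse` and `permute_x1`, the choice of a linear model, the decision to permute `x1`, and the `+ 1` correction in the p-value are all assumptions made for the sketch.

```r
# Conceptual sketch only: NOT how CCI.test or perm.test are implemented.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.8 * dat$x1 + 0.5 * dat$x2 + rnorm(n)  # y depends on x1 given x2

# One Monte Carlo cross-validation round (hypothetical helper):
# draw a random train/test split, fit on the training part,
# and return the RMSE on the held-out part.
mc_rmse <- function(d, p = 0.5) {
  train_idx <- sample(nrow(d), size = floor(p * nrow(d)))
  fit <- lm(y ~ x1 + x2, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ])
  sqrt(mean((d$y[-train_idx] - pred)^2))
}

# Hypothetical helper: shuffle x1 to break its link with y.
permute_x1 <- function(d) {
  d$x1 <- sample(d$x1)
  d
}

nperm <- 100
null_dist <- replicate(nperm, mc_rmse(permute_x1(dat)))  # null distribution
observed <- mc_rmse(dat)                                 # observed statistic

# Left-tailed empirical p-value: a lower RMSE means better prediction,
# so small observed values are evidence against conditional independence.
p_value <- (sum(null_dist <= observed) + 1) / (nperm + 1)
p_value
```

In this sketch a small p-value suggests that `x1` carries predictive information about `y` beyond `x2`. CCI.test automates the same kind of workflow with the arguments documented below.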

Usage

CCI.test(
  formula = NULL,
  data,
  plot = TRUE,
  p = 0.5,
  nperm = 60,
  nrounds = 600,
  metric = "Auto",
  method = "rf",
  choose_direction = FALSE,
  print_result = TRUE,
  parametric = FALSE,
  poly = TRUE,
  degree = 3,
  subsample = 1,
  min_child_weight = 1,
  colsample_bytree = 1,
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  num_class = NULL,
  interaction = TRUE,
  metricfunc = NULL,
  mlfunc = NULL,
  tail = NA,
  tune = FALSE,
  samples = 35,
  folds = 5,
  tune_length = 10,
  seed = NA,
  random_grid = TRUE,
  nthread = 1,
  verbose = FALSE,
  progress = TRUE,
  ...
)

Arguments

`formula`	Model formula or a DAGitty object specifying the relationship between dependent and independent variables.
`data`	A data frame containing the variables specified in the formula.
`plot`	Logical, indicating if a plot of the null distribution with the test statistic should be generated. Default is TRUE.
`p`	Numeric. Proportion of data used for training the model. Default is 0.5.
`nperm`	Integer. The number of permutations to perform. Default is 60.
`nrounds`	Integer. The number of rounds (trees) for methods 'xgboost' and 'rf'. Default is 600.
`metric`	Character. Specifies the performance metric, which depends on the type of data: "Auto", "RMSE" or "Kappa". Default is "Auto".
`method`	Character. Specifies the machine learning method to use. Supported methods include generalized linear models "lm", random forest "rf", and extreme gradient boosting "xgboost", among others. Default is "rf".
`choose_direction`	Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE.
`print_result`	Logical. If TRUE, the function will print the result of the test. Default is TRUE.
`parametric`	Logical, indicating whether to compute a parametric p-value instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE.
`poly`	Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.
`degree`	Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.
`subsample`	Numeric. The proportion of data to use for subsampling. Default is 1 (no subsampling).
`min_child_weight`	Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1.
`colsample_bytree`	Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1.
`eta`	Numeric. The learning rate for methods like xgboost. Default is 0.3.
`gamma`	Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0.
`max_depth`	Integer. The maximum depth of the trees for methods like xgboost. Default is 6.
`num_class`	Integer. The number of classes for categorical data (used in xgboost). Default is NULL.
`interaction`	Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.
`metricfunc`	Optional. A custom function, supplied by the user, for calculating a performance metric based on the model's predictions. Default is NULL.
`mlfunc`	Optional. A custom machine learning wrapper function, supplied by the user, to use instead of the predefined methods. Default is NULL.
`tail`	Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using `metricfunc` or `mlfunc`. Default is NA.
`tune`	Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE.
`samples`	Integer. The number of samples to use for tuning. Default is 35.
`folds`	Integer. The number of folds for cross-validation during the tuning process. Default is 5.
`tune_length`	Integer. The number of parameter combinations to try during the tuning process. Default is 10.
`seed`	Integer. The seed for tuning. Default is NA.
`random_grid`	Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.
`nthread`	Integer. The number of threads to use for parallel processing. Default is 1.
`verbose`	Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE.
`progress`	Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE.
`...`	Additional arguments to pass to the `perm.test` function.
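The difference between the empirical and the parametric p-value (see `parametric`) and between left- and right-tailed p-values (see `tail`) can be illustrated in base R. The numbers below are synthetic stand-ins, not output from CCI.test, and the `+ 1` correction in the empirical p-value is a common convention assumed here rather than something this documentation confirms.

```r
# Synthetic stand-ins for illustration; not results from CCI.test.
set.seed(42)
null_dist <- rnorm(500, mean = 1.3, sd = 0.1)  # pretend permutation RMSEs
observed  <- 1.05                              # pretend observed RMSE

# Empirical left-tailed p-value: share of null values at or below
# the observed statistic (with a +1 correction, assumed convention).
p_empirical <- (sum(null_dist <= observed) + 1) / (length(null_dist) + 1)

# Parametric left-tailed p-value: treat the null distribution as
# Gaussian with the sample mean and standard deviation.
p_parametric <- pnorm(observed, mean = mean(null_dist), sd = sd(null_dist))

# Right-tailed counterpart, appropriate when higher values of the
# metric mean better performance (for example an agreement score).
p_parametric_right <- pnorm(observed, mean = mean(null_dist),
                            sd = sd(null_dist), lower.tail = FALSE)

c(empirical = p_empirical, parametric = p_parametric)
```

For an error metric such as RMSE, lower is better, so the left tail is the relevant one; for a metric where higher is better, the right tail applies. This is why a tail direction has to be stated when a custom metric is supplied.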

Value

Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.

Examples

set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
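A second example, offered as a sketch, uses only arguments listed under Usage: a linear model as the learner, no plot, and a parametric p-value. The simulated data (where `y` depends on `x1` even after conditioning on `x2`) and the object names `dat` and `res` are assumptions made for illustration. The call is guarded so the snippet still runs when the CCI package is not installed.

```r
set.seed(2025)
n <- 200
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n)
# Simulated data: y depends on both x1 and x2.
dat <- data.frame(x1 = x1, x2 = x2, y = x1 + x2 + rnorm(n))

# Run the test only if the CCI package is available.
if (requireNamespace("CCI", quietly = TRUE)) {
  res <- CCI::CCI.test(
    y ~ x1 | x2,
    data = dat,
    method = "lm",      # linear model instead of the default "rf"
    nperm = 25,         # small number of permutations to keep it fast
    plot = FALSE,       # skip the null distribution plot
    parametric = TRUE   # Gaussian-based p-value instead of empirical
  )
  summary(res)
}
```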

CCI documentation built on Aug. 29, 2025, 5:17 p.m.