CCI.test: Computational test for conditional independence based on ML...

CCI.test    R Documentation

Computational test for conditional independence based on ML and Monte Carlo Cross Validation

Description

The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model provided by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally plots the null distribution together with the observed test statistic. CCI.test is a wrapper around the perm.test function.
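The general idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it uses lm as the learner, RMSE as the metric, a naive permutation of x, and a hypothetical helper mc_rmse that does not exist in CCI.

```r
# Illustrative sketch only; NOT the CCI package's implementation.
# Compares out-of-sample RMSE with the real x against the RMSE
# obtained after permuting x.

set.seed(42)
n <- 200
z <- rnorm(n)
x <- z + rnorm(n)   # x depends on z
y <- z + rnorm(n)   # y depends on z only, so y is independent of x given z
dat <- data.frame(y = y, x = x, z = z)

# Hypothetical helper: one Monte Carlo cross-validation round.
# Draws a random train/test split, fits on the training part,
# and returns the RMSE on the held-out part.
mc_rmse <- function(d, p = 0.5) {
  train_idx <- sample(nrow(d), size = floor(p * nrow(d)))
  fit <- lm(y ~ x + z, data = d[train_idx, ])
  pred <- predict(fit, newdata = d[-train_idx, ])
  sqrt(mean((d$y[-train_idx] - pred)^2))
}

# Observed test statistic: RMSE with x left intact.
observed <- mc_rmse(dat)

# Null distribution: permute x to break any link between x and y,
# then recompute the metric on a fresh random split.
nperm <- 100
null_dist <- replicate(nperm, {
  d_perm <- dat
  d_perm$x <- sample(d_perm$x)
  mc_rmse(d_perm)
})

# Left-tailed empirical p-value: an unusually small RMSE relative to
# the null would suggest that x helps predict y beyond z.
# The +1 correction is an assumption of this sketch.
p_value <- (sum(null_dist <= observed) + 1) / (nperm + 1)
p_value
```

Here p plays the same role as the p argument of CCI.test (the training proportion), and nperm the number of permuted refits.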

Usage

CCI.test(
  formula = NULL,
  data,
  plot = TRUE,
  p = 0.5,
  nperm = 60,
  nrounds = 600,
  metric = "Auto",
  method = "rf",
  choose_direction = FALSE,
  print_result = TRUE,
  parametric = FALSE,
  poly = TRUE,
  degree = 3,
  subsample = 1,
  min_child_weight = 1,
  colsample_bytree = 1,
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  num_class = NULL,
  interaction = TRUE,
  metricfunc = NULL,
  mlfunc = NULL,
  tail = NA,
  tune = FALSE,
  samples = 35,
  folds = 5,
  tune_length = 10,
  seed = NA,
  random_grid = TRUE,
  nthread = 1,
  verbose = FALSE,
  progress = TRUE,
  ...
)

Arguments

formula

Model formula or a DAGitty object specifying the relationship between dependent and independent variables.

data

A data frame containing the variables specified in the formula.

plot

Logical, indicating if a plot of the null distribution with the test statistic should be generated. Default is TRUE.

p

Numeric. Proportion of data used for training the model. Default is 0.5.

nperm

Integer. The number of permutations to perform. Default is 60.

nrounds

Integer. The number of rounds (trees) for methods 'xgboost' and 'rf'. Default is 600.

metric

Character. Specifies the performance metric, which depends on the type of data: "Auto", "RMSE" or "Kappa". Default is "Auto".

method

Character. Specifies the machine learning method to use. Supported methods include generalized linear models "lm", random forest "rf", and extreme gradient boosting "xgboost", etc. Default is "rf".

choose_direction

Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE.

print_result

Logical. If TRUE, the function will print the result of the test. Default is TRUE.

parametric

Logical, indicating whether to compute a parametric p-value instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE.

poly

Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE.

degree

Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3.

subsample

Numeric. The proportion of data to use for subsampling. Default is 1 (no subsampling).

min_child_weight

Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1.

colsample_bytree

Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1.

eta

Numeric. The learning rate for methods like xgboost. Default is 0.3.

gamma

Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0.

max_depth

Integer. The maximum depth of the trees for methods like xgboost. Default is 6.

num_class

Integer. The number of classes for categorical data (used in xgboost). Default is NULL.

interaction

Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE.

metricfunc

Optional. A custom function for calculating a performance metric based on the model's predictions. Default is NULL.

mlfunc

Optional. A custom machine learning wrapper function to use instead of the predefined methods. Default is NULL.

tail

Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using metricfunc or mlfunc. Default is NA.

tune

Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE.

samples

Integer. The number of samples to use for tuning. Default is 35.

folds

Integer. The number of folds for cross-validation during the tuning process. Default is 5.

tune_length

Integer. The number of parameter combinations to try during the tuning process. Default is 10.

seed

Integer. The seed for tuning. Default is NA.

random_grid

Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE.

nthread

Integer. The number of threads to use for parallel processing. Default is 1.

verbose

Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE.

progress

Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE.

...

Additional arguments to pass to the perm.test function.
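The parametric and tail arguments determine how a p-value is read off the null distribution. The base R sketch below shows one common way to compute each variant. The null distribution, the observed value, and the +1 correction in the empirical p-values are assumptions made for illustration and may differ from what perm.test does internally.

```r
# Hypothetical null distribution and observed statistic.
set.seed(1)
null_dist <- rnorm(500, mean = 1.0, sd = 0.05)
observed <- 0.9

# Empirical p-values: share of null draws at least as extreme as the
# observed statistic. The +1 correction keeps the p-value above zero.
n_null <- length(null_dist)
emp_left  <- (sum(null_dist <= observed) + 1) / (n_null + 1)
emp_right <- (sum(null_dist >= observed) + 1) / (n_null + 1)

# Parametric p-values: treat the null distribution as Gaussian with
# the sample mean and standard deviation of the null draws.
mu <- mean(null_dist)
sigma <- sd(null_dist)
par_left  <- pnorm(observed, mean = mu, sd = sigma)
par_right <- pnorm(observed, mean = mu, sd = sigma, lower.tail = FALSE)

c(emp_left = emp_left, emp_right = emp_right,
  par_left = par_left, par_right = par_right)
```

A left tail suits metrics where smaller is better, such as RMSE, while a right tail suits metrics where larger is better, such as Kappa.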

Value

Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.
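Because the result is returned invisibly, it has to be assigned to be inspected. The snippet below is a small sketch that uses only arguments documented on this page. The call is guarded so that it runs only if the CCI package is installed, and the element names of the returned object are deliberately not assumed; str is used to list them.

```r
set.seed(123)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), y = rnorm(100))

if (requireNamespace("CCI", quietly = TRUE)) {
  # Suppress the plot, printed result and progress bar; keep the object.
  res <- CCI::CCI.test(y ~ x1 | x2, data = dat, nperm = 25,
                       plot = FALSE, print_result = FALSE, progress = FALSE)
  print(class(res))        # documented to be of class 'CCI'
  str(res, max.level = 1)  # list the top-level components
}
```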

See Also

perm.test, print.summary.CCI, plot.CCI, CCI.pretuner, QQplot

Examples

set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
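The example above simulates data in which y is independent of x1. As a contrast, the sketch below builds data in which y depends on x1 even after conditioning on x2, and runs the test with method = "lm". The variable names and the simulated relationship are made up for illustration, and the call is guarded so that it runs only when the CCI package is installed.

```r
set.seed(123)
n <- 200
x2 <- rnorm(n)
x1 <- x2 + rnorm(n)
# y depends on both x1 and x2, so y and x1 are dependent given x2.
dep_data <- data.frame(x1 = x1, x2 = x2, y = x1 + x2 + rnorm(n))

if (requireNamespace("CCI", quietly = TRUE)) {
  dep_result <- CCI::CCI.test(y ~ x1 | x2, data = dep_data,
                              nperm = 25, method = "lm", plot = FALSE)
  summary(dep_result)
}
```

With only 25 permutations the p-value is coarse; a larger nperm gives a finer null distribution at the cost of run time.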

CCI documentation built on Aug. 29, 2025, 5:17 p.m.