test.gen: Generate the Test Statistic or Null Distribution Using...

test.gen R Documentation

Generate the Test Statistic or Null Distribution Using Permutation

Description

This function generates the test statistic or, through permutation, a null distribution for conditional independence testing. It supports several machine learning methods, including random forests and extreme gradient boosting, and accepts custom metric functions and custom model-fitting functions.

Usage

test.gen(
  formula,
  data,
  method = "rf",
  metric,
  nperm = 60,
  subsample = 1,
  p = 0.8,
  poly = TRUE,
  interaction = TRUE,
  degree = 3,
  nrounds = 600,
  nthread = 1,
  permutation = FALSE,
  metricfunc = NULL,
  mlfunc = NULL,
  num_class = NULL,
  progress = TRUE,
  ...
)

Arguments

formula

Formula specifying the relationship between dependent and independent variables, for example y ~ x1 | x2 + x3 + x4 as used in the Examples section.

data

Data frame. The data containing the variables used.

method

Character. The modeling method to be used. Options are "rf" for random forests, "xgboost" for gradient boosting, or "svm" for Support Vector Machine. Default is "rf".

metric

Character. The type of metric: can be "RMSE", "Kappa" or "Custom". Default is "RMSE".

nperm

Integer. The number of generated Monte Carlo samples. Default is 60.

subsample

Numeric. The proportion of the data to be used for subsampling. Default is 1 (no subsampling).

p

Numeric. The proportion of the data to be used for training. The remaining data will be used for testing. Default is 0.8.
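The subsample and p arguments can be pictured as two successive row selections. The following is a minimal base R sketch, not the package's internal code; the order of the two steps, the rounding, and the variable names other than subsample and p are assumptions made for illustration.

```r
# Illustrative only: subsample the rows, then split them into train and test.
set.seed(1)
n <- 100          # number of rows in a hypothetical data set
subsample <- 0.5  # keep half of the rows
p <- 0.8          # use 80% of the kept rows for training

# Step 1: draw the subsample (all rows are kept when subsample = 1)
kept <- sample(seq_len(n), size = round(subsample * n))

# Step 2: split the kept rows into training and testing indices
train_indices <- sample(kept, size = round(p * length(kept)))
test_indices <- setdiff(kept, train_indices)

length(train_indices)
length(test_indices)
```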

poly

Logical. Whether to include polynomial terms of the conditioning variables. Default is TRUE.

interaction

Logical. Whether to include interaction terms of the conditioning variables. Default is TRUE.

degree

Integer. The degree of polynomial terms to be included if poly is TRUE. Default is 3.
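To make the poly, interaction, and degree arguments concrete, the sketch below builds polynomial and pairwise interaction terms for a set of conditioning variables in base R. The helper expand_terms is hypothetical and not part of the package; how test.gen constructs these terms internally may differ.

```r
# Hypothetical helper (not a CCI function): expand conditioning variables
# into polynomial terms up to `degree` and pairwise interaction terms.
expand_terms <- function(vars, degree = 3, poly = TRUE, interaction = TRUE) {
  terms <- vars
  if (poly && degree >= 2) {
    for (d in 2:degree) {
      # I() protects the power so the formula treats it arithmetically
      terms <- c(terms, paste0("I(", vars, "^", d, ")"))
    }
  }
  if (interaction && length(vars) >= 2) {
    pairs <- combn(vars, 2)
    terms <- c(terms, paste(pairs[1, ], pairs[2, ], sep = ":"))
  }
  terms
}

rhs <- expand_terms(c("x2", "x3"), degree = 3)
f <- as.formula(paste("y ~ x1 +", paste(rhs, collapse = " + ")))
print(f)
```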

nrounds

Integer. The number of rounds (trees) for methods like xgboost, ranger, and lightgbm. Default is 600.

nthread

Integer. The number of threads to use for parallel processing. Default is 1.

permutation

Logical. Whether to perform permutation to generate a null distribution. Default is FALSE.
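The difference between the two modes can be sketched with a plain linear model standing in for the machine learning method. This is an assumption-laden illustration: it supposes that the variable being tested (x1 here) is the one that gets shuffled, and the helper rmse_once is hypothetical rather than part of the package.

```r
# Illustrative only: test statistic vs. permutation-based null distribution,
# using lm() as a stand-in for the package's learners.
set.seed(42)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- d$x2 + rnorm(n)  # y depends on x2 only

# Hypothetical helper: fit on a random 80% of rows, return test-set RMSE.
rmse_once <- function(data, permute) {
  if (permute) data$x1 <- sample(data$x1)  # break any link between x1 and y
  train <- sample(seq_len(nrow(data)), size = round(0.8 * nrow(data)))
  fit <- lm(y ~ x1 + x2, data = data[train, ])
  pred <- predict(fit, newdata = data[-train, ])
  sqrt(mean((data$y[-train] - pred)^2))
}

null_dist <- replicate(60, rmse_once(d, permute = TRUE))  # like permutation = TRUE
test_stat <- rmse_once(d, permute = FALSE)                # like permutation = FALSE
```

Comparing test_stat against null_dist then indicates whether x1 improves prediction beyond what chance alone would produce.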

metricfunc

Function. A custom metric function provided by the user. The function must take arguments: data, model, test_indices, and test_matrix, and return a single value performance metric. Default is NULL.
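A custom metric with the documented argument names might look like the sketch below, which computes the mean absolute error on the held-out rows. It assumes the response column is named y and that predict() with newdata works for the fitted model; for other model classes the prediction call would need adjusting. The name mae_metric is hypothetical, and test_matrix is accepted only to match the signature.

```r
# Hypothetical custom metric: mean absolute error on the test rows.
mae_metric <- function(data, model, test_indices, test_matrix) {
  # test_matrix is unused in this sketch; it is kept to match the signature
  test_data <- data[test_indices, , drop = FALSE]
  pred <- predict(model, newdata = test_data)
  mean(abs(test_data$y - pred))
}

# Stand-alone check, with lm() in place of a model fitted by the package
set.seed(7)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 2 * d$x1 + rnorm(50)
test_indices <- 41:50
fit <- lm(y ~ x1 + x2, data = d[-test_indices, ])
mae <- mae_metric(d, fit, test_indices, test_matrix = NULL)
```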

mlfunc

Function. A custom machine learning function provided by the user. The function must have the arguments: formula, data, train_indices, test_indices, and ..., and return a single value performance metric. Default is NULL.
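As an illustration of the required signature, the sketch below wraps a linear model and returns the test-set RMSE. It assumes the function receives an ordinary R formula (without the | separator) and that the first variable in the formula is the response; the name lm_rmse is hypothetical and the actual calling convention inside test.gen may differ.

```r
# Hypothetical custom learner: fit lm() on the training rows and
# return a single performance value (RMSE on the test rows).
lm_rmse <- function(formula, data, train_indices, test_indices, ...) {
  fit <- lm(formula, data = data[train_indices, , drop = FALSE], ...)
  test_data <- data[test_indices, , drop = FALSE]
  pred <- predict(fit, newdata = test_data)
  response <- all.vars(formula)[1]  # assumes the response is listed first
  sqrt(mean((test_data[[response]] - pred)^2))
}

# Stand-alone check on simulated data
set.seed(11)
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
d$y <- d$x1 - d$x2 + rnorm(60)
rmse <- lm_rmse(y ~ x1 + x2, d, train_indices = 1:48, test_indices = 49:60)
```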

num_class

Integer. The number of classes for categorical data (used in xgboost and lightgbm). Default is NULL.

progress

Logical. Whether to show a progress bar during the permutation process. Default is TRUE.

...

Additional arguments to pass to the machine learning wrapper functions xgboost_wrapper, ranger_wrapper, lightgbm_wrapper, or to a custom-built wrapper function.

Value

A list containing the test distribution.

Examples

set.seed(123)
data <- data.frame(x1 = rnorm(100),
                   x2 = rnorm(100),
                   x3 = rnorm(100),
                   x4 = rnorm(100),
                   y = rnorm(100))
result <- test.gen(formula = y ~ x1 | x2 + x3 + x4,
                   metric = "RMSE",
                   data = data)
hist(result$distribution)

CCI documentation built on Aug. 29, 2025, 5:17 p.m.