CCI.test | R Documentation |
The CCI.test function performs a conditional independence test using a specified machine learning model or a custom model provided by the user. It calculates the test statistic, generates a null distribution via permutations, computes p-values, and optionally plots the null distribution together with the observed test statistic.
The 'CCI.test' function serves as a wrapper around the 'perm.test' function.
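The general idea described above (observed statistic, permutation null distribution, empirical or parametric p-value) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper holdout_rmse is hypothetical, a plain linear model stands in for the machine learning method, and the simple shuffle of x1 may differ from the permutation scheme that perm.test actually uses.

```r
# Illustrative sketch only: NOT the internal code of CCI.test or perm.test.
set.seed(1)
n  <- 200
x2 <- rnorm(n)
x1 <- rnorm(n)
y  <- 0.8 * x2 + rnorm(n)          # y depends on x2 only
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Fixed train/test split, mirroring the idea behind p = 0.5
train <- sample(n, size = floor(0.5 * n))

# Hypothetical helper: fit on the training rows, return RMSE on the held-out rows
holdout_rmse <- function(df) {
  fit  <- lm(y ~ x1 + x2, data = df[train, ])
  pred <- predict(fit, newdata = df[-train, ])
  sqrt(mean((df$y[-train] - pred)^2))
}

observed <- holdout_rmse(d)

# Null distribution: shuffle x1 so that it carries no information about y
null_dist <- replicate(60, {
  d_perm    <- d
  d_perm$x1 <- sample(d_perm$x1)
  holdout_rmse(d_perm)
})

# Left-tailed empirical p-value (a lower RMSE means a better fit)
p_empirical <- (sum(null_dist <= observed) + 1) / (length(null_dist) + 1)

# Parametric alternative: assume the null distribution is Gaussian
p_parametric <- pnorm(observed, mean = mean(null_dist), sd = sd(null_dist))
```

For a metric where larger values are better, such as Kappa, the comparison would be right-tailed instead, which is the distinction the tail argument refers to.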
CCI.test(
formula = NULL,
data,
plot = TRUE,
p = 0.5,
nperm = 60,
nrounds = 600,
metric = "Auto",
method = "rf",
choose_direction = FALSE,
print_result = TRUE,
parametric = FALSE,
poly = TRUE,
degree = 3,
subsample = 1,
min_child_weight = 1,
colsample_bytree = 1,
eta = 0.3,
gamma = 0,
max_depth = 6,
num_class = NULL,
interaction = TRUE,
metricfunc = NULL,
mlfunc = NULL,
tail = NA,
tune = FALSE,
samples = 35,
folds = 5,
tune_length = 10,
seed = NA,
random_grid = TRUE,
nthread = 1,
verbose = FALSE,
progress = TRUE,
...
)
formula |
Model formula or a DAGitty object specifying the relationship between dependent and independent variables. |
data |
A data frame containing the variables specified in the formula. |
plot |
Logical, indicating if a plot of the null distribution with the test statistic should be generated. Default is TRUE. |
p |
Numeric. Proportion of data used for training the model. Default is 0.5. |
nperm |
Integer. The number of permutations to perform. Default is 60. |
nrounds |
Integer. The number of rounds (trees) for methods 'xgboost' and 'rf'. Default is 600. |
metric |
Character. Specifies the performance metric to use, which depends on the type of data: "Auto", "RMSE" or "Kappa". Default is "Auto". |
method |
Character. Specifies the machine learning method to use. Supported methods include generalized linear models "lm", random forest "rf", and extreme gradient boosting "xgboost", etc. Default is "rf". |
choose_direction |
Logical. If TRUE, the function will choose the best direction for testing. Default is FALSE. |
print_result |
Logical. If TRUE, the function will print the result of the test. Default is TRUE. |
parametric |
Logical, indicating whether to compute a parametric p-value instead of the empirical p-value. A parametric p-value assumes that the null distribution is Gaussian. Default is FALSE. |
poly |
Logical. If TRUE, polynomial terms of the conditional variables are included in the model. Default is TRUE. |
degree |
Integer. The degree of polynomial terms to include if poly is TRUE. Default is 3. |
subsample |
Numeric. The proportion of data to use for subsampling. Default is 1 (no subsampling). |
min_child_weight |
Numeric. The minimum sum of instance weight (hessian) needed in a child for methods like xgboost. Default is 1. |
colsample_bytree |
Numeric. The subsample ratio of columns when constructing each tree for methods like xgboost. Default is 1. |
eta |
Numeric. The learning rate for methods like xgboost. Default is 0.3. |
gamma |
Numeric. The minimum loss reduction required to make a further partition on a leaf node of the tree for methods like xgboost. Default is 0. |
max_depth |
Integer. The maximum depth of the trees for methods like xgboost. Default is 6. |
num_class |
Integer. The number of classes for categorical data (used in xgboost). Default is NULL. |
interaction |
Logical. If TRUE, interaction terms of the conditional variables are included in the model. Default is TRUE. |
metricfunc |
Optional. A custom function for calculating a performance metric based on the model's predictions. Default is NULL. |
mlfunc |
Optional. A custom machine learning wrapper function to use instead of the predefined methods. Default is NULL. |
tail |
Character. Specifies whether to calculate left-tailed or right-tailed p-values, depending on the performance metric used. Only applicable if using |
tune |
Logical. If TRUE, the function will perform hyperparameter tuning for the specified machine learning method. Default is FALSE. |
samples |
Integer. The number of samples to use for tuning. Default is 35. |
folds |
Integer. The number of folds for cross-validation during the tuning process. Default is 5. |
tune_length |
Integer. The number of parameter combinations to try during the tuning process. Default is 10. |
seed |
Integer. The seed for tuning. Default is NA. |
random_grid |
Logical. If TRUE, a random grid search is performed. If FALSE, a full grid search is performed. Default is TRUE. |
nthread |
Integer. The number of threads to use for parallel processing. Default is 1. |
verbose |
Logical. If TRUE, additional information is printed during the execution of the function. Default is FALSE. |
progress |
Logical. If TRUE, a progress bar is displayed during the permutation process. Default is TRUE. |
... |
Additional arguments to pass to the |
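To make the poly, degree, and interaction arguments more concrete, the following base R sketch builds a model formula with polynomial and pairwise interaction terms for a set of conditioning variables. It is a rough illustration under assumptions: the variable names are made up, and the exact terms that CCI.test constructs internally may differ.

```r
# Illustrative sketch only: the terms CCI.test builds internally may differ.
cond   <- c("x2", "x3")   # hypothetical conditioning variables
degree <- 3               # mirrors the 'degree' argument

# Polynomial terms of degree 2..degree for each conditioning variable
poly_terms <- unlist(lapply(cond, function(v) sprintf("I(%s^%d)", v, 2:degree)))

# Pairwise interaction terms between the conditioning variables
inter_terms <- if (length(cond) > 1) {
  combn(cond, 2, FUN = paste, collapse = ":")
} else {
  character(0)
}

# Assemble the right-hand side and create the formula
rhs <- c("x1", cond, poly_terms, inter_terms)
f   <- reformulate(rhs, response = "y")
print(f)
```

Setting poly = FALSE or interaction = FALSE would correspond to dropping poly_terms or inter_terms from rhs in this sketch.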
Invisibly returns the result of perm.test, which is an object of class 'CCI' containing the null distribution, observed test statistic, p-values, the machine learning model used, and the data.
perm.test, print.summary.CCI, plot.CCI, CCI.pretuner, QQplot
set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
result <- CCI.test(y ~ x1 | x2, data = data, nperm = 25, interaction = FALSE)
summary(result)
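A further usage sketch, using only arguments documented above: a linear model as the method, a parametric p-value, and plotting and console output switched off. It assumes the CCI package is installed; the simulated data and variable names are made up for illustration.

```r
# Usage sketch; assumes the CCI package is installed.
library(CCI)

set.seed(123)
dat <- data.frame(x1 = stats::rnorm(100),
                  x2 = stats::rnorm(100),
                  y  = stats::rnorm(100))

# Same test as above, but with a linear model and a parametric p-value,
# and without the plot, progress bar, or printed result
res <- CCI.test(y ~ x1 | x2,
                data = dat,
                method = "lm",
                nperm = 25,
                parametric = TRUE,
                plot = FALSE,
                print_result = FALSE,
                progress = FALSE)
summary(res)
```

Because the result is returned invisibly, assigning it to a variable as done here is the natural way to inspect it afterwards.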