AQI: AQI Index

View source: R/AQI.R

AQIR Documentation

AQI Index

Description

A quantitative measure of dataset quality. The AQI Index score indicates the degree to which features are associated with the outcome in a dataset. (Synonyms of "feature" in this document: "independent variable," "factor," "predictor.")
For more information, please refer to the corresponding publication: Shi, J., Zhang, J. and Ge, Y. (2019), "An Association-Based Intrinsic Quality Index for Healthcare Dataset Ranking" <doi:10.1109/ICHI.2019.8904553>

Usage

AQI(data, alpha.ind = 0.2)

Arguments

data

data frame with variables as columns and observations as rows. The data must include at least one feature (a.k.a., independent variable, predictor, factor) and only one outcome variable (Y). The outcome variable MUST BE THE LAST COLUMN. Both the features and the outcome MUST be categorical or discrete. If variables are not naturally discrete, you may preprocess them using the 'autoBin.binary()' function in the same package.

alpha.ind

level of significance for the mutual information test of independence in step 2 (<doi:10.1109/ICHI.2019.8904553>). By default, 'alpha.ind = 0.2'.

Value

The AQI Index score.

Examples

## ---- Generate a toy dataset: "data" ----
n <- 10000
set.seed(1)
x1 <- rbinom(n, 3, 0.5) + 0.2
set.seed(2)
x2 <- rbinom(n, 2, 0.8) + 0.5
set.seed(3)
x3 <- rbinom(n, 5, 0.3)
set.seed(4)
error <- round(runif(n, min=-1, max=1))
y <- x1 + x3 + error
data <- data.frame(cbind(x1, x2, x3, y))
colnames(data) <- c("feature1", "feature2", "feature3", "Y")

## ---- Calculate the AQI score of "data" ----
AQI(data)

CASMI documentation built on April 3, 2025, 10:56 p.m.

Related to AQI in CASMI...