VKL: Calculate Variable-wise Kullback-Leibler divergence
In bAIo-lab/Questools: multivariate analysis and visualization


Calculates the variable-wise Kullback-Leibler divergence between two groups of samples.
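For reference, the standard definition of the Kullback-Leibler divergence for one discrete variable is given here. The symbols P and Q (the distributions of the variable in group 1 and group 2) are notation introduced for illustration; this page does not state which direction of the divergence or which density estimator the function uses.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```

The divergence is zero when the two distributions are identical and grows as they differ; it is not symmetric in P and Q.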

VKL(data, group1, group2, permute = 0, levels = NULL)

`data`	A numerical dataframe with no missing values.
`group1`	A vector of integers giving the row indices of the samples in group 1.
`group2`	A vector of integers giving the row indices of the samples in group 2.
`permute`	An integer indicating the number of permutations for the permutation test. If 0 (the default), no permutation test is carried out.
`levels`	An integer indicating the maximum number of levels of a categorical variable, used to distinguish categorical variables from the others. Defaults to NULL, which assumes that `data` has already been preprocessed with `data_preproc` and that the categorical variables are specified.

The function helps users identify the variables that diverge most between two groups defined by different states of one specific variable. For instance, in a dataset of health measurements, one may want to find the variables most strongly associated with the occurrence of cardiovascular disease. The function can also carry out a permutation test to calculate a p-value for each variable.
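The idea of a per-variable divergence with a permutation p-value can be sketched as follows. This is a minimal illustration for a single discrete variable, not the package's implementation: the helpers `kl_discrete` and `perm_pvalue`, the smoothing constant `eps`, and the toy data are all assumptions made for this example.

```r
# Hypothetical helper (not part of the package): KL divergence between two
# discrete samples, estimated from smoothed relative frequencies.
kl_discrete <- function(x, y, eps = 1e-6) {
  lv <- sort(unique(c(x, y)))
  # Count each level in both samples; eps avoids division by zero.
  p <- as.numeric(table(factor(x, levels = lv))) + eps
  q <- as.numeric(table(factor(y, levels = lv))) + eps
  p <- p / sum(p)
  q <- q / sum(q)
  sum(p * log(p / q))
}

# Hypothetical helper: permutation p-value for one variable `v`.
perm_pvalue <- function(v, group1, group2, permute = 100) {
  observed <- kl_discrete(v[group1], v[group2])
  pooled <- c(group1, group2)
  n1 <- length(group1)
  # Reassign the pooled rows to two groups at random and recompute KL.
  null_kl <- replicate(permute, {
    shuffled <- sample(pooled)
    kl_discrete(v[shuffled[seq_len(n1)]], v[shuffled[-seq_len(n1)]])
  })
  # Add-one correction keeps the estimate strictly positive.
  (sum(null_kl >= observed) + 1) / (permute + 1)
}

# Toy variable (assumed data): group 1 is mostly 1s, group 2 mostly 2s.
set.seed(1)
toy <- c(rep(1, 40), rep(2, 10), rep(1, 10), rep(2, 40))
p_toy <- perm_pvalue(toy, group1 = 1:50, group2 = 51:100, permute = 200)
```

A small p-value here indicates that a divergence as large as the observed one rarely arises when group labels are assigned at random. Applying such a helper to every column and sorting by divergence would give a table similar in spirit to the one described under Value.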

If permute = 0, returns a dataframe containing the sorted Kullback-Leibler (KL) divergences. If permute > 0, returns a dataframe containing the p-values and the sorted KL divergences.

Elyas Heidari

data("NHANES")
## Using preprocessed data
data <- data_preproc(NHANES, levels = 15)
data$SEQN <- NULL
# Construct two groups of samples
g1 <- which(data$PAD590 == 1)
g2 <- which(data$PAD590 == 6)
# Set permute to calculate p.values
kl <- VKL(data, group1 = g1, group2 = g2, permute = 100, levels = NULL)

## Using raw data
# g1 and g2 from above are reused; this assumes data_preproc keeps the row order of NHANES
kl <- VKL(NHANES, group1 = g1, group2 = g2, permute = 0, levels = 15)