calc_normalized_entropy: Calculate Normalized Entropy for Categorical Variables

View source: R/calc_normalized_entropy.R

calc_normalized_entropy    R Documentation

Description

Computes the normalized entropy (uncertainty measure) for categorical variables, providing a standardized measure of dispersion or randomness in the data.

Usage

calc_normalized_entropy(x)

Arguments

x

A character vector or factor containing categorical data.

Details

The function:

  • Handles both character vectors and factors as input

  • Treats NA values as a separate category

  • Normalizes entropy to the range [0, 1], where:

    • 0 indicates complete certainty (all observations fall in a single category)

    • 1 indicates maximum uncertainty (all categories are equally frequent)
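
For reference, normalized Shannon entropy is commonly written as shown here. The symbols are notation introduced for this note and are not taken from the package source: p_i is the proportion of observations in category i, and k is the number of observed categories.

```latex
H_{\text{norm}} = \frac{-\sum_{i=1}^{k} p_i \log p_i}{\log k},
\qquad 0 \le H_{\text{norm}} \le 1
```

The ratio does not depend on the logarithm base, because the base cancels between numerator and denominator. It reaches 1 exactly when every p_i equals 1/k. When k = 1 the denominator is zero, so the value for a single-category input has to be fixed by convention.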

The calculation process:

  1. Computes category proportions

  2. Calculates raw entropy using Shannon's formula

  3. Normalizes by dividing by maximum possible entropy
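
The three steps can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the qtkit source: the name normalized_entropy_sketch is hypothetical, and the handling of unused factor levels and of single-category input reflects choices made for this sketch only.

```r
# Hypothetical helper illustrating the three steps; not the package implementation.
normalized_entropy_sketch <- function(x) {
  # Step 1: category proportions. useNA = "ifany" counts NA as its own category.
  counts <- table(x, useNA = "ifany")
  counts <- counts[counts > 0]  # assumption: ignore unused factor levels
  p <- as.numeric(counts) / sum(counts)

  # Step 2: raw Shannon entropy (base 2, i.e. measured in bits).
  h <- -sum(p * log2(p))

  # Step 3: divide by the maximum entropy for k observed categories.
  k <- length(p)
  if (k < 2) {
    return(0)  # assumption: fewer than two categories is treated as zero entropy
  }
  h / log2(k)
}

# Two equally frequent categories give the maximum value.
normalized_entropy_sketch(c("A", "B"))

# NA is counted as a category, so this input has two categories as well.
normalized_entropy_sketch(c("A", NA))
```

Dropping zero counts before taking logarithms avoids the NaN that 0 * log2(0) would otherwise produce for factors with unused levels.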

Value

A numeric value between 0 and 1 representing the normalized entropy:

  • Values closer to 0 indicate less diversity/uncertainty

  • Values closer to 1 indicate more diversity/uncertainty

Examples

# Calculate entropy for a simple categorical vector
x <- c("A", "B", "B", "C", "C", "C", "D", "D", "D", "D")
calc_normalized_entropy(x)

# Handle missing values
y <- c("A", "B", NA, "C", "C", NA, "D", "D")
calc_normalized_entropy(y)

# Works with factors too
z <- factor(c("Low", "Med", "Med", "High", "High", "High"))
calc_normalized_entropy(z)


qtkit documentation built on April 4, 2025, 4:47 a.m.