performance_bin: Diagnose Performance Binned Variable

performance_bin R Documentation

Diagnose Performance Binned Variable

Description

performance_bin() calculates metrics to evaluate the performance of a binned variable for a binomial classification model.

Usage

performance_bin(y, x, na.rm = FALSE)

Arguments

y

character, numeric, integer, or factor. A binary response variable (0, 1). The variable must contain only the integers 0 and 1 as elements. However, a factor or character with two levels is also accepted; the type conversion is performed during the calculation.

x

integer, factor, or character. The binned variable. It must contain at least 2 different values, and Inf is not allowed.

na.rm

logical. Whether missing values should be removed.

Details

This function is useful in combination with the mutate/transmute functions of the dplyr package.

Value

An object of class "performance_bin". The columns of the data.frame are as follows.

  • Bin : character. bins.

  • CntRec : integer. frequency by bins.

  • CntPos : integer. frequency of positive by bins.

  • CntNeg : integer. frequency of negative by bins.

  • CntCumPos : integer. cumulative frequency of positive by bins.

  • CntCumNeg : integer. cumulative frequency of negative by bins.

  • RatePos : numeric. relative frequency of positive by bins.

  • RateNeg : numeric. relative frequency of negative by bins.

  • RateCumPos : numeric. cumulative relative frequency of positive by bins.

  • RateCumNeg : numeric. cumulative relative frequency of negative by bins.

  • Odds : numeric. odds ratio.

  • LnOdds : numeric. natural log of the odds ratio.

  • WoE : numeric. weight of evidence.

  • IV : numeric. Jeffreys' Information Value.

  • JSD : numeric. Jensen-Shannon Divergence.

  • AUC : numeric. AUC. area under curve.
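The count, rate, odds, WoE and IV columns above can be illustrated with a minimal base R sketch. The counts are hypothetical, and the formulas are common conventions assumed for illustration (RatePos and RateNeg as each bin's share of all positives and negatives, WoE as ln(RatePos / RateNeg), per-bin IV as (RatePos - RateNeg) * WoE). They are not taken from the dlookr source, so the sign and scaling used by performance_bin() may differ.

```r
# Hypothetical per-bin counts (not derived from the heartfailure data)
bins    <- c("(0,1]", "(1,2]", "(2,10]")
cnt_pos <- c(20L, 30L, 10L)
cnt_neg <- c(80L, 40L, 20L)

tab <- data.frame(
  Bin    = bins,
  CntRec = cnt_pos + cnt_neg,
  CntPos = cnt_pos,
  CntNeg = cnt_neg
)

# Cumulative counts in bin order
tab$CntCumPos <- cumsum(tab$CntPos)
tab$CntCumNeg <- cumsum(tab$CntNeg)

# Assumption: rates are each bin's share of all positives / all negatives
tab$RatePos    <- tab$CntPos / sum(tab$CntPos)
tab$RateNeg    <- tab$CntNeg / sum(tab$CntNeg)
tab$RateCumPos <- cumsum(tab$RatePos)
tab$RateCumNeg <- cumsum(tab$RateNeg)

# Odds of positive vs. negative within each bin, and their natural log
tab$Odds   <- tab$CntPos / tab$CntNeg
tab$LnOdds <- log(tab$Odds)

# Assumed convention: WoE = ln(RatePos / RateNeg)
tab$WoE <- log(tab$RatePos / tab$RateNeg)

# Per-bin contribution to the information value; the total is their sum
tab$IV   <- (tab$RatePos - tab$RateNeg) * tab$WoE
iv_total <- sum(tab$IV)

print(tab)
print(iv_total)
```

Under these assumptions each per-bin IV term is non-negative, because (RatePos - RateNeg) and ln(RatePos / RateNeg) always share the same sign.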

Attributes of the "performance_bin" class are as follows.

  • names : character. variable names of the data.frame holding the "Binning Table".

  • class : character. name of class. "performance_bin" "data.frame".

  • row.names : character. row names of the data.frame holding the "Binning Table".

  • IV : numeric. Jeffreys' Information Value.

  • JSD : numeric. Jensen-Shannon Divergence.

  • KS : numeric. Kolmogorov-Smirnov statistic.

  • gini : numeric. Gini index.

  • HHI : numeric. Herfindahl-Hirschman Index.

  • HHI_norm : numeric. normalized Herfindahl-Hirschman Index.

  • Cramer_V : numeric. Cramer's V statistic.

  • chisq_test : data.frame. table of significance tests. The column names are as follows.

    • Bin A : character. first bin.

    • Bin B : character. second bin.

    • statistics : numeric. test statistic of the Chi-square test.

    • p_value : numeric. p-value of Chi-square test.
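Several of the summary attributes above can be sketched in base R from the same kind of per-bin counts. This is an illustration under textbook definitions, not the dlookr implementation: KS is taken as the largest absolute gap between the cumulative positive and negative rates, HHI as the sum of squared bin shares, HHI_norm as (HHI - 1/n) / (1 - 1/n) for n bins, and Cramer's V as sqrt(chi-square / (N * (min(rows, cols) - 1))). The pairwise table assumes that adjacent bins are compared, which the text above does not state. The counts are hypothetical.

```r
# Hypothetical per-bin counts (not derived from the heartfailure data)
bins    <- c("(0,1]", "(1,2]", "(2,10]")
cnt_pos <- c(20, 30, 10)
cnt_neg <- c(80, 40, 20)
cnt_rec <- cnt_pos + cnt_neg
n_bin   <- length(bins)

# KS: largest gap between the cumulative positive and negative rates
cum_pos <- cumsum(cnt_pos) / sum(cnt_pos)
cum_neg <- cumsum(cnt_neg) / sum(cnt_neg)
ks <- max(abs(cum_pos - cum_neg))

# HHI: concentration of records across bins, and its normalized form
share    <- cnt_rec / sum(cnt_rec)
hhi      <- sum(share^2)
hhi_norm <- (hhi - 1 / n_bin) / (1 - 1 / n_bin)

# Cramer's V from the chi-square statistic of the 2 x n_bin contingency table
m   <- rbind(pos = cnt_pos, neg = cnt_neg)
chi <- chisq.test(m, correct = FALSE)
cramer_v <- sqrt(unname(chi$statistic) / (sum(m) * (min(dim(m)) - 1)))

# Assumption: significance tests compare each bin with the next one
chisq_tab <- do.call(rbind, lapply(seq_len(n_bin - 1), function(i) {
  res <- chisq.test(m[, c(i, i + 1)], correct = FALSE)
  data.frame(
    "Bin A"    = bins[i],
    "Bin B"    = bins[i + 1],
    statistics = unname(res$statistic),
    p_value    = res$p.value,
    check.names = FALSE
  )
}))

print(c(KS = ks, HHI = hhi, HHI_norm = hhi_norm, Cramer_V = cramer_v))
print(chisq_tab)
```

On an actual result, the corresponding values would be read from the object's attributes, for example with attr(perf, "KS").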

See Also

summary.performance_bin, plot.performance_bin, binning_by.

Examples


# Load dlookr, which provides performance_bin() and the heartfailure data
library(dlookr)

# Generate data for the example
heartfailure2 <- heartfailure

set.seed(123)
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# Change the target variable to 0(negative) and 1(positive).
heartfailure2$death_event_2 <- ifelse(heartfailure2$death_event %in% "Yes", 1, 0)

# Binning from creatinine to creatinine_bin.
breaks <- c(0,  1,  2, 10)
heartfailure2$creatinine_bin <- cut(heartfailure2$creatinine, breaks)

# Diagnose performance of the binned variable (NA is kept as its own bin)
perf <- performance_bin(heartfailure2$death_event_2, heartfailure2$creatinine_bin)
perf
summary(perf)

plot(perf)

# Diagnose performance of the binned variable without NA
perf <- performance_bin(heartfailure2$death_event_2, heartfailure2$creatinine_bin, na.rm = TRUE)
perf
summary(perf)

plot(perf)



dlookr documentation built on May 29, 2024, 2 a.m.