gini_impurities: Gini Impurities


Description

Identify group-weighted Gini impurities using pairs of columns within a dataset. Can be used to locate hierarchical data or 1-1 correspondences.

Usage

gini_impurities(dt, wide = FALSE, verbose = FALSE)

Arguments

dt

A data.table with at least two columns

wide

Should the results be in wide format?

verbose

Should progress be printed to the screen?

Details

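For each ordered pair of columns (Var1, Var2) in a dataset, calculates the weighted Gini impurity of Var2 within the groups determined by Var1. An impurity of 0 means Var2 takes a single value within every Var1 group, so Var1 determines Var2. If the impurity is 0 in both directions, the two columns are in 1-1 correspondence; if it is 0 in only one direction, the columns are hierarchical. In the example output, Cat1 and Cat3 are in 1-1 correspondence, and Cat2 is nested within both Cat1 and Cat3.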
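The computation described in Details can be sketched in base R for a plain data.frame. This is not the mltools implementation: it assumes each group's impurity is weighted by the share of rows that fall in that group, and the helpers `gini`, `weighted_gini` and `all_pairs`, as well as the toy columns `Color`, `Shape` and `Code`, are hypothetical.

```r
# Hypothetical sketch, not the mltools source.

# Gini impurity of a vector: 1 minus the sum of squared value proportions
gini <- function(x) {
  p <- table(x) / length(x)
  1 - sum(p^2)
}

# Impurity of var2 within the groups defined by var1, averaged across groups.
# Assumption: each group is weighted by its share of the rows.
weighted_gini <- function(df, var1, var2) {
  groups <- split(df[[var2]], df[[var1]], drop = TRUE)
  weights <- vapply(groups, length, integer(1)) / nrow(df)
  sum(weights * vapply(groups, gini, numeric(1)))
}

# Every ordered pair of columns, in the long layout of the default output
all_pairs <- function(df) {
  cols <- names(df)
  pairs <- expand.grid(Var2 = cols, Var1 = cols, stringsAsFactors = FALSE)
  pairs <- pairs[, c("Var1", "Var2")]
  pairs$GiniImpurity <- mapply(function(a, b) weighted_gini(df, a, b),
                               pairs$Var1, pairs$Var2, USE.NAMES = FALSE)
  pairs
}

# Toy data (hypothetical): Color and Code are in 1-1 correspondence
df <- data.frame(
  Color = c("red", "red", "blue", "blue"),
  Shape = c("circle", "square", "circle", "circle"),
  Code  = c("R", "R", "B", "B"),
  stringsAsFactors = FALSE
)
all_pairs(df)
```

Here `weighted_gini(df, "Color", "Shape")` is 0.25: the red group is split evenly between two shapes (impurity 0.5), the blue group is pure, and each group holds half the rows. Color and Code give 0 in both directions.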

Examples

Example output (default, wide = FALSE)

         Var1      Var2 GiniImpurity
 1: SkinColor SkinColor       0.0000
 2: SkinColor      Cat1       0.2500
 3: SkinColor      Cat2       0.4375
 4: SkinColor      Cat3       0.2500
 5: SkinColor   IsAlien       0.1875
 6:      Cat1 SkinColor       0.3750
 7:      Cat1      Cat1       0.0000
 8:      Cat1      Cat2       0.3750
 9:      Cat1      Cat3       0.0000
10:      Cat1   IsAlien       0.5000
11:      Cat2 SkinColor       0.2500
12:      Cat2      Cat1       0.0000
13:      Cat2      Cat2       0.0000
14:      Cat2      Cat3       0.0000
15:      Cat2   IsAlien       0.3750
16:      Cat3 SkinColor       0.3750
17:      Cat3      Cat1       0.0000
18:      Cat3      Cat2       0.3750
19:      Cat3      Cat3       0.0000
20:      Cat3   IsAlien       0.5000
21:   IsAlien SkinColor       0.5000
22:   IsAlien      Cat1       0.6250
23:   IsAlien      Cat2       0.7500
24:   IsAlien      Cat3       0.6250
25:   IsAlien   IsAlien       0.0000

With wide = TRUE (one row per Var1, one column per Var2):

        Var1  Cat1   Cat2  Cat3 IsAlien SkinColor
1:      Cat1 0.000 0.3750 0.000  0.5000     0.375
2:      Cat2 0.000 0.0000 0.000  0.3750     0.250
3:      Cat3 0.000 0.3750 0.000  0.5000     0.375
4:   IsAlien 0.625 0.7500 0.625  0.0000     0.500
5: SkinColor 0.250 0.4375 0.250  0.1875     0.000
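As a sketch of how these results can locate 1-1 correspondences and hierarchies, the R snippet below copies the wide-format values shown above into a matrix and flags off-diagonal zeros. The helper `pair_names` and the `1e-12` zero tolerance are illustrative choices, not part of mltools.

```r
# Impurities copied from the wide-format example output (rows = Var1, cols = Var2)
vars <- c("Cat1", "Cat2", "Cat3", "IsAlien", "SkinColor")
gi <- matrix(c(
  0.000, 0.3750, 0.000, 0.5000, 0.375,
  0.000, 0.0000, 0.000, 0.3750, 0.250,
  0.000, 0.3750, 0.000, 0.5000, 0.375,
  0.625, 0.7500, 0.625, 0.0000, 0.500,
  0.250, 0.4375, 0.250, 0.1875, 0.000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Var1 determines Var2 when Var2 is pure within every Var1 group (impurity 0);
# the tolerance guards against floating-point noise in computed values
determines <- abs(gi) < 1e-12 & row(gi) != col(gi)

# Both directions zero: 1-1 correspondence (keep each unordered pair once)
one_to_one <- which(determines & t(determines) & row(gi) < col(gi), arr.ind = TRUE)
# Only one direction zero: Var1 is nested inside Var2 (Var1 is the finer level)
hierarchy <- which(determines & !t(determines), arr.ind = TRUE)

# Hypothetical helper: turn (row, col) indices into column-name pairs
pair_names <- function(idx) {
  data.frame(Var1 = vars[idx[, 1]], Var2 = vars[idx[, 2]],
             stringsAsFactors = FALSE)
}
pair_names(one_to_one)
pair_names(hierarchy)
```

With these values the snippet reports Cat1 and Cat3 as a 1-1 pair, and Cat2 as nested within both Cat1 and Cat3.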

mltools documentation built on May 2, 2019, 5:22 a.m.