View source: R/entropy_trivar.R
entropy_trivar (R Documentation)
Computes trivariate entropies of all triples of (discrete) variables in a multivariate data set.
entropy_trivar(dat)
dat: data frame with rows as observations and columns as variables. All variables must be categorical with finite range spaces, either as observed or after transformation.
Trivariate entropies can be used to check for functional relationships and
stochastic independence between triples of variables.
The trivariate entropy H(X,Y,Z) of three discrete random variables X, Y and Z
is bounded according to
H(X,Y) <= H(X,Y,Z) <= H(X,Z) + H(Y,Z) - H(Z).
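The increment between the trivariate entropy and its lower bound equals the expected conditional entropy H(Z | X,Y) = H(X,Y,Z) - H(X,Y). The lower bound is therefore attained exactly when Z is a function of (X,Y), and the upper bound is attained exactly when X and Y are conditionally independent given Z.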
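The bounds and the conditional-entropy identity can be checked numerically. The following is a minimal sketch in base R, not the package's implementation: it assumes base-2 logarithms, no missing values, and entropies estimated from observed relative frequencies. The helpers `joint_entropy` and `cond_entropy` and the toy data are hypothetical.

```r
# Sketch: empirical joint entropy (base-2 logarithms assumed) of the columns
# of a data frame of categorical variables without missing values.
joint_entropy <- function(df) {
  counts <- as.vector(table(df))          # one count per value combination
  p <- counts[counts > 0] / sum(counts)   # empty cells contribute nothing
  -sum(p * log2(p))
}

# Expected conditional entropy H(target | given): the entropy of 'target'
# within each 'given' group, averaged with the group frequencies as weights.
cond_entropy <- function(dat, target, given) {
  groups <- split(dat[[target]], dat[given], drop = TRUE)
  weights <- vapply(groups, length, integer(1)) / nrow(dat)
  ents <- vapply(groups, function(g) joint_entropy(data.frame(g)), numeric(1))
  sum(weights * ents)
}

# Hypothetical toy data: z is a noisy function of (x, y), w an exact one.
set.seed(1)
n <- 500
x <- sample(0:2, n, replace = TRUE)
y <- sample(0:1, n, replace = TRUE)
z <- (x + y + rbinom(n, 1, 0.2)) %% 2
w <- (x + y) %% 2
dat <- data.frame(x = x, y = y, z = z, w = w)
H <- function(...) joint_entropy(dat[c(...)])

lower <- H("x", "y")
H_xyz <- H("x", "y", "z")
upper <- H("x", "z") + H("y", "z") - H("z")

# Bounds: H(X,Y) <= H(X,Y,Z) <= H(X,Z) + H(Y,Z) - H(Z)
print(c(lower = lower, H_xyz = H_xyz, upper = upper))
# The increment over the lower bound equals H(Z | X, Y)
print(all.equal(H_xyz - lower, cond_entropy(dat, "z", c("x", "y"))))
# When the third variable is a function of (x, y), the lower bound is attained
print(all.equal(H("x", "y", "w"), lower))
```

Because the identity H(X,Y,Z) = H(X,Y) + H(Z | X,Y) holds for any distribution, it also holds exactly (up to floating-point rounding) for the empirical frequencies used here.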
Data frame whose first three columns give all possible triples of variables (V1, V2, V3) and whose fourth column gives the trivariate entropies H(V1, V2, V3).
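One way to produce a table in this layout is sketched below in base R. This is a hypothetical helper for illustration, not the code behind entropy_trivar; the names `joint_entropy` and `trivar_table`, the column name `H` and the use of base-2 logarithms are assumptions.

```r
# Hypothetical helper (not the netropy implementation): empirical joint
# entropy in bits, assuming no missing values.
joint_entropy <- function(df) {
  counts <- as.vector(table(df))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Tabulate H(V1, V2, V3) for every triple of columns of 'dat'.
trivar_table <- function(dat) {
  triples <- combn(names(dat), 3)          # one column per triple
  data.frame(
    V1 = triples[1, ],
    V2 = triples[2, ],
    V3 = triples[3, ],
    H  = apply(triples, 2, function(v) joint_entropy(dat[v])),
    stringsAsFactors = FALSE
  )
}

# Hypothetical categorical data with four variables -> choose(4, 3) = 4 triples
set.seed(2)
dat <- data.frame(
  a = sample(0:1, 100, replace = TRUE),
  b = sample(0:2, 100, replace = TRUE),
  c = sample(0:1, 100, replace = TRUE),
  d = sample(0:3, 100, replace = TRUE)
)
tab <- trivar_table(dat)
print(tab)
```

`combn()` enumerates the choose(p, 3) triples of the p columns, so the number of rows grows cubically with the number of variables.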
Author: Termeh Shafie
Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.
See also: entropy_bivar, prediction_power
library(netropy)
# use the internal data set
data(lawdata)
df.att <- lawdata[[4]]
# three steps of data editing:
# 1. categorize variables 'years' and 'age' into
#    approximately three equally sized groups (cut points based on the cdf)
# 2. make sure all outcomes start from the value 0 (optional)
# 3. remove variable 'senior', as it consists only of unique values (thus redundant)
df.att.ed <- data.frame(
  status    = df.att$status,
  gender    = df.att$gender,
  office    = df.att$office - 1,
  years     = ifelse(df.att$years <= 3, 0,
                     ifelse(df.att$years <= 13, 1, 2)),
  age       = ifelse(df.att$age <= 35, 0,
                     ifelse(df.att$age <= 45, 1, 2)),
  practice  = df.att$practice,
  lawschool = df.att$lawschool - 1
)
# calculate trivariate entropies
H.triv <- entropy_trivar(df.att.ed)