prediction_power: Prediction Power

View source: R/prediction_power.R

Description

Computes prediction power when pairs of variables in a given dataframe are used to predict a third variable from the same dataframe. The prediction strength is measured by expected conditional entropies.

Usage

prediction_power(var, dat)

Arguments

var

character string naming the variable in the dataframe dat that is to be predicted by pairs of the other variables in dat.

dat

dataframe with rows as observations and columns as variables. All variables must be categorical with finite range spaces, either as observed or after transformation.

Details

The expected conditional entropy given by

EH(Z|X,Y) = H(X,Y,Z) - H(X, Y)

measures the uncertainty that remains when the pair of variables X and Y is used to predict the variable Z. The lower the value of EH for a given pair X and Y, the better that pair predicts Z.
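
As an illustration of the formula above, the following base R sketch computes EH(Z|X,Y) from empirical joint frequencies. The helpers joint_entropy and cond_entropy are hypothetical names introduced here and are not part of netropy; the use of base-2 logarithms (entropy in bits) is also an assumption.

```r
# Shannon entropy (in bits) of the empirical joint distribution of the
# columns of a data frame. Hypothetical helper, not a netropy function.
joint_entropy <- function(df) {
  counts <- table(df)                     # joint frequency table
  p <- counts[counts > 0] / sum(counts)   # drop empty cells, normalise
  -sum(p * log2(p))
}

# Expected conditional entropy EH(Z|X,Y) = H(X,Y,Z) - H(X,Y).
# Hypothetical helper; z, x and y are column names in dat.
cond_entropy <- function(dat, z, x, y) {
  joint_entropy(dat[, c(x, y, z), drop = FALSE]) -
    joint_entropy(dat[, c(x, y), drop = FALSE])
}

# Toy data in which z is fully determined by the pair (x, y): z = x XOR y
toy <- expand.grid(x = 0:1, y = 0:1)
toy$z <- as.integer(xor(toy$x, toy$y))

# The pair (x, y) leaves no uncertainty about z
eh_xy <- cond_entropy(toy, "z", "x", "y")

# x alone leaves one full bit of uncertainty: EH(Z|X) = H(X,Z) - H(X)
eh_x <- joint_entropy(toy[, c("x", "z")]) -
  joint_entropy(toy[, "x", drop = FALSE])
```

In this toy case eh_xy is 0 and eh_x is 1, which shows how a pair of variables can predict a third far better than either variable alone.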

Value

Upper triangular matrix of expected conditional entropies, with rows and columns indexed by the variables in dat. Each off-diagonal entry gives EH(Z|X,Y) for the pair of variables given by its row and column. The diagonal gives EH(Z|X) = H(X,Z) - H(X), that is, the case when only one variable is used to predict var. The entire row and column of the variable being predicted contain NA.
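
One way to read such a matrix is to look for the entry with the lowest value. The sketch below uses a small made-up matrix (the variable names a, b, z and all numbers are invented for illustration and are not output from prediction_power) and masks the lower triangle before searching, so that it does not matter how those cells are filled.

```r
# Made-up values for illustration only; not output from prediction_power()
vars <- c("a", "b", "z")          # "z" plays the role of the predicted variable
m <- matrix(NA_real_, 3, 3, dimnames = list(vars, vars))
m["a", "a"] <- 0.90               # EH(Z|a)
m["b", "b"] <- 0.70               # EH(Z|b)
m["a", "b"] <- 0.40               # EH(Z|a,b)

# Ignore anything below the diagonal, then locate the smallest entry
m[lower.tri(m)] <- NA
idx <- which(m == min(m, na.rm = TRUE), arr.ind = TRUE)

# Row and column names of the best-predicting pair
best <- c(rownames(m)[idx[1, "row"]], colnames(m)[idx[1, "col"]])
best
```

If the smallest entry lies on the diagonal, both names in best are the same, meaning a single variable predicts as well as any pair.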

Author(s)

Termeh Shafie

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.

Nowicki, K., Shafie, T., & Frank, O. (Forthcoming 2022). Statistical Entropy Analysis of Network Data.

See Also

entropy_trivar, entropy_bivar

Examples

# use internal data set
data(lawdata)
df.att <- lawdata[[4]]

# three steps of data editing:
# 1. categorize variables 'years' and 'age' into three groups of
# approximately equal size (cut points based on the cdf)
# 2. make sure all outcomes start from the value 0 (optional)
# 3. remove variable 'senior' as it consists of only unique values (thus redundant)
df.att.ed <- data.frame(
   status   = df.att$status,
   gender   = df.att$gender,
   office   = df.att$office-1,
   years    = ifelse(df.att$years<=3,0,
              ifelse(df.att$years<=13,1,2)),
   age      = ifelse(df.att$age<=35,0,
                ifelse(df.att$age<=45,1,2)),
   practice = df.att$practice,
   lawschool= df.att$lawschool-1)

# power of predicting 'status' using pairs of other variables
prediction_power('status', df.att.ed)

netropy documentation built on Feb. 2, 2022, 9:07 a.m.