redundancy: Redundant Variables & Dimensionality Reduction



Description

Finds redundant variables in a dataframe consisting of discrete variables.

Usage

redundancy(dat, dec = 3)

Arguments

dat

Data frame with rows as observations and columns as variables. All variables must be categorical, either as observed or after transformation, with finite range spaces.

dec

Precision, given as the number of decimals to which bivariate entropies are rounded when checking for redundant variables (the more decimals, the harder it is to detect redundancy). Default is 3.

Details

Two variables are redundant when together they hold the same information (their bivariate entropy) as at least one of the variables alone (its univariate entropy). Consider removing one of the two variables from the data frame before further analysis.
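The definition above can be illustrated with a minimal base R sketch. The helpers `entropy_sketch()` and `redundancy_sketch()` and the `toy` data are hypothetical names made up for this illustration; they are not the package's implementation, and details such as the logarithm base and the handling of the diagonal are assumptions.

```r
# Hypothetical helper: Shannon entropy (base 2, an assumption) of one
# categorical vector, or of the joint distribution of two vectors.
entropy_sketch <- function(x, y = NULL) {
  counts <- if (is.null(y)) table(x) else table(x, y)
  p <- as.vector(counts) / sum(counts)
  p <- p[p > 0]                      # empty cells contribute nothing
  -sum(p * log2(p))
}

# Hypothetical helper: flag pairs whose rounded bivariate entropy equals
# the rounded univariate entropy of at least one of the two variables.
redundancy_sketch <- function(dat, dec = 3) {
  k <- ncol(dat)
  out <- matrix(0L, k, k, dimnames = list(names(dat), names(dat)))
  h1 <- sapply(dat, function(v) round(entropy_sketch(v), dec))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) next               # diagonal left at 0 (assumption)
      h2 <- round(entropy_sketch(dat[[i]], dat[[j]]), dec)
      if (h2 == h1[[i]] || h2 == h1[[j]]) out[i, j] <- 1L
    }
  }
  out
}

# Toy data: 'b' is a function of 'a', so the pair (a, b) carries no more
# information than 'a' alone; 'c' is unrelated to both.
toy <- data.frame(a = c(0, 0, 1, 1, 2, 2),
                  b = c(0, 0, 1, 1, 1, 1),
                  c = c(0, 1, 0, 1, 0, 1))
res <- redundancy_sketch(toy)
res
```

In this toy case only the cells for the pair `a` and `b` are set to 1, which matches the intuition that `b` could be dropped without losing information.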

Value

Binary matrix indicating which row and column variables hold the same information.
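As a rough illustration of how such a matrix might be read, the sketch below lists the flagged variable pairs. The matrix `m` and the variable names `x1`, `x2`, `x3` are made up for the example, and it assumes the returned matrix is symmetric with variable names as row and column names.

```r
# Made-up 0/1 matrix shaped like the described return value.
vars <- c("x1", "x2", "x3")
m <- matrix(0L, 3, 3, dimnames = list(vars, vars))
m["x1", "x3"] <- 1L
m["x3", "x1"] <- 1L

# Look at the upper triangle only so each unordered pair appears once.
idx <- which(m == 1L & upper.tri(m), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(m)[idx[, "row"]],
                    var2 = colnames(m)[idx[, "col"]])
pairs
```

Each row of `pairs` names two variables flagged as holding the same information, one of which is a candidate for removal.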

Author(s)

Termeh Shafie

References

Frank, O., & Shafie, T. (2016). Multivariate entropy analysis of network data. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 129(1), 45-63.

Nowicki, K., Shafie, T., & Frank, O. (Forthcoming 2022). Statistical Entropy Analysis of Network Data.

See Also

entropy_bivar

Examples

# use internal data set
data(lawdata)
df.att <- lawdata[[4]]

# two steps of data editing:
# 1. categorize variables 'years' and 'age' into three groups of
# approximately equal size (cut points based on the cdf)
# 2. make sure all outcomes start from the value 0 (optional)
df.att.ed <- data.frame(
   senior   = df.att$senior,
   status   = df.att$status,
   gender   = df.att$gender,
   office   = df.att$office-1,
   years    = ifelse(df.att$years<=3,0,
              ifelse(df.att$years<=13,1,2)),
   age      = ifelse(df.att$age<=35,0,
                ifelse(df.att$age<=45,1,2)),
   practice = df.att$practice,
   lawschool= df.att$lawschool-1)

# find redundant variables in dataframe
redundancy(df.att.ed) # variable 'senior' should be omitted

netropy documentation built on Feb. 2, 2022, 9:07 a.m.