mappings: Examine mappings between factor variables in a data-frame
In utilities: Data Utility Functions

mappings

R Documentation

Description

`mappings` determines the mappings between factor variables in a data-frame.

Usage

mappings(data, na.rm = TRUE, all.vars = FALSE, plot = TRUE)

Arguments

`data`	A data-frame (or an object coercible to a data-frame)
`na.rm`	Logical value; if `TRUE` the function removes `NA` values from consideration
`all.vars`	Logical value; if `FALSE` the function only examines factor variables in the data-frame; if `TRUE` the function examines all variables in the data-frame (caution is required in interpretation of output)
`plot`	Logical value; if `TRUE` the function plots the DAG for the mappings (requires `ggplot2` and `ggdag` to work)
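
Because `plot = TRUE` relies on two optional packages, it can be convenient to check for them before calling the function. The following is a minimal sketch using base R's `requireNamespace`; the variable name `has_plot_pkgs` is illustrative and not part of the package.

```r
# Check whether the optional plotting packages are installed.
# requireNamespace() returns TRUE or FALSE without attaching the package.
has_plot_pkgs <- requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("ggdag", quietly = TRUE)

# Assuming the 'utilities' package is loaded and DATA is a data-frame,
# the result could then be passed to the plot argument:
# mappings(DATA, plot = has_plot_pkgs)
```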

Details

In preliminary data analysis prior to statistical modelling, it is often useful to investigate whether there are mappings between factor variables in a data-frame, in order to see whether any of these factor variables are redundant (i.e., fully determined by other factor variables). This function takes an input data-frame `data` and examines whether there are any mappings between the factor variables. (Note that the function will interpret all character variables as factors but will not interpret numeric or logical variables as factors.) The output is a list showing the uniqueness of the binary relations between the factor variables (a logical matrix showing left-uniqueness in the binary relations), the mappings between factor variables, the redundant and non-redundant factor variables, and the directed acyclic graph (DAG) of these mappings (the last element requires the user to have the `ggdag` package installed; it is omitted if the package is not installed). If `plot = TRUE` the function also produces a plot of the DAG (if the `ggdag` and `ggplot2` packages are installed).
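
To make the idea of a mapping concrete, the sketch below checks by hand whether one factor fully determines another. The helper `is_mapping` is hypothetical and written only for illustration; it is not the package's implementation, and it assumes there are no missing values.

```r
# Hypothetical helper (not part of the 'utilities' package): returns TRUE if
# every distinct value of x is paired with exactly one distinct value of y.
# Assumes x and y have the same length and contain no NA values.
is_mapping <- function(x, y) {
  pairs <- unique(data.frame(x = x, y = y))  # distinct (x, y) combinations
  !any(duplicated(pairs$x))                  # a repeated x has two different y values
}

f1 <- c('A','B','C','D','A','B','D','A','A','B')
f2 <- c('A','B','B','B','A','B','B','A','A','B')

is_mapping(f1, f2)   # TRUE: f1 determines f2, so f2 is redundant given f1
is_mapping(f2, f1)   # FALSE: 'B' in f2 is paired with 'B', 'C' and 'D' in f1
```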

Note that the function also allows the user to examine mappings between all variables in the data-frame (i.e., not just the factor variables) by setting all.vars = TRUE. The output from this analysis should be interpreted with caution; one-to-one mappings between non-factor variables are common (e.g., when two variables are continuous it is almost certain that they will be in a one-to-one mapping), and so the existence of a mapping may not be indicative of variable redundancy.
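
The caution about continuous variables can be seen in a small simulation. In the sketch below, two independently generated numeric vectors have no repeated values, so each value of one is trivially paired with a single value of the other. The helper `is_mapping` is hypothetical and not part of the package.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)   # generated independently of x

# Hypothetical helper: TRUE if each distinct value of a pairs with one value of b
is_mapping <- function(a, b) {
  pairs <- unique(data.frame(a = a, b = b))
  !any(duplicated(pairs$a))
}

# With no ties in either vector, a mapping exists in both directions
# even though the variables are unrelated.
no_ties    <- anyDuplicated(x) == 0 && anyDuplicated(y) == 0
one_to_one <- is_mapping(x, y) && is_mapping(y, x)
```

Here the one-to-one relation says nothing about redundancy; it only reflects the absence of repeated values.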

Note on operation: If na.rm = FALSE then the function analyses the mappings between the factors/variables without removing NA values. In this case an NA value is treated as a missing value that could be any outcome. Consequently, for purposes of determining whether there is a mapping between the variables, an NA value is treated as if it were every possible value. The mapping is falsified if there are at least two identical values in the domain (which may include one or more NA values) that map to different values in the codomain (which may include one or more NA values).
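
The effect of treating `NA` as every possible value can be sketched for the simplest case, where missing values occur only in the domain. The helper `is_mapping_na_domain` below is hypothetical, reflects one reading of the rule above rather than the package's code, and assumes the codomain has no missing values.

```r
# Hypothetical helper (an illustration, not the package's code): checks whether
# x maps to y when an NA in x is treated as matching every value of x.
# Assumes y has no missing values.
is_mapping_na_domain <- function(x, y) {
  stopifnot(!anyNA(y))
  n <- length(x)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Two domain values match if they are equal or if either is NA
      same_x <- is.na(x[i]) || is.na(x[j]) || x[i] == x[j]
      if (same_x && y[i] != y[j]) return(FALSE)  # same input, different outputs
    }
  }
  TRUE
}

x <- c('a', 'a', 'b', NA)
y <- c('P', 'P', 'Q', 'Q')

is_mapping_na_domain(x[1:3], y[1:3])  # TRUE: a -> P, b -> Q
is_mapping_na_domain(x, y)            # FALSE: the NA could be 'a', giving a -> Q
```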

Value

A list object of class 'mappings' giving information on the mappings between the variables

Examples


DATA <- data.frame(
  VAR1 = c(0,1,2,2,0,1,2,0,0,1),
  VAR2 = c('A','B','B','B','A','B','B','A','A','B'),
  VAR3 = 1:10,
  VAR4 = c('A','B','C','D','A','B','D','A','A','B'),
  VAR5 = c(1:5,1:5)
)

# Apply mappings
mappings(DATA, all.vars = TRUE, plot = FALSE)

utilities documentation built on July 1, 2022, 9:06 a.m.