mappings | R Documentation |
mappings
Determines the mappings between factor variables in a data-frame
mappings(data, na.rm = TRUE, all.vars = FALSE, plot = TRUE)
data |
A data-frame (or an object coercible to a data-frame) |
na.rm |
Logical value; if |
all.vars |
Logical value; if |
plot |
Logical value; if |
In preliminary data analysis prior to statistical modelling, it is often useful to investigate whether there are mappings between factor
variables in a data-frame in order to see if any of these factor variables are redundant (i.e., fully determined by other factor variables).
This function takes an input data-frame data and examines whether there are any mappings between the factor variables. (Note that
the function will interpret all character variables as factors but will not interpret numeric or logical variables as factors.) The output
is a list showing the uniqueness of the binary relations between the factor variables (a logical matrix showing left-uniqueness in the binary
relations), the mappings between factor variables, the redundant and non-redundant factor variables, and the directed acyclic graph (DAG) of
these mappings (the last element requires the user to have the ggdag package installed; it is omitted if the package is not installed).
If plot = TRUE, the function also returns a plot of the DAG (provided the ggdag and ggplot2 packages are installed).
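
As a rough illustration of what a mapping between two variables means here, the following base R sketch checks whether one vector fully determines another. The helper is_mapping is a hypothetical name introduced for this illustration; it is not part of the package, does not show how mappings() is implemented, and assumes there are no NA values.

```r
# Hypothetical helper (not part of the package): returns TRUE when each
# distinct value of x is paired with exactly one value of y, i.e. when
# x fully determines y. Assumes neither vector contains NA values.
is_mapping <- function(x, y) {
  # Keep one row per distinct (x, y) combination
  pairs <- unique(data.frame(x = x, y = y, stringsAsFactors = FALSE))
  # If some x value still appears twice, it maps to two different y values
  !any(duplicated(pairs$x))
}

# Two of the variables used in the example data on this page
VAR2 <- c('A','B','B','B','A','B','B','A','A','B')
VAR4 <- c('A','B','C','D','A','B','D','A','A','B')

is_mapping(VAR4, VAR2)  # does VAR4 determine VAR2?
is_mapping(VAR2, VAR4)  # does VAR2 determine VAR4?
```

In this sketch VAR4 determines VAR2 but not the other way round, which is the sense in which VAR2 would be redundant once VAR4 is known.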
Note that the function also allows the user to examine mappings between all variables in the data-frame (i.e., not just the factor variables)
by setting all.vars = TRUE. The output from this analysis should be interpreted with caution; one-to-one mappings between non-factor variables
are common (e.g., when two variables are continuous it is almost certain that they will be in a one-to-one mapping), and so the existence of a
mapping may not be indicative of variable redundancy.
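
The caution about continuous variables can be seen in a small base R sketch. The vectors x and y below are simulated purely for illustration; they are independent by construction, yet they still satisfy the conditions for a one-to-one mapping simply because no value repeats.

```r
# Two independent continuous variables (simulated for illustration only)
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

# anyDuplicated() returns 0 when a vector has no repeated values
anyDuplicated(x) == 0
anyDuplicated(y) == 0

# With no repeats in either vector, every x value is paired with exactly
# one y value and vice versa, so a one-to-one mapping "exists" even though
# the variables are unrelated.
```

A mapping found this way says nothing about redundancy, which is why results obtained with all.vars = TRUE need more careful reading than results for factor variables.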
Note on operation: If na.rm = FALSE then the function analyses the mappings between the factors/variables without removing NA values. In
this case an NA value is treated as a missing value that could be any outcome. Consequently, for purposes of determining whether there
is a mapping between the variables, an NA value is treated as if it were every possible value. The mapping is falsified if there are at
least two identical values in the domain (which may include one or more NA values) that map to different values in the codomain (which
may include one or more NA values).
A list object of class 'mappings' giving information on the mappings between the variables
DATA <- data.frame(
  VAR1 = c(0,1,2,2,0,1,2,0,0,1),
  VAR2 = c('A','B','B','B','A','B','B','A','A','B'),
  VAR3 = 1:10,
  VAR4 = c('A','B','C','D','A','B','D','A','A','B'),
  VAR5 = c(1:5,1:5)
)

# Apply mappings
mappings(DATA, all.vars = TRUE, plot = FALSE)