cmp_all: Compare counts and labels of a list of dataframes

View source: R/cmp_all.R

cmp_all    R Documentation

Compare counts and labels of a list of dataframes

Description

Compare counts and labels of a list of dataframes

Usage

cmp_all(
  l,
  id = "id",
  include_ids = FALSE,
  spec_diffs = c("nv", "cv", "vallab", "varlab"),
  include_diffs = TRUE,
  col_groups = c("spec", "index")
)

Arguments

l

List of dataframes.

id

Name of the key variable in the dataframes.

include_ids

Logical denoting whether a list column ids should be included in the results. Each element of ids lists the values of id at which the variable var takes the respective value.
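The idea behind such a list column can be sketched in base R. This is only an illustration on a hypothetical toy dataframe, not the package's implementation; the names df and ids_by_value are made up for the example.

```r
# Hypothetical toy dataframe with a key column `id` (not package data).
df <- data.frame(
  id      = c("1", "2", "3", "4"),
  Species = c(1, 1, 2, 3)
)

# For each distinct value of Species, collect the ids at which it occurs.
# The result is a named list, one element per value.
ids_by_value <- split(df$id, df$Species)

ids_by_value[["1"]]
```

Each list element answers the question "at which ids does the variable take this value?", which is the information the ids column is described as carrying.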

spec_diffs

Character vector (defaults to c("nv", "cv", "vallab", "varlab")) specifying the attributes that should be compared in the resulting dataframe. If include_diffs = TRUE, this adds one logical column per element of spec_diffs, named by that element followed by the suffix "_diff" (for example nv_diff). Additionally, a column any_diff is added denoting whether any of these columns is TRUE.
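How any_diff relates to the "_diff" columns can be sketched in base R. The dataframe cmp below is a hand-made stand-in, not real cmp_all() output, and the row-wise OR is an assumption derived from the description above rather than the package's actual code.

```r
# Hypothetical stand-in for a comparison result (not real cmp_all() output).
cmp <- data.frame(
  var         = c("Species", "Species", "Species"),
  nv_diff     = c(TRUE, FALSE, FALSE),
  vallab_diff = c(FALSE, TRUE, FALSE)
)

# Collect every column whose name ends in "_diff".
diff_cols <- grep("_diff$", names(cmp), value = TRUE)

# A row is flagged if at least one of its "_diff" columns is TRUE
# (element-wise OR across the selected columns).
cmp$any_diff <- Reduce(`|`, cmp[diff_cols])

cmp$any_diff
```

Restricting spec_diffs to a single element, as in spec_diffs = "nv", would under this reading make any_diff identical to nv_diff.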

include_diffs

Logical denoting whether the resulting dataframe should include logical columns suffixed with "_diff", indicating whether the dataframes in l differ regarding the specification types given in spec_diffs.

col_groups

String specifying the order of columns in the resulting dataframe. "index" groups columns by indices, i.e. by dataframe. "spec" groups columns by the specification types (for example "nv", "cv", "vallab", "varlab").
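The two grouping schemes can be sketched in base R by generating column names only. The exact names and order that cmp_all() produces are an assumption here; the sketch just shows what "grouped by spec" and "grouped by index" mean for two dataframes.

```r
specs   <- c("nv", "cv", "vallab", "varlab")  # specification types from Usage
indices <- 1:2                                 # assume two dataframes in l

# Matrix of names: rows are specification types, columns are indices.
name_grid <- outer(specs, indices, paste0)

# "spec": all indices of one specification type are adjacent.
by_spec <- as.vector(t(name_grid))

# "index": all specification types of one dataframe are adjacent.
by_index <- as.vector(name_grid)

by_spec
by_index
```

With "spec", columns for the same attribute sit side by side, which is convenient for spotting differences by eye; with "index", each dataframe's columns form one contiguous block.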

Value

Dataframe in long format consisting of the columns var, nv1, nv2, ..., cv1, cv2, ..., vallab1, vallab2, ..., varlab1, varlab2, ..., and n, containing a comparison of the counts of variable values (and their respective value labels) of the dataframes. The indices denote the position of the dataframe in l. If include_diffs = TRUE, logical columns suffixed with "_diff" (for example nv_diff and vallab_diff) indicate whether the respective values or labels differ between the dataframes.

Examples

library(tablab)
library(magrittr)  # provides the %>% pipe used below

# load spss data
path <- system.file("examples", "iris.sav", package = "haven")
df1 <- haven::read_sav(path) %>%
  # add id column
  tibble::rownames_to_column("id")

# create a modified copy:
df2 <- df1
df2[1, "Species"] <- 2
# modify the value label of "setosa"
df2$Species <- haven::labelled(df2$Species,
                               labels = c(setosa_mod = 1, versicolor = 2, virginica = 3),
                               label = "Species")

# compare the dataframes' counts:
cmp_all(list(df1, df2))
# compare the dataframes and only show the counts where values have changed:
cmp_all(list(df1, df2)) %>% dplyr::filter(nv1 != nv2)
# This results in the same rows:
cmp_all(list(df1, df2), spec_diffs = "nv") %>% dplyr::filter(any_diff)
# Or alternatively:
cmp_all(list(df1, df2)) %>% dplyr::filter(nv_diff)
# compare the dataframes and only show the counts where value labels have changed:
cmp_all(list(df1, df2)) %>% dplyr::filter(vallab1 != vallab2)

# Create another modified copy
df3 <- df2
df3[2, "Species"] <- 3
# modify the value label of "versicolor"
df3$Species <- haven::labelled(df3$Species,
                               labels = c(setosa = 1, versicolor_mod = 2, virginica = 3),
                               label = "Species_mod")

# compare the dataframes' counts:
l <- list(df1, df2, df3)
cmp <- cmp_all(l)
cmp

# compare the dataframes and only show the counts where values have changed:
cmp %>% dplyr::filter(nv_diff)

# Show where either values or value labels differ:
cmp %>% dplyr::filter(nv_diff | vallab_diff)

urswilke/tablab documentation built on Oct. 17, 2022, 8:19 p.m.