knitr::opts_chunk$set(
  collapse = TRUE,
  # cache = TRUE,
  message = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(knitr)

tablab

Lifecycle: experimental CRAN status

The goal of tablab is to tabulate and compare variables' values and their labels of labelled dataframes.

Installation

Install the package via:

devtools::install_github("urswilke/tablab")

Introduction

The package tablab mainly consists of 2 families of functions:

The following attributes are tabulated / compared:

Examples

Simple example

Assume you have a dataframe of labelled variables:

library(tablab)
library(dplyr)
library(purrr)
set.seed(1)

x <- haven::labelled(
  sample(1:10), 
  label = "variable label", 
  labels = c("1lab" =1 , "2lab" = 2, "3lab" = 3)
)
y <- haven::labelled(
  LETTERS[sample(2:6, size = 10, replace = TRUE)], 
  label = "char_var_lab", 
  labels = c("Alab" ="A" , "Blab" = "B", "Clab" = "C")
)

df <- data.frame(id = 1:10, x, y)
df

tablab consists of functions to tabulate summaries of these labelled dataframes. See for instance the function:

tab_all(df)

The result shows for every variable var in the data, its numeric/character values nv/cv, their counts n, as well as the variable and value labels varlab and vallab.

More examples

path <- system.file("examples", "iris.sav", package = "haven")
df <- haven::read_sav(path)
df <- tibble(id = 1:10,
       sex = haven::labelled(c(2, 1, 2, 1, 1, 2, 2, 1, 2, 1),
                             label = "sex",
                             labels = c(MALES = 1, FEMALES = 2)),
       age = c(24, 23, 23, 41, 23, 39, 30, 18, 31, 48),
       marital = haven::labelled(c(1, 7, 2, 6, 4, 5, 3, 8, 4, 2),
                                 label = "marital status",
                                 labels = c("single" = 1,
                                            "steady relationship" = 2,
                                            "living with partner" = 3,
                                            "married first time" = 4,
                                            "remarried" = 5,
                                            "separated" = 6,
                                            "divorced" = 7,
                                            "widowed" = 8))
       )
df

Create modified copy:

df2 <- 
  df %>% 
  # Remove rows 9 & 10:
  slice(-c(9:10)) %>% 
  # Create Variable new_var equal to 1:
  mutate(new_var = 1)
# Change variable label:
attr(df2$new_var, "label")  <-  "new something"

Tabulate the variable labels for the dataframes in the list:

list(df, df2) %>% map(tab_varlabs)

Change value:

df2[1:2,"sex"] <- 3

Tabulate the count comparison:

list(df, df2) %>% cmp_cts()

More label and value modifications:

attr(df2$sex, "labels")  <-  c(male = 1, female = 2, `Third Gender` = 3)
# Remove variable label:
attr(df2$sex, "label")  <-  NULL
# Change variable "marital" to 9 in rows 3 & 4:
df2[3:4,"marital"] <- 9
# Add another value label for the code 9:
attr(df2$marital, "labels") <- c(attr(df2$marital, "labels"), c("other" = 9))

df2

Tabulate the variable labels for the dataframes in the list:

list(df, df2) %>% map(tab_vallabs)

Tabulate the comparison of count and label data and only show rows with differences:

list(df, df2) %>% 
  cmp_all() %>% 
  filter(any_diff)


urswilke/tablab documentation built on Oct. 17, 2022, 8:19 p.m.