knitr::opts_chunk$set( collapse = TRUE, # cache = TRUE, message = FALSE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) library(knitr)
The goal of tablab is to tabulate and compare variables' values and their labels of labelled dataframes.
Install the package via:
devtools::install_github("urswilke/tablab")
The package tablab mainly consists of 2 families of functions:
tab_*
tabulating count and label information of dataframes (denoted df
), andcmp_*
comparing this data for all dataframes in a list l
.The following attributes are tabulated / compared:
var
The name of the variables in the dataframe(s).type
The type of the variables in the dataframe(s).varlab
The variable label of the variable in the dataframe. val
The value of the variable in the dataframe. vallab
The value label of the variable in the dataframe. Assume you have a dataframe of labelled variables:
library(tablab) library(dplyr) library(purrr) set.seed(1) x <- haven::labelled( sample(1:10), label = "variable label", labels = c("1lab" =1 , "2lab" = 2, "3lab" = 3) ) y <- haven::labelled( LETTERS[sample(2:6, size = 10, replace = TRUE)], label = "char_var_lab", labels = c("Alab" ="A" , "Blab" = "B", "Clab" = "C") ) df <- data.frame(id = 1:10, x, y) df
tablab consists of functions to tabulate summaries of these labelled dataframes. See for instance the function:
tab_all(df)
The result shows for every variable var
in the data, its numeric/character values nv
/cv
, their counts n
, as well as the variable and value labels varlab
and vallab
.
path <- system.file("examples", "iris.sav", package = "haven") df <- haven::read_sav(path) df <- tibble(id = 1:10, sex = haven::labelled(c(2, 1, 2, 1, 1, 2, 2, 1, 2, 1), label = "sex", labels = c(MALES = 1, FEMALES = 2)), age = c(24, 23, 23, 41, 23, 39, 30, 18, 31, 48), marital = haven::labelled(c(1, 7, 2, 6, 4, 5, 3, 8, 4, 2), label = "marital status", labels = c("single" = 1, "steady relationship" = 2, "living with partner" = 3, "married first time" = 4, "remarried" = 5, "separated" = 6, "divorced" = 7, "widowed" = 8)) ) df
Create modified copy:
df2 <- df %>% # Remove rows 9 & 10: slice(-c(9:10)) %>% # Create Variable new_var equal to 1: mutate(new_var = 1) # Change variable label: attr(df2$new_var, "label") <- "new something"
Tabulate the variable labels for the dataframes in the list:
list(df, df2) %>% map(tab_varlabs)
Change value:
df2[1:2,"sex"] <- 3
Tabulate the count comparison:
list(df, df2) %>% cmp_cts()
More label and value modifications:
attr(df2$sex, "labels") <- c(male = 1, female = 2, `Third Gender` = 3) # Remove variable label: attr(df2$sex, "label") <- NULL # Change variable "marital" to 9 in rows 3 & 4: df2[3:4,"marital"] <- 9 # Add another value label for the code 9: attr(df2$marital, "labels") <- c(attr(df2$marital, "labels"), c("other" = 9)) df2
Tabulate the variable labels for the dataframes in the list:
list(df, df2) %>% map(tab_vallabs)
Tabulate the comparison of count and label data and only show rows with differences:
list(df, df2) %>% cmp_all() %>% filter(any_diff)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.