Datasets for categorical data analysis

knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.height = 6,
  fig.width = 7,
  fig.path = "fig/datasets-",
  dev = "png",
  comment = "##"
)

# save some typing
knitr::set_alias(w = "fig.width",
                 h = "fig.height",
                 cap = "fig.cap")

The vcdExtra package contains `r nrow(vcdExtra::datasets("vcdExtra"))` datasets, taken from the literature on categorical data analysis and selected to illustrate various methods of analysis and data display. These are in addition to the `r nrow(vcdExtra::datasets("vcd"))` datasets in the vcd package.

To make it easier to find those which illustrate a particular method, the datasets in vcdExtra have been classified using method tags. This vignette creates an "inverse table", listing the datasets that apply to each method. It also illustrates a general method for classifying datasets in R packages.

library(dplyr)
library(tidyr)
library(readxl)

Processing tags

Using the result of vcdExtra::datasets(package="vcdExtra"), I created a spreadsheet, vcdExtra-datasets.xlsx, and then added a tags column holding the method tags for each dataset, separated by semicolons.

dsets_tagged <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), 
                           sheet="vcdExtra-datasets")

dsets_tagged <- dsets_tagged |>
  dplyr::select(-Title, -dim) |>
  dplyr::rename(dataset = Item)

head(dsets_tagged)
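
The starting point for such a spreadsheet can be sketched for any installed package. The following is a minimal illustration, not the procedure actually used for vcdExtra: it assumes base R's datasets package as a stand-in, writes CSV rather than xlsx, and the file name datasets-to-tag.csv is hypothetical.

```r
# List the datasets in an installed package; base R's "datasets" package
# stands in for vcdExtra here (an assumption made for this sketch)
listing <- as.data.frame(utils::data(package = "datasets")$results,
                         stringsAsFactors = FALSE)[, c("Item", "Title")]

# Add an empty column, to be filled in by hand with ";"-separated method tags
listing$tags <- ""

# Write to a temporary file (hypothetical name) for editing in a spreadsheet
out_file <- file.path(tempdir(), "datasets-to-tag.csv")
utils::write.csv(listing, out_file, row.names = FALSE)

head(listing)
```

Once the tags have been filled in, the file can be read back and processed in the same way as the spreadsheet above.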

To invert the table, we need to split the tags column into separate observations (one row per dataset-tag pair), then collapse the rows that share the same tag.

dset_split <- dsets_tagged |>
  tidyr::separate_longer_delim(tags, delim = ";") |>
  dplyr::mutate(tag = stringr::str_trim(tags)) |>
  dplyr::select(-tags)
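
To see what the split step does, here is a small sketch on made-up data. The dataset names and tags in toy are hypothetical placeholders, not entries from the actual spreadsheet.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two datasets, tags separated by ";"
toy <- data.frame(
  dataset = c("DatasetA", "DatasetB"),
  tags    = c("loglinear; mosaic", "mosaic")
)

toy_split <- toy |>
  # one row per (dataset, tag) pair
  separate_longer_delim(tags, delim = ";") |>
  # remove the blank left behind after each ";"
  mutate(tag = stringr::str_trim(tags)) |>
  select(-tags)

toy_split
```

DatasetA carries two tags, so it now occupies two rows, and the table is in long form with one tag per row.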

# collapse the rows for the same tag
tag_dset <- dset_split |>
  dplyr::arrange(tag) |>
  dplyr::group_by(tag) |>
  dplyr::summarise(datasets = paste(dataset, collapse = "; ")) |>
  dplyr::ungroup()

# get a list of the unique tags
unique(tag_dset$tag)
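
The collapse step can be illustrated the same way. This sketch assumes a hypothetical long table, long_toy, with one row per dataset-tag pair; the names in it are placeholders.

```r
library(dplyr)

# Hypothetical long table: one row per (dataset, tag) pair
long_toy <- data.frame(
  dataset = c("DatasetA", "DatasetA", "DatasetB"),
  tag     = c("loglinear", "mosaic", "mosaic")
)

inverted_toy <- long_toy |>
  arrange(tag, dataset) |>
  group_by(tag) |>
  # paste all dataset names sharing a tag into a single string
  summarise(datasets = paste(dataset, collapse = "; "))

inverted_toy
```

Each tag now appears once, with every dataset that carries it listed in a single "; "-separated string.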

Make this into a nice table

Another sheet in the spreadsheet gives a more descriptive topic corresponding to each tag.

tags <- read_excel(here::here("inst", "extdata", "vcdExtra-datasets.xlsx"), 
                   sheet="tags")
head(tags)

Now, join this with the tag_dset created above.

tag_dset <- tag_dset |>
  dplyr::left_join(tags, by = "tag") |>
  dplyr::relocate(topic, .after = tag)

tag_dset |>
  dplyr::select(-tag) |>
  head()
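
Because left_join() keeps every row of the left table, a tag that is misspelled or missing from the lookup sheet ends up with an NA topic. A possible sanity check, sketched on hypothetical toy tables (none of these tags or topics are taken from the real spreadsheet), uses dplyr::anti_join() to list such tags:

```r
library(dplyr)

# Hypothetical inverted table, including one tag absent from the lookup
tag_dset_toy <- data.frame(
  tag      = c("glm", "mosaic", "typo-tag"),
  datasets = c("DatasetA; DatasetB", "DatasetB", "DatasetC")
)

# Hypothetical lookup table of tag descriptions
tags_toy <- data.frame(
  tag   = c("glm", "mosaic"),
  topic = c("generalized linear models", "mosaic displays")
)

joined_toy <- left_join(tag_dset_toy, tags_toy, by = "tag")

# Rows of the left table with no match in the lookup table
unmatched_toy <- anti_join(tag_dset_toy, tags_toy, by = "tag")
unmatched_toy
```

An empty result from anti_join() would indicate that every tag has a descriptive topic.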

Add links to help()

We're almost there. It would be nice if the dataset names could be linked to their documentation. The function below is designed to work with the pkgdown site. There are different ways this can be done, but what seems to work is a link to ../reference/{dataset}.html. Unfortunately, this won't work in the actual vignette, because that relative path exists only on the pkgdown site.

add_links <- function(dsets, 
                      style = c("reference", "help", "rdrr.io"),
                      sep = "; ") {

  style <- match.arg(style)
  # dsets must be a single string; split it into the individual dataset names
  names <- stringr::str_split_1(dsets, sep)

  # wrap each name in a markdown link whose target depends on `style`
  names <- dplyr::case_when(
    style == "help"      ~ glue::glue("[{names}](help({names}))"),
    style == "reference" ~ glue::glue("[{names}](../reference/{names}.html)"),
    style == "rdrr.io"   ~ glue::glue("[{names}](https://rdrr.io/cran/vcdExtra/man/{names}.html)")
  )
  # re-join the linked names into one string
  glue::glue_collapse(names, sep = sep)
}

Make the table

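Use purrr::map() to apply add_links() to the string of datasets for each tag. (mutate(datasets = add_links(datasets)) by itself doesn't work, because add_links() relies on stringr::str_split_1(), which accepts only a single string, so it has to be called once per row.)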

tag_dset |>
  dplyr::select(-tag) |>
  dplyr::mutate(datasets = purrr::map(datasets, add_links)) |>
  knitr::kable()
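
An alternative to mapping over the rows is a helper that is itself vectorised. The following is a sketch in base R only. add_links_vec is a hypothetical name, not a function in vcdExtra, and it handles only the "reference" link style.

```r
# Hypothetical vectorised helper (not part of vcdExtra): takes a character
# vector of "; "-separated dataset names and returns one linked string per element
add_links_vec <- function(dsets, sep = "; ") {
  vapply(
    strsplit(dsets, sep, fixed = TRUE),
    function(nms) {
      # build a pkgdown-style markdown link for each name, then re-join
      paste(sprintf("[%s](../reference/%s.html)", nms, nms), collapse = sep)
    },
    character(1)
  )
}

# Dataset names used here only as example input
add_links_vec(c("Abortion; Accident", "Bartlett"))
```

Because it returns a character vector of the same length as its input, a helper like this could be called directly inside mutate() without purrr::map().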

Voila!




vcdExtra documentation built on Aug. 22, 2023, 9:11 a.m.