Introduction to keyholder

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
options(tibble.print_min = 3, tibble.print_max = 3)

keyholder is a package for storing information (keys) about the rows of data frame like objects. Common use cases are tracking rows of data without modifying the data itself, and backing up and restoring information about rows. This is done by creating a class keyed_df, which has a special attribute "keys". Keys are updated whenever rows of the reference data frame change.

keyholder is designed to work closely with the dplyr package. All of dplyr's one- and two-table verbs update keys properly.

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(keyholder, quietly = TRUE, warn.conflicts = FALSE)
mtcars_tbl <- mtcars %>% as_tibble()

Set keys

By convention, keys are always converted to a tibble. This makes it possible to use multiple variables as keys by binding them together as columns.
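As a small illustration of that conversion, the sketch below assigns a plain two-column data.frame as keys and checks what keys() returns. The object key_source and its columns id and name are made up for this example, and the class check assumes that tibbles carry the usual tbl_df class.

```r
library(dplyr, warn.conflicts = FALSE)
library(keyholder)

# Hypothetical two-column keys built from row numbers and row names
key_source <- data.frame(
  id = seq_len(nrow(mtcars)),
  name = rownames(mtcars),
  stringsAsFactors = FALSE
)

keyed <- mtcars %>% as_tibble() %>% assign_keys(key_source)

# Keys should come back as a tibble even though a data.frame was assigned
inherits(keys(keyed), "tbl_df")
names(keys(keyed))
```

Both columns travel together as one set of keys, so several variables can describe each row at once.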

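There are two general ways of creating keys: by assigning a data frame like object with keys<- or assign_keys(), and by using existing columns with key_by() and its scoped variants: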

mtcars_tbl_keyed <- mtcars_tbl
keys(mtcars_tbl_keyed) <- tibble(id = 1:nrow(mtcars_tbl_keyed))

mtcars_tbl %>% assign_keys(tibble(id = 1:nrow(.)))

# Key by existing columns
mtcars_tbl %>% key_by(vs, am)

mtcars_tbl %>% key_by(starts_with("c"))

mtcars_tbl %>% key_by(starts_with("c"), .exclude = TRUE)

# Scoped variants
mtcars_tbl %>% key_by_all()

# One can also rename variables before keying by supplying .funs
mtcars_tbl %>% key_by_if(rlang::is_integerish, .funs = toupper)

mtcars_tbl %>% key_by_at(c("vs", "am"))

To track rows, use use_id(), which creates a special key .id with row numbers as values.
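A minimal sketch of row tracking with use_id() follows. The filter on cyl is an arbitrary choice for illustration; the point is that the .id key of the result should still hold the positions those rows had in the original data.

```r
library(dplyr, warn.conflicts = FALSE)
library(keyholder)

# Attach row numbers as the key .id before any row-changing operation
mtcars_id <- mtcars %>% as_tibble() %>% use_id()

# Keep only four-cylinder cars; keys are filtered along with the rows
four_cyl <- mtcars_id %>% filter(cyl == 4)

# Original row positions of the surviving rows
original_rows <- four_cyl %>% pull_key(.id)
original_rows
```

The reference data frame itself gains no extra column, which is what makes this useful for tracking rows without modifying the data.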

To properly unkey an object, use unkey().

mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am)

# Good
mtcars_tbl_keyed %>% unkey()

# Bad: drops the attribute but leaves the keyed_df class in place
attr(mtcars_tbl_keyed, "keys") <- NULL
mtcars_tbl_keyed

Get keys

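There are three ways of extracting keys: keys() always returns a tibble (with zero columns if there are no keys), raw_keys() returns the "keys" attribute as is, and pull_key() extracts a single key as a vector: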

mtcars_tbl %>% keys()

mtcars_tbl %>% key_by(vs, am) %>% keys()

# raw_keys() returns the "keys" attribute unchanged
mtcars_tbl %>% raw_keys()

mtcars_tbl %>% key_by(vs, am) %>% raw_keys()

# pull_key() extracts one key as a vector
mtcars_tbl %>% key_by(vs, am) %>% pull_key(vs)
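
The difference between the extractors is easiest to see on an object without keys. The sketch below is a quick check under the assumption that raw_keys() is a thin wrapper around the "keys" attribute and therefore gives NULL when no keys are set; the object names plain and keyed are chosen only for this example.

```r
library(dplyr, warn.conflicts = FALSE)
library(keyholder)

plain <- as_tibble(mtcars)
keyed <- plain %>% key_by(vs, am)

# No keys set: raw_keys() is expected to be NULL,
# while keys() still returns a tibble, just with zero columns
is.null(raw_keys(plain))
ncol(keys(plain))

# Keys set: pull_key() gives a single key as a plain vector
vs_key <- keyed %>% pull_key(vs)
all.equal(vs_key, mtcars$vs)
```

In code that must work for both keyed and unkeyed input, keys() is the safer choice because its return type does not change.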

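Manipulate keys

Keys can be removed with remove_keys(), restored into the reference data frame with restore_keys(), and renamed with rename_keys(). Each of these has scoped variants.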

mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(vs)

mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(everything(), .unkey = TRUE)

# Scoped variants
# Identical to previous one
mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_all(.unkey = TRUE)

mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_if(rlang::is_integerish)

# Restore keys: back up vs and mpg as keys, then overwrite the columns
mtcars_tbl_keyed <- mtcars_tbl %>%
  key_by(vs, mpg) %>%
  mutate(vs = 1, mpg = 0)
mtcars_tbl_keyed

mtcars_tbl_keyed %>% restore_keys(vs)

mtcars_tbl_keyed %>% restore_keys(vs, .remove = TRUE)

mtcars_tbl_keyed %>% restore_keys(vs, mpg, .unkey = TRUE)

mtcars_tbl_keyed %>% restore_keys(vs, mpg, .remove = TRUE, .unkey = TRUE)

# Scoped variants
mtcars_tbl_keyed %>% restore_keys_all()

mtcars_tbl_keyed %>% restore_keys_if(rlang::is_integerish, .remove = TRUE)
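
The backup and restore round trip can be checked directly. The sketch below repeats the setup above and compares the restored columns with the original mtcars values; the names modified and restored are used only in this example.

```r
library(dplyr, warn.conflicts = FALSE)
library(keyholder)

# Back up vs and mpg as keys, then overwrite both columns
modified <- mtcars %>%
  as_tibble() %>%
  key_by(vs, mpg) %>%
  mutate(vs = 1, mpg = 0)

# Restore both columns from keys and unkey the result
restored <- modified %>% restore_keys(vs, mpg, .unkey = TRUE)

# The overwritten values are gone and the originals are back
all.equal(restored$vs, mtcars$vs)
all.equal(restored$mpg, mtcars$mpg)
inherits(restored, "keyed_df")
```

Using .unkey = TRUE here returns an object without the keyed_df class, which is convenient once the keys have served their purpose.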

One important feature of restore_keys() is that restoring keys takes precedence over the rule of not modifying grouping variables. This follows from the purpose of keys: they hold information about rows, and restoring them means you want that information to be available. Groups are recomputed after restoring.

mtcars_tbl_keyed %>% group_by(vs, mpg)

mtcars_tbl_keyed %>% group_by(vs, mpg) %>% restore_keys(vs, mpg)

# Rename keys
mtcars_tbl %>% key_by(vs, am) %>% rename_keys(Vs = vs)

# Scoped variants
mtcars_tbl %>% key_by(vs, am) %>% rename_keys_all(.funs = toupper)

React to subset

A method for the subsetting function [ is implemented for keyed_df so that keys react to changes in rows: if rows of the reference data frame are rearranged or removed, the same operation is applied to the keys.

mtcars_tbl_subset <- mtcars_tbl %>% key_by(vs, am) %>%
  `[`(c(3, 18, 19), c(2, 8, 9))

mtcars_tbl_subset

keys(mtcars_tbl_subset)
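
To make the mechanism concrete, here is a base R sketch of how a [ method can keep a row-aligned attribute in sync. It is not keyholder's implementation: the class tracked_df and the constructor new_tracked_df() are hypothetical, and the method is deliberately simplified to the x[i, j] form.

```r
# Hypothetical class "tracked_df": a data frame carrying a row-aligned
# "keys" attribute. This is NOT keyholder's code, only an illustration.
new_tracked_df <- function(df, keys) {
  stopifnot(is.data.frame(df), is.data.frame(keys), nrow(df) == nrow(keys))
  attr(df, "keys") <- keys
  class(df) <- c("tracked_df", setdiff(class(df), "tracked_df"))
  df
}

# Simplified `[` method: handles only the x[i, j] form (either index may
# be left empty). Unlike data frames, x[i] is treated as row selection.
`[.tracked_df` <- function(x, i, j, ...) {
  keys <- attr(x, "keys")
  plain <- x
  class(plain) <- setdiff(class(plain), "tracked_df")
  if (missing(i)) i <- seq_len(nrow(plain))
  if (missing(j)) j <- seq_len(ncol(plain))
  res <- plain[i, j, drop = FALSE]
  # Apply the same row operation to the keys
  new_tracked_df(res, keys[i, , drop = FALSE])
}

tracked <- new_tracked_df(mtcars, data.frame(id = seq_len(nrow(mtcars))))
tracked_subset <- tracked[c(3, 18, 19), c(2, 8, 9)]
attr(tracked_subset, "keys")$id  # expected: 3 18 19
```

Column selection does not touch the keys at all; only the row index is replayed on them, which mirrors the behaviour described above.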

Verbs from dplyr

All one- and two-table verbs from dplyr (along with their scoped variants, where present) support keyed_df. Most functions react to changes in rows the same way [ does, but some (summarise(), distinct() and do()) unkey the object.

mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am)

mtcars_tbl_keyed %>% select(gear, mpg)

mtcars_tbl_keyed %>% summarise(meanMPG = mean(mpg))

mtcars_tbl_keyed %>% filter(vs == 1) %>% keys()

mtcars_tbl_keyed %>% arrange_at("mpg") %>% keys()

band_members %>% key_by(name) %>%
  semi_join(band_instruments, by = "name") %>%
  keys()
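
The two kinds of behaviour can be told apart by checking the class of the result. The sketch below assumes that unkeying removes the keyed_df class, as unkey() is described to do; the object names are chosen only for this example.

```r
library(dplyr, warn.conflicts = FALSE)
library(keyholder)

keyed <- mtcars %>% as_tibble() %>% key_by(vs, am)

# Row-preserving verb: the result is still keyed and keys follow the rows
filtered <- keyed %>% filter(vs == 1)
inherits(filtered, "keyed_df")
nrow(keys(filtered)) == nrow(filtered)

# Row-collapsing verb: the result is expected to be unkeyed
summarised <- keyed %>% summarise(meanMPG = mean(mpg))
inherits(summarised, "keyed_df")
```

If keys are still needed after a collapsing verb, one option is to restore them into columns with restore_keys() before calling it.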


keyholder documentation built on March 31, 2023, 5:37 p.m.