knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-" ) options(tibble.print_min = 3, tibble.print_max = 3)
keyholder
is a package for storing information (keys) about rows of data
frame like objects. The common use cases are to track rows of data without
modifying it and to backup and restore information about rows. This is done with
creating a class keyed_df which has special attribute "keys". Keys are
updated according to changes in rows of reference data frame.
keyholder
is designed to work tightly with
dplyr package. All its one- and two-table verbs update keys properly.
library(dplyr, quietly = TRUE, warn.conflicts = FALSE) library(keyholder, quietly = TRUE, warn.conflicts = FALSE)
mtcars_tbl <- mtcars %>% as_tibble()
The general agreement is that keys are always converted to tibble. In this way one can use multiple variables as keys by binding them.
There are two general ways of creating keys:
as_tibble()
. To make sense it should have the same number of rows as reference
data frame. There are two functions for assigning: keys<-
and assign_keys()
which are basically the same. The former use more suitable for interactive use
and the latter - for piping with
magrittr's pipe operator %>%
.mtcars_tbl_keyed <- mtcars_tbl keys(mtcars_tbl_keyed) <- tibble(id = 1:nrow(mtcars_tbl_keyed)) mtcars_tbl %>% assign_keys(tibble(id = 1:nrow(.)))
key_by()
and its scoped variants (*_all()
, *_if()
and *_at()
).
This is similar in its design to dplyr
's group_by()
: it takes some columns
from reference data frame and makes keys from them. It has two important
options: .add
(whether to add specified columns to existing keys) and
.exclude
(whether to exclude specified columns from reference data frame).
Grouping is ignored.mtcars_tbl %>% key_by(vs, am) mtcars_tbl %>% key_by(starts_with("c")) mtcars_tbl %>% key_by(starts_with("c"), .exclude = TRUE) # Scoped variants mtcars_tbl %>% key_by_all() # One can also rename variables before keying by supplying .funs mtcars_tbl %>% key_by_if(rlang::is_integerish, .funs = toupper) mtcars_tbl %>% key_by_at(c("vs", "am"))
To track rows use use_id()
which creates a special key .id
with row numbers
as values.
To properly unkey object use unkey()
.
mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am) # Good mtcars_tbl_keyed %>% unkey() # Bad attr(mtcars_tbl_keyed, "keys") <- NULL mtcars_tbl_keyed
There are three ways of extracting keys:
keys()
. This function always returns a tibble. In case of no keys it
returns a tibble with number of rows as in reference data frame and zero
columns.mtcars_tbl %>% keys() mtcars_tbl %>% key_by(vs, am) %>% keys()
raw_keys()
which is just a wrapper for attr(.tbl, "keys")
.mtcars_tbl %>% raw_keys() mtcars_tbl %>% key_by(vs, am) %>% raw_keys()
pull_key()
which works like dplyr
's pull
applied to keys:mtcars_tbl %>% key_by(vs, am) %>% pull_key(vs)
remove_keys()
and its scoped variants. If all keys
are removed one can automatically unkey object by setting option .unkey
to
TRUE
.mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(vs) mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys(everything(), .unkey = TRUE) # Scoped variants # Identical to previous one mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_all(.unkey = TRUE) mtcars_tbl %>% key_by(vs, mpg) %>% remove_keys_if(rlang::is_integerish)
restore_keys()
and its scoped variants. Restoring
means creating or modifying a column in reference data frame with values taken
from keys. After restoring certain key one can remove it from keys by setting
.remove
to TRUE
. There is also an option .unkey
identical to one in
remove_keys()
(which is meaningful only in case .remove
is TRUE
).mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, mpg) %>% mutate(vs = 1, mpg = 0) mtcars_tbl_keyed mtcars_tbl_keyed %>% restore_keys(vs) mtcars_tbl_keyed %>% restore_keys(vs, .remove = TRUE) mtcars_tbl_keyed %>% restore_keys(vs, mpg, .unkey = TRUE) mtcars_tbl_keyed %>% restore_keys(vs, mpg, .remove = TRUE, .unkey = TRUE) # Scoped variants mtcars_tbl_keyed %>% restore_keys_all() mtcars_tbl_keyed %>% restore_keys_if(rlang::is_integerish, .remove = TRUE)
One important feature of restore_keys()
is that restoring keys beats
'not-modifying' grouping variables rule. It is made according to the ideology of
keys: they contain information about rows and by restoring you want it to be
available. Groups are recomputed after restoring.
mtcars_tbl_keyed %>% group_by(vs, mpg) mtcars_tbl_keyed %>% group_by(vs, mpg) %>% restore_keys(vs, mpg)
rename_keys()
and its scoped variants. Renaming is done
with dplyr
's rename()
or its scoped variant and so renaming format comes
from them.mtcars_tbl %>% key_by(vs, am) %>% rename_keys(Vs = vs) # Scoped variants mtcars_tbl %>% key_by(vs, am) %>% rename_keys_all(.funs = toupper)
A method for subsetting function [
is implemented for keyed_df
to react on
changes in rows: if rows in reference data frame are rearranged or removed the
same operation is done to keys.
mtcars_tbl_subset <- mtcars_tbl %>% key_by(vs, am) %>% `[`(c(3, 18, 19), c(2, 8, 9)) mtcars_tbl_subset keys(mtcars_tbl_subset)
All one- and two-table verbs from dplyr
(with present scoped variants) support
keyed_df
. Most functions react to changes in rows as in [
but some functions
(summarise()
, distinct()
and do()
) unkey object.
mtcars_tbl_keyed <- mtcars_tbl %>% key_by(vs, am) mtcars_tbl_keyed %>% select(gear, mpg) mtcars_tbl_keyed %>% summarise(meanMPG = mean(mpg)) mtcars_tbl_keyed %>% filter(vs == 1) %>% keys() mtcars_tbl_keyed %>% arrange_at("mpg") %>% keys() band_members %>% key_by(name) %>% semi_join(band_instruments, by = "name") %>% keys()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.