Compatibility with dplyr

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The linelist philosophy is to prevent you from accidentally losing valuable data, while otherwise remaining fully transparent and not interfering with your workflow.

One popular ecosystem for data science workflows is the tidyverse, and we go the extra mile to ensure that linelist is compatible with it. All dplyr verbs are thoroughly tested in the tests/test-compat-dplyr.R file.

library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

head(x)

Verbs operating on rows

linelist does not modify the behaviour of row operations. As such, it is fully compatible out of the box with dplyr verbs operating on rows. The following examples show that linelist produces no errors, warnings or messages, and that its tags are conserved through dplyr operations on rows.
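One way to check this claim programmatically is to compare the tags before and after a chain of row verbs. The following is a minimal sketch, assuming the outbreaks package is installed; the object name `rows_only` is illustrative and not part of linelist.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

# Chain several row verbs (illustrative pipeline)
rows_only <- x %>%
  filter(age >= 10) %>%
  arrange(case_ID) %>%
  slice_head(n = 20)

# Is the result still a linelist, and are the tags unchanged?
inherits(rows_only, "linelist")
identical(tags(rows_only), tags(x))
```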

dplyr::arrange()

x %>%
  arrange(case_ID) %>%
  head()

dplyr::distinct()

x %>%
  distinct() %>%
  head()

dplyr::filter()

x %>%
  filter(age >= 10) %>%
  head()

dplyr::slice()

x %>%
  slice(5:10)

x %>%
  slice_head(n = 5)

x %>%
  slice_tail(n = 5)

x %>%
  slice_min(age, n = 3)

x %>%
  slice_max(age, n = 3)

x %>%
  slice_sample(n = 5)

Verbs operating on columns

During operations on columns, linelist will:

dplyr::mutate() ✓ (partial)

There is an incomplete compatibility with dplyr::mutate() in that:

Although dplyr::mutate() cannot leverage the full power of linelist tags, linelist objects behave as expected, the same way a data.frame would:

x %>%
  mutate(major = age >= 18) %>%
  head()

dplyr::pull()

dplyr::pull() returns a vector, which results, as expected, in the loss of the linelist class and tags:

x %>%
  pull(age)
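The loss of class and tags can be confirmed directly on the returned vector. This is a small sketch under the same setup as the rest of this document; `ages` is an illustrative name.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

ages <- x %>%
  pull(age)

# The extracted column is a plain vector, not a linelist
class(ages)
inherits(ages, "linelist")
```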

dplyr::relocate()

x %>%
  relocate(date_of_prodrome, .before = 1) %>%
  head()

dplyr::rename() & dplyr::rename_with()

dplyr::rename() is fully compatible with linelist out of the box, meaning that tags are updated at the same time as columns are renamed. This is possible because it uses names<-() under the hood, for which linelist provides a custom names<-.linelist() method:

x %>%
  rename(edad = age) %>%
  head()

x %>%
  rename_with(toupper) %>%
  head()
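To see the tag update described above, the tags can be inspected before and after renaming. A minimal sketch, assuming the same data as the examples in this document; `renamed` is an illustrative name.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

renamed <- x %>%
  rename(edad = age)

# Column that the "age" tag points to, before and after the rename
tags(x)[["age"]]
tags(renamed)[["age"]]
```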

dplyr::select() ✓ (partial)

dplyr::select() is currently only partially compatible with linelist, because renaming columns within select() does not correctly update the tags:

# Works fine
x %>%
  select(case_ID, date_of_prodrome, gender, age) %>%
  head()

# Tags are not updated!
x %>%
  select(case_ID, date_of_prodrome, gender, edad = age) %>%
  head()

# Instead, split the selecting and renaming steps:
x %>%
  select(case_ID, date_of_prodrome, gender, age) %>%
  rename(edad = age) %>%
  head()
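When it is unclear whether a pipeline has left the tags in a consistent state, a quick check is to verify that every tag still points to an existing column. The helper below, `tags_point_to_columns()`, is hypothetical and not part of linelist; it is only a sketch built on tags() and names().

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

# Hypothetical helper: TRUE if every tag refers to an existing column,
# NA if the object is no longer a linelist
tags_point_to_columns <- function(obj) {
  if (!inherits(obj, "linelist")) {
    return(NA)
  }
  all(unlist(tags(obj)) %in% names(obj))
}

# Renaming inside select(): inspect the result rather than assume it
renamed_in_select <- x %>%
  select(case_ID, date_of_prodrome, gender, edad = age)
tags_point_to_columns(renamed_in_select)

# Selecting first, then renaming
select_then_rename <- x %>%
  select(case_ID, date_of_prodrome, gender, age) %>%
  rename(edad = age)
tags_point_to_columns(select_then_rename)
```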

Verbs operating on groups ✘

Groups are not yet supported. Applying any verb operating on groups to a linelist will silently convert it back to a data.frame or tibble.
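Until groups are supported, one possible workaround is to save the tags before grouping and re-create the linelist afterwards. The sketch below assumes linelist 1.0.0 or later, where make_linelist() accepts spliced lists through `!!!`; the names `saved_tags`, `with_counts` and `retagged` are illustrative.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

# Keep a copy of the tags before any grouped operation
saved_tags <- tags(x)

# Grouped operation that keeps all original columns
with_counts <- x %>%
  group_by(gender) %>%
  mutate(n_in_group = n()) %>%
  ungroup()

# Check whether the class survived the grouped operation
inherits(with_counts, "linelist")

# Re-create the linelist from a plain data.frame and the saved tags
retagged <- make_linelist(as.data.frame(with_counts), !!!saved_tags)
tags(retagged)
```

This only works when the tagged columns still exist after the grouped step; a summarise() that drops them would require a different set of tags.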

Verbs operating on data.frames

dplyr::bind_rows()

dim(x)

dim(bind_rows(x, x))

dplyr::bind_cols()

bind_cols() is currently incompatible with linelist:

bind_cols(
  suppressWarnings(select(x, case_ID, date_of_prodrome)),
  suppressWarnings(select(x, age, gender))
) %>%
  head()

Joins ✘

Joins are currently not compatible with linelist as tags from the second element are silently dropped.

full_join(
  suppressWarnings(select(x, case_ID, date_of_prodrome)),
  suppressWarnings(select(x, case_ID, age, gender))
) %>%
  head()
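Since tags are not reliably carried through a join, one option is to join plain data frames and tag the result in a single step afterwards. This is a sketch rather than a recommended linelist workflow; it assumes linelist 1.0.0 or later for `!!!` splicing in make_linelist(), and `dates`, `demographics` and `joined_ll` are illustrative names.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

# Two plain data frames sharing the case_ID key
dates <- measles_hagelloch_1861[, c("case_ID", "date_of_prodrome")]
demographics <- measles_hagelloch_1861[, c("case_ID", "age", "gender")]

# Join first, with an explicit key
joined <- full_join(dates, demographics, by = "case_ID")

# Then tag the joined table, reusing the tags defined on x
joined_ll <- make_linelist(joined, !!!tags(x))
tags(joined_ll)
```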

Verbs operating on multiple columns

dplyr::pick()

pick() makes tidyselect helpers usable inside functions that do not normally support tidyselect, such as arrange():

x %>%
  dplyr::arrange(dplyr::pick(ends_with("loc"))) %>%
  head()

As such, we might expect it to work with has_tag(), the custom tidyselect-like function provided by linelist. This is not the case, because pick() currently strips out all attributes, including the linelist class and all tags. This unclassing is documented in ?pick:

pick() returns a data frame containing the selected columns for the current group.
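A possible way around this limitation is to resolve the tagged column name outside of pick(), using tags(), and pass it in with all_of(). The sketch below also shows has_tag() inside select(), where tidyselect is supported directly; it assumes linelist 1.1.0 or later for has_tag(), and `age_col`, `by_age` and `tagged` are illustrative names.

```r
library(linelist)
library(dplyr)

data("measles_hagelloch_1861", package = "outbreaks")

x <- make_linelist(
  measles_hagelloch_1861,
  id = "case_ID",
  date_onset = "date_of_prodrome",
  age = "age",
  gender = "gender"
)

# Look up the column behind the "age" tag before calling pick()
age_col <- tags(x)[["age"]]

by_age <- x %>%
  arrange(pick(all_of(age_col)))

# has_tag() used directly in select(); dropping the other tagged
# columns may trigger the lost tags action (a warning by default)
tagged <- x %>%
  select(has_tag(c("id", "age")))
names(tagged)
```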




linelist documentation built on June 22, 2024, 10:54 a.m.