hacksaw acts as an adhesive between various dplyr and purrr operations, with some extra tidyverse-like functionality (e.g. keeping NAs, shifting row values) and shortcuts (e.g. filtering patterns, casting, plucking).
You can install the released version of hacksaw from CRAN with:
install.packages("hacksaw")
Or install the development version from GitHub with:
remotes::install_github("daranzolin/hacksaw")
hacksaw's assortment of split operations recycles the original data frame. This is useful when you want to run slightly different code on the same object multiple times (e.g. assignment) or you want to take advantage of some list functionality (e.g. purrr, lengths(), %->%).
The useful %<-% and %->% operators are re-exported from the zeallot package.
library(hacksaw)
library(tidyverse)
iris %>%
  filter_split(
    large_petals = Petal.Length > 5.1,
    large_sepals = Sepal.Length > 6.4
  ) %>%
  map(summary)
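For readers who want to see the idea without the package, here is a minimal base R sketch of what "recycling the original data frame" could look like: each named condition is applied to the same data frame and the results are collected in a named list. This is an illustration only, not hacksaw's implementation, and the `conditions` and `splits` names are made up for the example.

```r
# Each condition is stored unevaluated so it can be applied to the same data.
conditions <- list(
  large_petals = quote(Petal.Length > 5.1),
  large_sepals = quote(Sepal.Length > 6.4)
)

# Recycle `iris`: evaluate each condition inside the data frame, then subset.
splits <- lapply(conditions, function(cond) {
  keep <- eval(cond, iris)
  iris[keep, , drop = FALSE]
})

names(splits)         # one list element per named condition
sapply(splits, nrow)  # row counts differ because the filters differ
```

Because the result is an ordinary named list, list tools such as lapply() or lengths() apply to it directly.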
Include multiple columns and select helpers within c():
iris %>%
  select_split(
    sepal_data = c(Species, starts_with("Sepal")),
    petal_data = c(Species, starts_with("Petal"))
  ) %>%
  str()
Count across multiple variables:
mtcars %>% count_split( cyl, carb, gear )
Rolling counts, left-to-right:
mtcars %>% rolling_count_split( cyl, carb, gear )
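One way to read "rolling, left-to-right" is as cumulative groupings: first cyl alone, then cyl and carb, then all three. The base R sketch below works under that assumption; it is not the package's code, and `rolling_sets` and `rolling_counts` are illustrative names.

```r
vars <- c("cyl", "carb", "gear")

# Cumulative variable sets: cyl; cyl + carb; cyl + carb + gear.
rolling_sets <- lapply(seq_along(vars), function(i) vars[seq_len(i)])

# Count the observed combinations for each set by summing a column of ones.
rolling_counts <- lapply(rolling_sets, function(v) {
  aggregate(n ~ ., data = cbind(mtcars[v], n = 1), FUN = sum)
})

lapply(rolling_counts, names)  # grouping columns grow from left to right
```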
Easily get the unique values of multiple columns:
starwars %>% distinct_split(skin_color, eye_color, homeworld) %>% str() # lengths() is also useful
iris %>% mutate_split( Sepal.Length2 = Sepal.Length * 2, Sepal.Length3 = Sepal.Length * 3 ) %>% str()
Separate groups:
mtcars %>% group_by_split(cyl, gear, am, across(c(cyl, gear))) %>% map(tally, wt = vs)
Rolling groups, left-to-right:
mtcars %>% rolling_group_by_split( cyl, carb, gear ) %>% map(summarize, mean_mpg = mean(mpg))
mtcars %>% nest_by_split(cyl, gear) %>% map(mutate, model = list(lm(mpg ~ wt, data = data)))
mtcars %>% rolling_nest_by_split(cyl, gear) %>% map(mutate, model = list(lm(mpg ~ wt, data = data)))
iris %>% transmute_split(Sepal.Length * 2, Petal.Width + 5) %>% str()
iris %>% slice_split(1:10, 11:15, 30:50) %>% str()
Use the var_max and var_min helpers to easily get the maximum and minimum values of a variable:
iris %>%
  slice_split(
    largest_sepals = var_max(Sepal.Length, 4),
    smallest_sepals = var_min(Sepal.Length, 4)
  )
precision_split splits the mtcars data frame into two: one with mpg greater than 20, one with mpg less than 20:
mtcars %>%
  precision_split(mpg > 20) %->% c(lt20mpg, gt20mpg)
str(gt20mpg)
str(lt20mpg)
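The same two-way split can be approximated in base R with split(), which may help when checking what lands in each half. This is a rough analogue rather than what precision_split does internally; `parts`, `le20` and `gt20` are names chosen for the example.

```r
# Split on a logical condition; the list elements are named "FALSE" and "TRUE".
parts <- split(mtcars, mtcars$mpg > 20)

le20 <- parts[["FALSE"]]  # rows where the condition is FALSE
gt20 <- parts[["TRUE"]]   # rows where the condition is TRUE

c(nrow(le20), nrow(gt20))
```

Note that split() drops rows where the condition is NA; mtcars has no missing mpg values, so every row ends up in one of the two halves here.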
Evaluate any expression:
mtcars %>% eval_split( select(hp, mpg), filter(mpg > 25), mutate(pounds = wt*1000) ) %>% str()
Tired of mutate(var = as.[character|numeric|logical](var))?
starwars %>% cast_character(height, mass) %>% str(max.level = 2)
iris %>% cast_character(contains(".")) %>% str(max.level = 1)
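For comparison, the repetitive pattern these helpers replace looks roughly like this in base R. The toy data frame and the `cols` vector are made up for the example; the cast_* functions themselves are not used.

```r
df <- data.frame(
  name = c("Luke", "C-3PO"),
  height = c(172, 167),
  mass = c(77, 75),
  stringsAsFactors = FALSE
)

# Convert several columns at once instead of writing one mutate() per column.
cols <- c("height", "mass")
df[cols] <- lapply(df[cols], as.character)

str(df)
```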
hacksaw also includes cast_numeric and cast_logical.
keep_na is the reverse of tidyr::drop_na, strangely omitted in the original tidyverse.
df <- tibble(x = c(1, 2, NA, NA, NA), y = c("a", NA, "b", NA, NA))
df %>% keep_na()
df %>% keep_na(x)
df %>% keep_na(x, y)
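"Keeping NAs" can mean keeping rows where every selected column is missing, or rows where at least one is. Which rule keep_na applies by default is not stated here, so the base R sketch below shows both readings on the same data. It is a hedged illustration and does not call keep_na.

```r
df <- data.frame(
  x = c(1, 2, NA, NA, NA),
  y = c("a", NA, "b", NA, NA),
  stringsAsFactors = FALSE
)

na_count <- rowSums(is.na(df))  # number of missing values per row

all_na <- df[na_count == ncol(df), ]  # every column missing
any_na <- df[na_count > 0, ]          # at least one column missing
x_na   <- df[is.na(df$x), ]           # missing in x only

c(nrow(all_na), nrow(any_na), nrow(x_na))
```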
I never care if my join keys are incompatible. The *_join2 suite of functions coerces either the left or right table accordingly.
df1 <- tibble(x = 1:10, b = 1:10, y = letters[1:10])
df2 <- tibble(x = as.character(1:10), z = letters[11:20])
left_join2(df1, df2)
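Done by hand, the coercion step might look like the base R sketch below. Converting the right-hand key to the left-hand key's type is an assumption made for the example; which side the *_join2 functions actually coerce is not specified here.

```r
df1 <- data.frame(x = 1:10, b = 1:10, y = letters[1:10],
                  stringsAsFactors = FALSE)
df2 <- data.frame(x = as.character(1:10), z = letters[11:20],
                  stringsAsFactors = FALSE)

# Assumption: align the right-hand key with the left-hand key's type.
df2$x <- as.integer(df2$x)

# Base R left join on the now-compatible key.
joined <- merge(df1, df2, by = "x", all.x = TRUE)
str(joined)
```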
Shift values across rows in either direction. Sometimes useful when importing irregularly-shaped tabular data.
df <- tibble(
  s = c(NA, 1, NA, NA),
  t = c(NA, NA, 1, NA),
  u = c(NA, NA, 2, 5),
  v = c(5, 1, 9, 2),
  x = c(1, 5, 6, 7),
  y = c(NA, NA, 8, NA),
  z = 1:4
)
df
shift_row_values(df)
shift_row_values(df, at = 1:3)
shift_row_values(df, at = 1:2, .dir = "right")
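To make "shifting" concrete, here is a small base R sketch that moves the non-missing values of each row to the left and pads the right with NA. It assumes that this is what a leftward shift means; `shift_left` is a hypothetical helper and not part of hacksaw.

```r
# Move non-NA values to the front of a row and pad the rest with NA.
shift_left <- function(row) {
  kept <- row[!is.na(row)]
  c(kept, rep(NA, length(row) - length(kept)))
}

m <- rbind(
  c(NA, 1, NA, 5),
  c(NA, NA, 2, 9)
)

# apply() returns results column-wise, so transpose back to rows.
shifted <- t(apply(m, 1, shift_left))
shifted
```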
filter_pattern is a wrapper around filter(grepl(..., var)):
starwars %>% filter_pattern(homeworld, "oo") %>% distinct(homeworld)
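The pattern being wrapped can be written in base R as follows. The `planets` data frame is toy data invented for the example, and filter_pattern itself is not called.

```r
planets <- data.frame(
  name = c("Luke", "Yoda", "Jar Jar", "Leia"),
  homeworld = c("Tatooine", NA, "Naboo", "Alderaan"),
  stringsAsFactors = FALSE
)

# grepl() returns FALSE for NA, so rows with a missing homeworld are dropped.
matches <- planets[grepl("oo", planets$homeworld), ]
unique(matches$homeworld)
```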
Use keep_pattern and discard_pattern for lists and vectors.
pluck_when is a wrapper around x[p][i]:
df <- tibble(
  id = c(1, 1, 1, 2, 2, 2, 3, 3),
  tested = c("no", "no", "yes", "no", "no", "no", "yes", "yes"),
  year = c(2015:2017, 2010:2012, 2019:2020)
)
df %>%
  group_by(id) %>%
  mutate(year_first_tested = pluck_when(year, tested == "yes"))
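The x[p][i] idiom is easy to try on plain vectors. In the sketch below, `pluck_first` is a hypothetical stand-in written for illustration; pluck_when's real signature may take additional arguments.

```r
# Subset x by the logical vector p, then take the i-th surviving element.
pluck_first <- function(x, p, i = 1) x[p][i]

year <- c(2015, 2016, 2017)
tested <- c("no", "no", "yes")

pluck_first(year, tested == "yes")               # 2017
pluck_first(c(2010, 2011, 2012), rep(FALSE, 3))  # NA, since nothing matches
```

Returning NA when no element satisfies the predicate is what makes the idiom convenient inside a grouped mutate(), because every group still yields a value.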