In daranzolin/hacksaw: Additional Tools for Splitting and Cleaning Data


hacksaw


hacksaw acts as an adhesive between various dplyr and purrr operations, with some extra tidyverse-like functionality (e.g. keeping NAs, shifting row values) and shortcuts (e.g. filtering patterns, casting, plucking).

Installation

You can install the released version of hacksaw from CRAN with:

install.packages("hacksaw")

Or install the development version from GitHub with:

remotes::install_github("daranzolin/hacksaw")

Split operations

hacksaw's assortment of split operations recycles the original data frame and returns a list of results. This is useful when you want to run slightly different code on the same object multiple times (e.g. assignment) or you want to take advantage of some list functionality (e.g. purrr, lengths(), %->%, etc.).

The useful %<-% and %->% operators are re-exported from the zeallot package.
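
As a quick illustration of what these operators do on their own, here is a minimal sketch that calls zeallot directly. The variable names are made up for the example.

```r
library(zeallot)

# Left-to-right unpacking: each list element is assigned to a name
list(1:3, letters[1:2]) %->% c(nums, chars)

# Right-to-left form of the same idea
c(first, second) %<-% list("a", "b")

nums
chars
```

Since every *_split function returns a list, either operator can bind the pieces to separate names in one step.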

filter

library(hacksaw)
library(tidyverse)

iris %>% 
  filter_split(
    large_petals = Petal.Length > 5.1,
    large_sepals = Sepal.Length > 6.4
  ) %>% 
  map(summary)

select

Include multiple columns and select helpers within c():

iris %>% 
  select_split(
    sepal_data = c(Species, starts_with("Sepal")),
    petal_data = c(Species, starts_with("Petal"))
  ) %>% 
  str()

count

Count across multiple variables:

mtcars %>% 
  count_split(
    cyl,
    carb,
    gear
    )

rolling_count_split

Rolling counts, left-to-right:

mtcars %>% 
  rolling_count_split(
    cyl,
    carb,
    gear
    )
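
To make "rolling" concrete, here is a base R sketch of the idea, assuming it means counting by the first variable, then by the first two, then by all three. It illustrates the concept only and is not hacksaw's implementation; `rolling_counts` is a name invented for the example.

```r
vars <- c("cyl", "carb", "gear")

# For i = 1, 2, 3, count the combinations of the first i variables
rolling_counts <- lapply(seq_along(vars), function(i) {
  counts <- as.data.frame(table(mtcars[vars[seq_len(i)]]), responseName = "n")
  counts[counts$n > 0, ]  # drop combinations that never occur
})

lapply(rolling_counts, nrow)
```

Each element of the list adds one more grouping column than the previous one, and the counts in every element sum to the number of rows in mtcars.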

distinct

Easily get the unique values of multiple columns:

starwars %>% 
  distinct_split(skin_color, eye_color, homeworld) %>% 
  str() # lengths() is also useful

mutate

iris %>% 
  mutate_split(
    Sepal.Length2 = Sepal.Length * 2,
    Sepal.Length3 = Sepal.Length * 3
  ) %>% 
  str()

group_by

Separate groups:

mtcars %>% 
  group_by_split(cyl, gear, am, across(c(cyl, gear))) %>% 
  map(tally, wt = vs)

rolling_group_by_split

Rolling groups, left-to-right:

mtcars %>% 
  rolling_group_by_split(
    cyl, 
    carb, 
    gear
  ) %>% 
  map(summarize, mean_mpg = mean(mpg))

nest_by

mtcars %>%
    nest_by_split(cyl, gear) %>%
    map(mutate, model = list(lm(mpg ~ wt, data = data)))

rolling_nest_by

mtcars %>%
    rolling_nest_by_split(cyl, gear) %>%
    map(mutate, model = list(lm(mpg ~ wt, data = data)))

transmute

iris %>% 
  transmute_split(Sepal.Length * 2, Petal.Width + 5) %>% 
  str()

slice

iris %>% 
  slice_split(1:10, 11:15, 30:50) %>% 
  str()

Use the var_max and var_min helpers to easily get minimum and maximum values of a variable:

iris %>% 
  slice_split(
    largest_sepals = var_max(Sepal.Length, 4),
    smallest_sepals = var_min(Sepal.Length, 4)
  )

precision_split

precision_split splits a data frame in two according to a condition. Here mtcars is split into one data frame with mpg greater than 20 and one with the remaining rows (mpg of 20 or less):

mtcars %>% 
  precision_split(mpg > 20) %->% c(lt20mpg, gt20mpg)

str(gt20mpg)
str(lt20mpg)
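
For comparison, the same two-way split can be sketched in base R by indexing with a condition and its negation. This only shows what the two pieces contain; it is not how hacksaw implements the function, and the object names are invented for the example.

```r
is_gt20 <- mtcars$mpg > 20

gt20 <- mtcars[is_gt20, ]   # rows that satisfy the condition
lt20 <- mtcars[!is_gt20, ]  # everything else

# The two pieces partition the original data frame
nrow(gt20) + nrow(lt20) == nrow(mtcars)
```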

eval_split

Evaluate any expression:

mtcars %>% 
  eval_split(
    select(hp, mpg),
    filter(mpg > 25),
    mutate(pounds = wt*1000)
  ) %>% 
  str()

Casting

Tired of mutate(var = as.[character|numeric|logical](var))?

starwars %>% cast_character(height, mass) %>% str(max.level = 2) 
iris %>% cast_character(contains(".")) %>% str(max.level = 1)

hacksaw also includes cast_numeric and cast_logical.
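
For reference, this is the long-hand coercion that the cast functions are meant to shorten, written in base R on a toy data frame. The data and column names are made up, and the assumption that cast_numeric and cast_logical take columns the same way as cast_character is mine, not something stated above.

```r
df <- data.frame(
  id = 1:3,
  flag = c("TRUE", "FALSE", "TRUE"),
  score = c("1.5", "2", "3.25"),
  stringsAsFactors = FALSE
)

# Long-hand coercion, one column at a time
df$score <- as.numeric(df$score)
df$flag <- as.logical(df$flag)

str(df)
```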

Keeping NAs

The reverse of tidyr::drop_na, strangely omitted in the original tidyverse.

df <- tibble(x = c(1, 2, NA, NA, NA), y = c("a", NA, "b", NA, NA))
df %>% keep_na()
df %>% keep_na(x)
df %>% keep_na(x, y)
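
The idea can be sketched in base R with is.na(). For a single column the meaning is unambiguous; for several columns there are two possible readings ("all missing" or "any missing"), and this sketch shows both rather than assuming which one keep_na() uses. The object names are invented for the example.

```r
df <- data.frame(x = c(1, 2, NA, NA, NA), y = c("a", NA, "b", NA, NA))

# Rows where x is missing: the complement of dropping NAs in x
na_in_x <- df[is.na(df$x), ]

# Two readings for more than one column
na_in_both <- df[is.na(df$x) & is.na(df$y), ]    # every column missing
na_in_either <- df[is.na(df$x) | is.na(df$y), ]  # at least one column missing
```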

Coercive joins

I never care if my join keys are incompatible. The *_join2 suite of functions coerces either the left or right table accordingly.

df1 <- tibble(x = 1:10, b = 1:10, y = letters[1:10])
df2 <- tibble(x = as.character(1:10), z = letters[11:20])
left_join2(df1, df2)
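
dplyr's own joins refuse keys of incompatible types (here integer versus character), so without a coercive join the key has to be converted by hand first. A minimal base R sketch of that manual step follows; coercing the left key to character is an assumption made for the example and may not be the side hacksaw picks.

```r
df1 <- data.frame(x = 1:10, b = 1:10, y = letters[1:10])
df2 <- data.frame(x = as.character(1:10), z = letters[11:20])

# Make the key types agree by hand (here: integer -> character on the left)
df1$x <- as.character(df1$x)

# all.x = TRUE keeps every row of df1, i.e. a left join
joined <- merge(df1, df2, by = "x", all.x = TRUE)
```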

Shifting row values

Shift values across rows in either direction. Sometimes useful when importing irregularly-shaped tabular data.

df <- tibble(
  s = c(NA, 1, NA, NA),
  t = c(NA, NA, 1, NA),
  u = c(NA, NA, 2, 5),
  v = c(5, 1, 9, 2),
  x = c(1, 5, 6, 7),
  y = c(NA, NA, 8, NA),
  z = 1:4
)
df

shift_row_values(df)
shift_row_values(df, at = 1:3)
shift_row_values(df, at = 1:2, .dir = "right")
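
The core idea, moving the non-missing values of a row to one side and padding the rest with NA, can be sketched in base R as below. This assumes a leftward shift that keeps values in their original order; `shift_left` is a hypothetical helper, and hacksaw's handling of the at argument and column types is not reproduced here.

```r
# Move the non-missing values of one row to the left, padding with NA
shift_left <- function(v) c(v[!is.na(v)], rep(NA, sum(is.na(v))))

# Two rows of a numeric table with gaps
m <- rbind(
  c(NA, NA, NA, 5, 1, NA, 1),
  c(1, NA, NA, 1, 5, NA, 2)
)

# apply() returns the rows as columns, so transpose back
shifted <- t(apply(m, 1, shift_left))
shifted
```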

Filtering, keeping, and discarding patterns

A wrapper around filter(grepl(..., var)):

starwars %>% 
  filter_pattern(homeworld, "oo") %>% 
  distinct(homeworld)

Use keep_pattern and discard_pattern for lists and vectors.
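
On a plain character vector, the underlying operation is subsetting with grepl() or its negation. The base R sketch below shows that operation only; it does not call keep_pattern or discard_pattern, and the example values are made up.

```r
x <- c("Tatooine", "Naboo", "Alderaan", "Dagobah")

kept <- x[grepl("oo", x)]        # elements that match the pattern
discarded <- x[!grepl("oo", x)]  # elements that do not
```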

Plucking values

A wrapper around x[p][i]:

df <- tibble(
  id = c(1, 1, 1, 2, 2, 2, 3, 3),
  tested = c("no", "no", "yes", "no", "no", "no", "yes", "yes"),
  year = c(2015:2017, 2010:2012, 2019:2020)
) 

df %>% 
  group_by(id) %>%
  mutate(year_first_tested = pluck_when(year, tested == "yes"))
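
Spelled out in base R, x[p][i] means: subset x by the logical condition p, then take element i of what is left. The sketch below applies that per id, assuming i = 1 (the first match); `first_tested` is a name invented for the example.

```r
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3),
  tested = c("no", "no", "yes", "no", "no", "no", "yes", "yes"),
  year = c(2015:2017, 2010:2012, 2019:2020)
)

# x[p][i] with i = 1: the first year in each group where tested == "yes"
first_tested <- sapply(
  split(df, df$id),
  function(g) g$year[g$tested == "yes"][1]
)
first_tested
```

A group with no match yields NA, because indexing an empty vector at position 1 returns NA.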

daranzolin/hacksaw documentation built on April 14, 2021, 6:32 a.m.
