Tools for Data Manipulation

# Print the package description from the installed DESCRIPTION file
cat(gsub("\\n   ", "", packageDescription("dat", fields = "Description")))

Installation

From GitHub

remotes::install_github("wahani/dat")

From CRAN

install.packages("dat")

Why should you care?

Tools for data manipulation

The examples are taken from the introductory vignette of dplyr. You still work with data frames, so you can mix in dplyr features whenever you need them.

library("nycflights13")
library("dat")

Select rows

Use mutar to select rows. To reference a variable in the data frame, use a one-sided formula.

mutar(flights, ~ month == 1 & day == 1)
mutar(flights, ~ 1:10)

And for sorting:

mutar(flights, ~ order(year, month, day))
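To show what the one-sided formula saves you, here is a minimal base R sketch of the same idea. It is not dat's implementation: the right-hand side of the formula is evaluated with the data frame's columns in scope, and the result (a logical vector, row numbers, or an ordering) is used as a row index. The helper select_rows() and the toy data frame are hypothetical.

```r
# Hypothetical helper (not part of dat): evaluate the right-hand side of a
# one-sided formula with the data frame's columns in scope and use the
# result as a row index.
select_rows <- function(df, f) {
  idx <- eval(f[[2]], envir = df, enclos = environment(f))
  df[idx, , drop = FALSE]
}

# Toy stand-in for nycflights13::flights
toy <- data.frame(
  year  = c(2013, 2013, 2013, 2013),
  month = c(2, 1, 1, 1),
  day   = c(1, 2, 1, 1)
)

first_day <- select_rows(toy, ~ month == 1 & day == 1)  # logical filter
first_two <- select_rows(toy, ~ 1:2)                    # row numbers
sorted    <- select_rows(toy, ~ order(year, month, day))  # an ordering
```

The same helper covers filtering, slicing, and sorting because all three reduce to indexing rows with whatever the expression returns.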

Select cols

You can select columns with character vectors, logical vectors, regular expressions, and functions. A character value with a leading "^" is interpreted as a regular expression.

flights %>%
  extract(c("year", "month", "day")) %>%
  extract("^day$") %>%
  extract(is.numeric)
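A base R sketch of how these selector types can be resolved, assuming (as stated above) that a character value is a regular expression exactly when it starts with "^". The helper select_cols() is hypothetical and only mirrors the rules described here; dat's extract may handle more cases.

```r
# Hypothetical helper (not part of dat): keep the columns of a data frame
# (or elements of a named list) that match a selector.
select_cols <- function(x, what) {
  nms <- names(x)
  if (is.function(what)) {
    keep <- vapply(x, what, logical(1))          # predicate on each column
  } else if (is.character(what) && length(what) == 1 && startsWith(what, "^")) {
    keep <- grepl(what, nms)                     # leading "^": regular expression
  } else if (is.character(what)) {
    keep <- nms %in% what                        # exact names, in the data's column order
  } else if (is.logical(what)) {
    keep <- rep_len(what, length(nms))           # logical mask
  } else {
    stop("unsupported selector")
  }
  x[keep]
}

toy <- data.frame(year = 2013, month = 1, day = 1, carrier = "UA",
                  stringsAsFactors = FALSE)

step1 <- select_cols(toy, c("year", "month", "day"))
step2 <- select_cols(step1, "^day$")
res   <- select_cols(step2, is.numeric)
```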

Operations on columns

The main difference from dplyr::mutate is that mutar uses ~ instead of = to name a new column.

mutar(
  flights,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)
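A two-sided formula carries both pieces a mutate step needs: the left-hand side names the column and the right-hand side is the expression. The base R sketch below uses a hypothetical helper add_cols(), which is not part of dat. It evaluates the formulas in order, so a later formula could use a column created by an earlier one; that ordering is a choice of this sketch, not a documented property of mutar.

```r
# Hypothetical helper (not part of dat): each formula `name ~ expr` adds
# (or overwrites) column `name` with `expr` evaluated among the columns.
add_cols <- function(df, ...) {
  for (f in list(...)) {
    name <- as.character(f[[2]])                 # left-hand side: column name
    df[[name]] <- eval(f[[3]], envir = df, enclos = environment(f))
  }
  df
}

toy <- data.frame(
  arr_delay = c(10, -5), dep_delay = c(2, 0),
  distance  = c(300, 900), air_time = c(60, 90)
)

res <- add_cols(
  toy,
  gain ~ arr_delay - dep_delay,
  speed ~ distance / air_time * 60
)
```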

Grouping is handled within mutar through the by argument:

mutar(flights, n ~ .N, by = "month")
mutar(flights, delay ~ mean(dep_delay, na.rm = TRUE), by = "month")
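For comparison, a base R approximation of the two grouped calls, assuming that by performs a grouped transformation which keeps every row (rather than collapsing to one row per group). .N is data.table's special symbol for the number of rows in the current group; ave() applies a function within groups and recycles each result back to the group's rows. The toy data is hypothetical.

```r
toy <- data.frame(
  month     = c(1, 1, 2, 2, 2),
  dep_delay = c(4, NA, 1, 2, 3)
)

# Group size on every row, roughly `n ~ .N, by = "month"`
toy$n <- ave(seq_along(toy$month), toy$month, FUN = length)

# Group mean on every row, roughly `delay ~ mean(dep_delay, na.rm = TRUE), by = "month"`
toy$delay <- ave(toy$dep_delay, toy$month,
                 FUN = function(v) mean(v, na.rm = TRUE))
```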

You can also provide an additional argument to a formula, separated by "|". This is especially helpful when you want to pass arguments from a function into such expressions. The additional argument can be anything you can use to select columns (character, regular expression, function), or a named list where each element is a character.

mutar(
  flights,
  .n ~ mean(.n, na.rm = TRUE) | "^.*delay$",
  .x ~ mean(.x, na.rm = TRUE) | list(.x = "arr_time"),
  by = "month"
)
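One way to read the "|" form is as a template: the placeholder (here .n or .x) is replaced by each selected column name before evaluation. The base R sketch below shows only that substitution step, using the standard do.call(substitute, ...) idiom; the helper expand_template() is hypothetical, and dat's own mechanism may differ. In the named-list form, the list names would supply the placeholder directly.

```r
# Hypothetical helper (not part of dat): instantiate the right-hand side of
# `placeholder ~ expr` once per column by substituting the column's symbol.
expand_template <- function(f, placeholder, cols) {
  exprs <- lapply(cols, function(col) {
    env <- list(as.name(col))
    names(env) <- placeholder
    do.call(substitute, list(f[[3]], env))
  })
  names(exprs) <- cols
  exprs
}

toy <- data.frame(dep_delay = c(1, NA, 3), arr_delay = c(2, 4, 6),
                  air_time = c(60, 70, 80))

cols  <- grep("^.*delay$", names(toy), value = TRUE)   # "dep_delay", "arr_delay"
exprs <- expand_template(.n ~ mean(.n, na.rm = TRUE), ".n", cols)
vals  <- lapply(exprs, eval, envir = toy)              # evaluate each instance
```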

A link to S4

With this package you can create S4 classes that contain a data frame (or a data.table) and still use the interface to dplyr. Neither dplyr nor data.table supports integration with S4 on its own. The main function here is mutar, which is generic enough to cover subsetting of rows and columns as well as mutate and summarise. In the background, dplyr's ability to work on a data.table is used.

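library("data.table")

# An S4 class that extends data.table
setClass("DataTable", contains = "data.table")

DataTable <- function(...) {
  new("DataTable", data.table::data.table(...))
}

# Use mutar as the `[` method, so subsetting and transformations share one interface
setMethod("[", "DataTable", mutar)

dtflights <- do.call(DataTable, nycflights13::flights)

dtflights[1:10, c("year", "month", "day")]
dtflights[n ~ .N, by = "month"]   # by: transformation within groups
dtflights[n ~ .N, sby = "month"]  # sby: summary within groups

dtflights %>%
  filtar(~month > 6) %>%                          # keep July to December
  mutar(n ~ .N, by = "month") %>%                 # add the group size per month
  sumar(n ~ data.table::first(n), by = "month")   # one row per month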

Working with vectors

Inspired by rlist and purrr, dat supports some low-level operations on vectors. The aim is to provide syntactic sugar for anonymous functions, and the functions are designed to work with pipes.

What we can do with map:

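map(1:3, ~ .^2)                  # `.` is the argument of the one-sided formula
flatmap(1:3, ~ .^2)              # same results, flattened into a vector
map(1:3 ~ 11:13, c)              # zip: apply c() to pairs of elements
dat <- data.frame(x = 1, y = "")
map(dat, x ~ x + 1, is.numeric)  # only elements for which is.numeric() is TRUE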
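For comparison, the same operations written with base R's functional tools. This is not how dat implements map; it assumes map returns a list, flatmap a vector, and that the third argument of the last call is a predicate choosing which columns to transform while leaving the others untouched.

```r
sq <- function(x) x^2

r_map     <- lapply(1:3, sq)            # a list of squares
r_flatmap <- unlist(lapply(1:3, sq))    # the squares as one vector
r_zip     <- Map(c, 1:3, 11:13)         # c() applied to pairs of elements

df  <- data.frame(x = 1, y = "", stringsAsFactors = FALSE)
num <- vapply(df, is.numeric, logical(1))
df[num] <- lapply(df[num], function(x) x + 1)   # only numeric columns change
```

The formula versions save writing function(x) each time, which is the syntactic sugar described above.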

What we can do with extract:

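extract(1:10, ~ . %% 2 == 0) %>% sum   # sum of the even numbers
extract(1:15, ~ 15 %% . == 0)          # divisors of 15
l <- list(aList = list(x = 1), aAtomic = "hi")
extract(l, "^aL")                      # names matching a regular expression
extract(l, is.atomic)                  # elements for which is.atomic() is TRUE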
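The same selections in base R, for comparison: Filter() keeps the elements for which a predicate is TRUE, and a regular expression on the names needs an explicit grepl(). This is a sketch of equivalent results, not of dat's internals.

```r
evens    <- Filter(function(x) x %% 2 == 0, 1:10)
total    <- sum(evens)                             # 30
divisors <- Filter(function(x) 15 %% x == 0, 1:15) # 1 3 5 15

l <- list(aList = list(x = 1), aAtomic = "hi")
by_name <- l[grepl("^aL", names(l))]   # keeps aList
by_pred <- Filter(is.atomic, l)        # keeps aAtomic
```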

What we can do with replace:

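replace(c(1, 2, NA), is.na, 0)                     # predicate
replace(c(1, 2, NA), rep(TRUE, 3), 0)              # logical index
replace(c(1, 2, NA), 3, 0)                         # position
replace(list(x = 1, y = 2), "x", 0)                # name
replace(list(x = 1, y = 2), "^x$", 0)              # regular expression on names
replace(list(x = 1, y = "a"), is.character, NULL)  # replace character elements with NULL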
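The calls above rely on dat's own replace, since base::replace() expects an index vector rather than a function or a regular expression. The base R sketch below spells out the steps that the predicate and regex forms save; that dat's version gives the same results on these inputs is an assumption.

```r
x <- c(1, 2, NA)
r_pred <- base::replace(x, is.na(x), 0)   # apply the predicate by hand
r_pos  <- base::replace(x, 3, 0)          # plain position works as-is

l <- list(x = 1, y = 2)
l[grepl("^x$", names(l))] <- list(0)      # regular expression on names, by hand

l2 <- list(x = 1, y = "a")
l2[vapply(l2, is.character, logical(1))] <- NULL   # assigning NULL drops elements
```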


dat documentation built on July 1, 2020, 7:11 p.m.