table.express-package    R Documentation

Building 'data.table' expressions with data manipulation verbs

Description

A specialization of dplyr verbs, as well as a set of custom ones, that build expressions that can be used within a data.table's frame.

Note

Since version 0.3.0, table.express and dtplyr cannot be loaded at the same time, because both define data.table methods for many of the same dplyr generics.

Because data.tables are also data.frames, other packages may use dplyr internally without importing data.table. Since dplyr's methods are generic, calls to these methods in such packages would fail. The functions in this package try to detect when this happens and delegate to the data.frame methods with a warning. The warning can be safely ignored if you know that it originates from a package that is not meant to work with data.table. To avoid the warning, use options(table.express.warn.cedta = FALSE).
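
As a minimal sketch of the option mentioned above, the warning can be silenced only around the affected call and the previous setting restored afterwards. The placeholder comment stands for whatever third-party code triggers the warning; it is an assumption for illustration, not part of this package.

```r
# Disable the warning; options() returns the previous value so it can be restored.
old <- options(table.express.warn.cedta = FALSE)

# ... call the third-party code that uses dplyr on a data.table here ...

# Restore the previous setting (unsets the option if it was not set before).
options(old)
```

Setting the option once at the top of a script also works if the warning should stay off for the whole session.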

This software package was developed independently of any organization or institution that is or has been associated with the author.

Author(s)

Alexis Sarda-Espinosa

Examples

require("data.table")
require("table.express") # the package documented here

data("mtcars")

DT <- as.data.table(mtcars)

# ====================================================================================
# Simple dplyr-like transformations

DT %>%
    group_by(cyl) %>%
    filter(vs == 0, am == 1) %>%
    transmute(mean_mpg = mean(mpg)) %>%
    arrange(-cyl)

# Equivalent to previous
DT %>%
    start_expr %>%
    transmute(mean_mpg = mean(mpg)) %>%
    where(vs == 0, am == 1) %>%
    group_by(cyl) %>%
    order_by(-cyl) %>%
    end_expr

# Modification by reference
DT %>%
    where(gear %% 2 != 0, carb %% 2 == 0) %>%
    mutate(wt_squared = wt ^ 2)

print(DT)

# Deletion by reference
DT %>%
    mutate(wt_squared = NULL) %>%
    print

# Support for tidyselect helpers

DT %>%
    select(ends_with("t"))

# ====================================================================================
# Helpers to transform a subset of data

# Like DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole]
whole <- names(DT)[sapply(DT, function(x) { all(x %% 1 == 0) })]
DT %>%
    mutate_sd(as.integer, .SDcols = whole)

sapply(DT, class)

# Like DT[, lapply(.SD, fun), .SDcols = ...]
DT %>%
    transmute_sd((.COL - mean(.COL)) / sd(.COL),
                 .SDcols = setdiff(names(DT), whole))

# Filter several columns with the same condition
DT %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"))

# Using secondary indices, i.e. DT[.(4, 5), on = .(cyl, gear)]
DT %>%
    filter_on(cyl = 4, gear = 5) # note we don't use ==

scale_undim <- function(...) {
    as.numeric(scale(...)) # remove dimensions
}

# Chaining
DT %>%
    start_expr %>%
    mutate_sd(as.integer, .SDcols = whole) %>%
    chain %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"), .collapse = `|`) %>%
    transmute_sd(scale_undim, .SDcols = !is.integer(.COL)) %>%
    end_expr

# The previous is equivalent to
DT[, (whole) := lapply(.SD, as.integer), .SDcols = whole
   ][vs == 1 | am == 1,
     lapply(.SD, scale_undim),
     .SDcols = names(DT)[sapply(DT, Negate(is.integer))]]

# Alternative to keep all columns (*copying* non-scaled ones)
scale_non_integers <- function(x) {
    if (is.integer(x)) x else scale_undim(x)
}

DT %>%
    filter_sd(.COL == 1, .SDcols = c("vs", "am"), .collapse = `|`) %>%
    transmute_sd(everything(), scale_non_integers)

# Without copying non-scaled
DT %>%
    where(vs == 1 | am == 1) %>%
    mutate_sd(scale, .SDcols = names(DT)[sapply(DT, Negate(is.integer))])

print(DT)

table.express documentation built on April 3, 2023, 5:43 p.m.