summarizing: Convenient functions for summarizing a 'data.table'.

summarizing    R Documentation


Description

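Convenient functions for summarizing a data.table. They are designed to work with data.table's special symbol .SD, so they can be called inside the j expression of DT[i, j, by] to summarize each group. Shorthand functions (DT_mean, DT_sd, DT_var, DT_sum, DT_log_diff, and DT_perc_diff) are also available; they only require a data.table, a character vector of variable names, vars, and an optional names.extra argument (see Details).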
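For comparison, a similar grouped summary can be written in plain data.table by applying a function over .SD restricted with .SDcols and then renaming the result. This is a sketch of the general idiom only, assuming data.table is installed; it is not taken from the CLmisc source, and whether DT_summarize() works this way internally is an assumption.

```r
library(data.table)

dt <- as.data.table(mtcars)   # copy of the built-in data set
vars <- c("mpg", "hp")

## Mean of each variable in `vars`, computed separately for each value of cyl
res <- dt[, lapply(.SD, mean), by = cyl, .SDcols = vars]

## Append a suffix to the summary columns, mirroring names.extra = ".mean"
setnames(res, vars, paste0(vars, ".mean"))
print(res)
```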

Usage

DT_summarize(DT, fun, vars, names.extra = NULL)

DT_mean(DT, vars, names.extra = ".mean", na.rm = FALSE)

DT_sd(DT, vars, names.extra = ".sd", na.rm = FALSE)

DT_var(DT, vars, names.extra = ".var", na.rm = FALSE)

DT_sum(DT, vars, names.extra = ".sum", na.rm = FALSE)

DT_log_diff(DT, vars, names.extra = ".log.diff")

DT_perc_diff(DT, vars, names.extra = ".perc.diff")
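To make the signatures above concrete, here is a minimal hypothetical re-implementation. DT_summarize_sketch() and DT_mean_sketch() are illustrative names, not CLmisc functions, and the body is an assumption about how such helpers could behave: apply fun to each column in vars, add the names.extra suffix, and return a one-row data.table.

```r
library(data.table)

## Hypothetical sketch of DT_summarize(); the real CLmisc code may differ.
DT_summarize_sketch <- function(DT, fun, vars, names.extra = NULL) {
  ## Apply `fun` to each requested column; DT[[v]] works for a data.table or .SD
  out <- lapply(vars, function(v) fun(DT[[v]]))
  ## Name the results, appending the optional suffix
  names(out) <- if (is.null(names.extra)) vars else paste0(vars, names.extra)
  as.data.table(out)
}

## Hypothetical shorthand mirroring DT_mean()'s documented signature
DT_mean_sketch <- function(DT, vars, names.extra = ".mean", na.rm = FALSE) {
  DT_summarize_sketch(DT, fun = function(x) mean(x, na.rm = na.rm),
                      vars = vars, names.extra = names.extra)
}

dt <- as.data.table(mtcars)
overall <- DT_mean_sketch(dt, vars = c("mpg", "hp"))                    # one row
by_cyl  <- dt[, DT_mean_sketch(.SD, vars = c("mpg", "hp")), by = cyl]  # one row per cyl
print(overall)
print(by_cyl)
```

Because the sketch reads columns with DT[[v]], it behaves the same way on a full data.table and on .SD inside a grouped call.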

Arguments

DT

a data.table. Can also be .SD when called inside the j expression of another data.table call; see the examples below.

fun

a function applied to each variable in vars to summarize it (for example, mean)

vars

a character vector with the names of the variables (columns) to summarize

names.extra

a string appended to each name in vars to form the names of the new summary columns (e.g. "mpg" becomes "mpg.mean" when names.extra = ".mean")

na.rm

logical; set to TRUE to remove missing values (NA) before summarizing. Default is FALSE.
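The effect of names.extra and na.rm can be illustrated in base R. This sketch assumes, as the argument descriptions above suggest, that names.extra is pasted onto each variable name and that na.rm is forwarded to the summary function; the vector x is made up for illustration.

```r
vars <- c("mpg", "hp")
names.extra <- ".mean"

## New column names: each variable name followed by the suffix
new_names <- paste0(vars, names.extra)
print(new_names)

## Effect of na.rm on a vector containing a missing value
x <- c(1, 2, NA, 5)
mean_keep_na <- mean(x)                # NA propagates when na.rm = FALSE
mean_drop_na <- mean(x, na.rm = TRUE)  # NA dropped: mean of 1, 2 and 5
```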

Details

DT_mean

fun = mean and names.extra defaults to ".mean"

DT_sd

fun = sd and names.extra defaults to ".sd"

DT_var

fun = var and names.extra defaults to ".var"

DT_sum

fun = sum and names.extra defaults to ".sum"

DT_log_diff

fun = function(x) log(x[length(x)]) - log(x[1]), the log difference between the last and first values, and names.extra defaults to ".log.diff"

DT_perc_diff

fun = function(x) (x[length(x)] - x[1]) / x[1], the change from the first to the last value as a fraction of the first (e.g. 0.25 for a 25% increase), and names.extra defaults to ".perc.diff"
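The two difference functions above can be checked on a plain vector. The helper names log_diff and perc_diff and the series x are illustrative only; the function bodies follow the definitions given in this section.

```r
## The difference functions described above, as ordinary R functions
log_diff  <- function(x) log(x[length(x)]) - log(x[1])
perc_diff <- function(x) (x[length(x)] - x[1]) / x[1]

## A short, made-up series, assumed to be in time order
x <- c(100, 110, 125)

ld <- log_diff(x)   # log(125) - log(100), i.e. log(1.25)
pd <- perc_diff(x)  # (125 - 100) / 100, i.e. 0.25
## Only the first and last elements are used; the middle value is ignored
```

Since both functions look only at the first and last rows, the result depends on row order, so the data (or each group) should be sorted, for example by date, before they are applied.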

Value

a data.table of summarized values, with one column per variable in vars (named using the names.extra suffix)
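Because a data.table is also a list of columns, results from several of these functions can be joined with c(), as the examples do inside j. In the sketch below, two small hand-built one-row tables stand in for DT_mean() and DT_var() output; their shape is an assumption based on the Value description, not CLmisc's own code.

```r
library(data.table)

## Stand-ins for the one-row tables DT_mean() and DT_var() are assumed to return
means     <- data.table(mpg.mean = mean(mtcars$mpg), hp.mean = mean(mtcars$hp))
variances <- data.table(disp.var = var(mtcars$disp), wt.var = var(mtcars$wt))

## c() keeps the column names but drops the data.table class: a plain named list
combined <- c(means, variances)
str(combined)

## A named list returned from j becomes the columns of the result, once per group
dt  <- as.data.table(mtcars)
res <- dt[, c(list(mpg.mean = mean(mpg)), list(disp.var = var(disp))), by = cyl]
print(res)
```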

Examples

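library(data.table)
library(magrittr) ##provides %>% (in case it is not already attached)
library(CLmisc)

data(mtcars)
setDT(mtcars) ##Convert to a data.table (by reference)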

##Basic use of DT_summarize()
DT_summarize(mtcars, fun = mean, vars = c("mpg","hp"))
##Using with by and .SD
mtcars %>%
    .[, DT_summarize(.SD, fun = mean, vars = c("mpg","hp")), by = cyl]
##Using the convenience function DT_mean (and leaving the names.extra
##argument as the default)
DT_mean(mtcars, vars = c("mpg", "hp"))
mtcars %>%
    .[, DT_mean(.SD, vars = c("mpg","hp")), by = cyl]
##Take the mean of hp and mpg, and the variance of disp and wt
##Note: we concatenate the results using c()
mtcars %>%
    .[, c(DT_mean(.SD, vars = c("mpg", "hp")),
          DT_var(.SD, vars = c("disp", "wt"))
          )]
##by cyl
mtcars %>%
    .[, c(DT_mean(.SD, vars = c("mpg", "hp")),
          DT_var(.SD, vars = c("disp", "wt"))
          ), by = cyl]
##The mean, standard deviation, and variance of mpg and hp
summary.vars <- c("mpg", "hp")
mtcars %>%
    .[, c(DT_mean(.SD, vars = summary.vars),
          DT_sd(.SD, vars = summary.vars),
          DT_var(.SD, vars = summary.vars)), by = cyl]

## Other summary functions (the log and percentage differences compare
## the last and first rows of each group, so they have no useful
## interpretation for mtcars)
summary.vars <- c("mpg", "hp")
mtcars %>%
    .[, c(DT_sum(.SD, vars = summary.vars),
          DT_log_diff(.SD, vars = summary.vars),
          DT_perc_diff(.SD, vars = summary.vars)),
      by = cyl]

ChandlerLutz/CLmisc documentation built on Dec. 2, 2022, 12:40 p.m.