In kleu046/wr.data.table: What the Package Does (One Line, Title Case)

knitr::opts_chunk$set(
  collapse = TRUE,
  echo = FALSE,
  comment = "#>"
)

library(wr.data.table)

Introduction

While data.table offers blazing fast functions, the syntax may be hard to remember if not used frequently particularly to those who does not use data.table frequently.

This package aims to provide wrapper functions to carry out common tasks using functions in the data.table library without having to remember the data.table syntax.

This vignette outlines detail usages for these wrapper function

dt_keeprownames()

data.table(dt) removes row names.

This function converts a data.frame into a data.table and use the row names as the first column.

To use row names as the first column, user must use data.table(df, keep.rownames = True)

where df is a data.frame object

dt_keeprownames(df) will keep the row names as the first column of the data.table

dt <- dt_keeprownames(data1)
dt

Selecting (sel_cols, desel_cols) and renaming columns (rn_cols)

User can rename columns of the data.table using rn_cols(). Note that rn_cols() function renames the original data.table therefore assigning the output back to dt is not necessary. The function does provide the data.table as output so that user can use this function in piping.

dt <- dt |> rn_cols(rn = `vehicle model`)
dt

User can select colummns using sel_cols()

subset <- dt |> sel_cols(`vehicle model`, mpg:hp, gear, vs)
subset

Instead of selecting columns, user can also choose to deselect certain columns.

(dt |> desel_cols(gear, vs, am, drat, carb, qsec))

Filtering data using filter_rows()

User can use expressions and grepl() in filter_rows() function to filter and subset the data in rows.

dt |> filter_rows(mpg >= 20)

dt |> filter_rows(grepl("^M", `vehicle model`))

Summarizing data using set_group() and condense

set_group() function adds a "group" attribute to the input data.table.

When the condense() function is called, the group attribute is used as the "by" arugment in the data.table to summarize the data by group

The example below calculates the mean and standard deviation of the data by groups of "gear" and "vs". Note that .I notation used in data.table also works here

subset |> set_group(`vehicle model`, gear, vs) |> attributes()

subset |> set_group(`vehicle model`, gear, vs) |> condense(mu_mpg=round(mean(mpg),2), sd_mpg=round(sd(mpg),2), count=.N) |> rm_group()

Characters can be used to refer to column names, but it needs to be parse into unevaluated symbols first

myvar <- "mpg"
condense(dt |> set_group(vs), mu_mpg = mean(eval(char_to_symbol(myvar))))

Creating new columns

User can def new columns using the def_cols() function

For example, user can create an index column followed by using the arrange_cols() function to place the index column at the front of the data.table

subset1 <- subset |> def_cols(index ~ 1:nrow(subset)) |> arrange_cols(at="start", index)
subset1

User can create new columns from existing columns, for example,

subset2 <- subset1 |> def_cols(kpL ~ mpg * 0.425144, vs ~ ifelse(vs==0, "V-shaped", "Straight")) |> arrange_cols(at=mpg, kpL)
subset2

spread(subset2, key = vs, value = kpL)

Arranging data in rows

User can arrange data in ascending or descending order according to the values in a chosen row.

Multiple variables can be used to sort data in rows. The priority of sorting follows the order the variables are entered in the arugments.

The ascend_rows or descend_rows functions works with set_groups function. The grouping variables would be seperated and placed to the left-hand most columns of the data.table.

dt |> ascend_rows(vs, mpg)
dt |> ascend_rows(c("vs", "mpg"))
dt |> descend_rows(vs, mpg)
dt |> descend_rows(c("vs", "mpg"))

dt |> set_group(cyl) |> ascend_rows(vs, mpg)
dt |> set_group(cyl) |> ascend_rows(c("vs", "mpg"))
dt |> set_group(cyl) |> descend_rows(vs, mpg)
dt |> set_group(cyl) |> descend_rows(c("vs", "mpg"))

Arranging data columns

(dt |> arrange_cols(at="start", mpg, drat, hp))
(dt |> arrange_cols(at="cyl", mpg, drat, hp))
(dt |> arrange_cols(at=cyl, mpg, drat, hp))
(dt |> arrange_cols(at="end", c("mpg", "drat", "hp")))