knitr::opts_chunk$set( collapse = TRUE, echo = FALSE, comment = "#>" )
library(wr.data.table)
While data.table offers blazing fast functions, its syntax can be hard to remember, particularly for those who do not use data.table frequently.
This package aims to provide wrapper functions that carry out common tasks with the data.table library without having to remember the data.table syntax.
This vignette describes the usage of these wrapper functions in detail.
By default, data.table(df) drops the row names of a data.frame.
To keep the row names as the first column, the user must call data.table(df, keep.rownames = TRUE),
where df is a data.frame object.
dt_keeprownames(df) converts a data.frame into a data.table and keeps the row names as the first column of the data.table.
dt <- dt_keeprownames(data1)
dt
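For comparison, the plain data.table behaviour can be sketched as follows. This is only an illustration of data.table itself, not of how dt_keeprownames() is implemented, and the built-in mtcars data set is an assumption standing in for the data.frame used in this vignette.

```r
library(data.table)

# Assumption: mtcars stands in for the vignette's data.frame.
df <- mtcars

# Default conversion: the row names are dropped.
dt_plain <- data.table(df)

# keep.rownames = TRUE stores the row names in a first column called "rn".
dt_rn <- data.table(df, keep.rownames = TRUE)

names(dt_plain)[1]
names(dt_rn)[1]
```

The "rn" column created this way is the plain data.table default name for the stored row names.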
Users can rename columns of a data.table with rn_cols(). Note that rn_cols() renames the columns of the original data.table, so assigning the output back to dt is not necessary. The function still returns the data.table so that it can be used in a pipe.
dt <- dt |> rn_cols(rn = `vehicle model`)
dt
Users can select columns with sel_cols().
subset <- dt |> sel_cols(`vehicle model`, mpg:hp, gear, vs)
subset
Instead of selecting columns, users can also choose to deselect certain columns with desel_cols().
(dt |> desel_cols(gear, vs, am, drat, carb, qsec))
Users can pass logical expressions, including calls to grepl(), to filter_rows() to subset the rows of the data.
dt |> filter_rows(mpg >= 20)
dt |> filter_rows(grepl("^M", `vehicle model`))
The set_group() function adds a "group" attribute to the input data.table.
When the condense() function is called, the group attribute is used as the "by" argument in data.table to summarize the data by group.
The example below calculates the mean and standard deviation of the data by groups of "gear" and "vs". Note that data.table's special symbols, such as .N for the number of rows in each group, also work here.
subset |> set_group(`vehicle model`, gear, vs) |> attributes()
subset |>
  set_group(`vehicle model`, gear, vs) |>
  condense(mu_mpg = round(mean(mpg), 2), sd_mpg = round(sd(mpg), 2), count = .N) |>
  rm_group()
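As a point of reference, a grouped summary of this kind can be written directly in data.table syntax with the by argument. The sketch below assumes the built-in mtcars data set and groups by gear and vs only; it illustrates the underlying data.table idiom and is not taken from the package source.

```r
library(data.table)

# Assumption: mtcars stands in for the vignette's data.
dt_cars <- as.data.table(mtcars)

# Summarise mpg within each (gear, vs) group; .N is the group size.
summary_dt <- dt_cars[, .(mu_mpg = round(mean(mpg), 2),
                          sd_mpg = round(sd(mpg), 2),
                          count  = .N),
                      by = .(gear, vs)]
summary_dt
```

A group with a single row yields NA for sd_mpg, because the standard deviation of one value is undefined.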
Character strings can be used to refer to column names, but they need to be parsed into unevaluated symbols first.
myvar <- "mpg"
condense(dt |> set_group(vs), mu_mpg = mean(eval(char_to_symbol(myvar))))
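The same idea can be sketched in plain data.table using base R's as.name(), which turns a string into a symbol. Whether char_to_symbol() works this way internally is an assumption; mtcars is again used as stand-in data.

```r
library(data.table)

# Assumption: mtcars stands in for the vignette's data.
dt_cars <- as.data.table(mtcars)
myvar <- "mpg"

# as.name() converts the string to a symbol; eval() resolves it to the column.
res_sym <- dt_cars[, .(mu_mpg = mean(eval(as.name(myvar)))), by = vs]

# get() is a common alternative that looks the column up by name.
res_get <- dt_cars[, .(mu_mpg = mean(get(myvar))), by = vs]

res_sym
```

Both forms defer the choice of column to run time, which is useful when the column name arrives as a function argument.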
Users can define new columns with the def_cols() function.
For example, the user can create an index column and then use the arrange_cols() function to place the index column at the front of the data.table.
subset1 <- subset |>
  def_cols(index ~ 1:nrow(subset)) |>
  arrange_cols(at = "start", index)
subset1
Users can also create new columns from existing columns, for example:
subset2 <- subset1 |>
  def_cols(kpL ~ mpg * 0.425144, vs ~ ifelse(vs == 0, "V-shaped", "Straight")) |>
  arrange_cols(at = mpg, kpL)
subset2
spread(subset2, key = vs, value = kpL)
Users can arrange the rows in ascending or descending order according to the values in a chosen column.
Multiple variables can be used to sort the rows. The sorting priority follows the order in which the variables are entered in the arguments.
The ascend_rows() and descend_rows() functions work with the set_group() function. The grouping variables are separated and placed in the left-most columns of the data.table.
dt |> ascend_rows(vs, mpg)
dt |> ascend_rows(c("vs", "mpg"))
dt |> descend_rows(vs, mpg)
dt |> descend_rows(c("vs", "mpg"))
dt |> set_group(cyl) |> ascend_rows(vs, mpg)
dt |> set_group(cyl) |> ascend_rows(c("vs", "mpg"))
dt |> set_group(cyl) |> descend_rows(vs, mpg)
dt |> set_group(cyl) |> descend_rows(c("vs", "mpg"))
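For readers who want to compare against plain data.table, multi-key sorting can be sketched as below. The mtcars data set is an assumption, and the snippet shows standard data.table calls rather than the internals of ascend_rows() or descend_rows().

```r
library(data.table)

# Assumption: mtcars stands in for the vignette's data.
dt_cars <- as.data.table(mtcars)

# Ascending by vs first, then by mpg within each vs value.
asc <- dt_cars[order(vs, mpg)]

# Descending on both keys: prefix each column with a minus sign.
desc <- dt_cars[order(-vs, -mpg)]

# Column names supplied as strings: setorderv() reorders a copy by reference.
asc_chr <- setorderv(copy(dt_cars), c("vs", "mpg"))
```

The first key takes priority, and later keys only break ties among rows that share the earlier key values.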
(dt |> arrange_cols(at = "start", mpg, drat, hp))
(dt |> arrange_cols(at = "cyl", mpg, drat, hp))
(dt |> arrange_cols(at = cyl, mpg, drat, hp))
(dt |> arrange_cols(at = "end", c("mpg", "drat", "hp")))