The tidyverse has made it far easier to perform data work in R. But there are still times where performance is important, and for tabular data that’s when data.table can really shine.
For some reason, however, there seems to be an either / or mentality
when it comes to picking between data.table
and dplyr
. I don’t buy
it. I’d rather have the best of both worlds.
That’s why I made these handy adapters. Stick with your tidy data flow, but go ahead and cherry-pick some performance wins when you need them.
NOTE: This is a lie right now. Dare to dream.
You can install the released version of dtplyr from CRAN with:
install.packages("dtplyr")
# install.packages("devtools")
devtools::install_github("mcskinner/dtplyr")
Aggregation with dt_agg()
uses the lean syntax you’re used to from
data.table
:
library(dtplyr)
dt_agg(iris, mean(Sepal.Length), Species)
#> Species V1
#> 1: setosa 5.006
#> 2: versicolor 5.936
#> 3: virginica 6.588
You’ll notice that aggregation skipped the filter. That’s exposed by
dt_op()
if you need it back:
dt_op(iris, Sepal.Width >= 3, mean(Sepal.Length), Species)
#> Species V1
#> 1: setosa 5.029167
#> 2: versicolor 6.218750
#> 3: virginica 6.768966
And why not leverage the high-performance merge routine as well?
colors <- data.frame(Species = c('setosa', 'versicolor', 'virginica'), Color = c('baby blue', 'lavender', 'purple'))
head(dt_merge(iris, colors))
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width Color
#> 1: setosa 5.1 3.5 1.4 0.2 baby blue
#> 2: setosa 4.9 3.0 1.4 0.2 baby blue
#> 3: setosa 4.7 3.2 1.3 0.2 baby blue
#> 4: setosa 4.6 3.1 1.5 0.2 baby blue
#> 5: setosa 5.0 3.6 1.4 0.2 baby blue
#> 6: setosa 5.4 3.9 1.7 0.4 baby blue
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.