This vignette is adapted from dplyr's "Two-table verbs" vignette (https://dplyr.tidyverse.org/articles/two-table.html) and shows how to join data tables with tidyfst. First, load the packages we need and get some data.
library(tidyfst)
library(nycflights13)
flights2 <- flights %>% select_dt(year, month, day, hour, origin, dest, tailnum, carrier)
A left join is as simple as:
flights2 %>% left_join_dt(airlines)
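No `by` is given here, so the join uses every column name the two tables share. For flights2 and airlines that is only `carrier`. The base-R sketch below reproduces the same natural left join on made-up toy tables using `merge()` with `all.x = TRUE`. It is a comparison only and not tidyfst's implementation. Unlike left_join_dt(), `merge()` returns a data.frame sorted by the key.

```r
# Hypothetical toy stand-ins for flights2 and airlines
fl <- data.frame(flight = 1:3, carrier = c("AA", "UA", "ZZ"))
al <- data.frame(carrier = c("AA", "UA"), name = c("American", "United"))

# Columns shared by both tables: what a natural join matches on
common <- intersect(names(fl), names(al))
print(common)

# Left join: keep every row of fl; the unmatched carrier "ZZ" gets NA for name
joined <- merge(fl, al, by = common, all.x = TRUE)
print(joined)
```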
Choosing the join columns works the same way as in dplyr:
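flights2 %>% left_join_dt(weather)  # natural join on all shared columns
flights2 %>% left_join_dt(planes, by = "tailnum")  # join on one named column
flights2 %>% left_join_dt(airports, c("dest" = "faa"))  # x's dest matches y's faa
flights2 %>% left_join_dt(airports, c("origin" = "faa"))  # x's origin matches y's faa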
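The weather join matches on all shared columns: year, month, day, hour and origin. For planes, `by = "tailnum"` is needed because planes also has a `year` column, and that column means the year the plane was built. In a named vector such as `c("dest" = "faa")`, the name is the column in the left table and the value is the column in the right table. The base-R sketch below shows the same mapping with `merge()`'s `by.x`/`by.y` on hypothetical toy data. It is a comparison, not tidyfst's code.

```r
# Hypothetical toy versions of flights2 and airports
fl <- data.frame(flight = 1:3, dest = c("LAX", "ORD", "XYZ"))
ap <- data.frame(faa = c("LAX", "ORD"),
                 name = c("Los Angeles Intl", "Chicago Ohare Intl"))

# c("dest" = "faa"): the name is the left column, the value is the right column
key_map <- c(dest = "faa")
joined <- merge(fl, ap, by.x = names(key_map), by.y = unname(key_map), all.x = TRUE)
print(joined)  # the key column keeps the left table's name, "dest"
```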
df1 <- data.table(x = c(1, 2), y = 2:1)
df2 <- data.table(x = c(1, 3), a = 10, b = "a")
df1 %>% inner_join_dt(df2)
df1 %>% left_join_dt(df2)
df1 %>% right_join_dt(df2)
df1 %>% full_join_dt(df2)
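The four mutating joins differ only in which unmatched rows they keep. With these tables only `x = 1` appears on both sides. The expected row counts are 1 for inner, 2 for left (x = 1, 2), 2 for right (x = 1, 3) and 3 for full. The sketch below checks those counts with base R's `merge()`, which is swapped in here for illustration.

```r
df1 <- data.frame(x = c(1, 2), y = 2:1)
df2 <- data.frame(x = c(1, 3), a = 10, b = "a")

inner <- merge(df1, df2, by = "x")                # only x = 1 matches
left  <- merge(df1, df2, by = "x", all.x = TRUE)  # every row of df1
right <- merge(df1, df2, by = "x", all.y = TRUE)  # every row of df2
full  <- merge(df1, df2, by = "x", all = TRUE)    # rows from either side

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```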
If you have a data.frame or tibble, there is no need to convert it first. Pass it in directly:
df1 <- data.frame(x = c(1, 1, 2), y = 1:3)
df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a"))
df1 %>% left_join_dt(df2)
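In this example the key `x` is duplicated in both tables, so the join returns one row for every matching pair. The two `x = 1` rows in df1 each match two rows in df2, which gives 4 rows. The single `x = 2` row gives 1 more, so the result has 5 rows. The base-R sketch below (using `merge()` for comparison) derives that count from key frequencies and checks it.

```r
df1 <- data.frame(x = c(1, 1, 2), y = 1:3)
df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a"))

# For each row of df1, count the rows of df2 that share its key
matches <- sapply(df1$x, function(k) sum(df2$x == k))
# A left join keeps an unmatched row once, so count at least 1 per row
expected_rows <- sum(pmax(matches, 1))

joined <- merge(df1, df2, by = "x", all.x = TRUE)
c(expected = expected_rows, actual = nrow(joined))
```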
The "_dt" suffix is a reminder that these functions are backed by data.table and always return a data.table.
tidyfst also supports filtering joins. For example, to count the flights whose tail number has no match in planes:
flights %>%
  anti_join_dt(planes, by = "tailnum") %>%
  count_dt(tailnum, sort = TRUE)
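An anti join keeps the rows of the left table whose key does not appear in the right table, and it adds no columns. Below is a base-R sketch of that idea, written with `%in%` on hypothetical toy tables rather than with tidyfst. It is followed by the same "count per tail number, most frequent first" step.

```r
# Hypothetical toy tables standing in for flights and planes
fl <- data.frame(flight = 1:6,
                 tailnum = c("N1", "N2", "N2", "N9", "N9", "N7"))
pl <- data.frame(tailnum = c("N1", "N2"), seats = c(100, 150))

# Anti join: rows of fl with no matching tailnum in pl
unmatched <- fl[!(fl$tailnum %in% pl$tailnum), ]

# Count rows per tailnum, most frequent first
counts <- sort(table(unmatched$tailnum), decreasing = TRUE)
print(counts)
```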
More examples follow. Note that semi_join_dt() and anti_join_dt() never duplicate rows; they can only remove them:
df1 <- data.frame(x = c(1, 1, 3, 4), y = 1:4)
df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a"))
# Four rows to start with:
df1 %>% nrow()
# And we get four rows after the join
df1 %>% inner_join_dt(df2, by = "x") %>% nrow()
# But only two rows actually match
df1 %>% semi_join_dt(df2, by = "x") %>% nrow()
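The difference comes from what each join returns. A semi join keeps each row of df1 at most once if its key has any match, and an anti join keeps the rows with no match. The two together therefore always add up to the original row count. An inner join instead emits one row per matching pair. The base-R sketch below mirrors these rules with `%in%` and `merge()` for comparison. It is not tidyfst's implementation.

```r
df1 <- data.frame(x = c(1, 1, 3, 4), y = 1:4)
df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a"))

# Semi join: rows of df1 with at least one match in df2
semi <- df1[df1$x %in% df2$x, ]
# Anti join: rows of df1 with no match in df2
anti <- df1[!(df1$x %in% df2$x), ]
# Inner join, for contrast: one row per matching pair (2 x 2 for x = 1)
inner <- merge(df1, df2, by = "x")

c(semi = nrow(semi), anti = nrow(anti), inner = nrow(inner))
```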
The set operations are thin wrappers around data.table's functions (fintersect(), fsetdiff(), funion() and fsetequal()), except that they automatically convert any data.frame into a data.table. Examples:
x <- iris[c(2, 3, 3, 4), ]
x2 <- iris[2:4, ]
y <- iris[3:5, ]
intersect_dt(x, y)             # intersect
intersect_dt(x, y, all = TRUE) # intersect all
setdiff_dt(x, y)               # except
setdiff_dt(x, y, all = TRUE)   # except all
union_dt(x, y)                 # union
union_dt(x, y, all = TRUE)     # union all
setequal_dt(x, x2, all = FALSE) # setequal, ignoring duplicates
setequal_dt(x, x2)             # setequal all (default)
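The `all` argument switches between distinct and duplicate-aware behaviour. According to data.table's ?setops, if a row occurs xn times in x and yn times in y, then with `all = TRUE` intersect keeps min(xn, yn) copies, except keeps max(0, xn - yn) copies and union keeps xn + yn copies. The base-R sketch below applies those rules to the same iris rows to show the expected row counts. It is not tidyfst's or data.table's code, and the helper `row_key()` is hypothetical.

```r
x <- iris[c(2, 3, 3, 4), ]
x2 <- iris[2:4, ]
y <- iris[3:5, ]

# Hypothetical helper: collapse each row into one string so rows compare as values
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
kx <- row_key(x)
ky <- row_key(y)

# all = FALSE: duplicates are dropped first
distinct <- c(intersect = length(intersect(kx, ky)),
              setdiff   = length(setdiff(kx, ky)),
              union     = length(union(kx, ky)))

# all = TRUE: compare per-row counts
keys <- union(kx, ky)
cx <- as.vector(table(factor(kx, levels = keys)))
cy <- as.vector(table(factor(ky, levels = keys)))
multiset <- c(intersect = sum(pmin(cx, cy)),      # min(xn, yn) copies
              setdiff   = sum(pmax(cx - cy, 0)),  # max(0, xn - yn) copies
              union     = sum(cx + cy))           # xn + yn copies

rbind(distinct, multiset)

# setequal: same distinct rows, but x holds row 3 twice
kx2 <- row_key(x2)
setequal(kx, kx2)               # ignoring duplicates
identical(sort(kx), sort(kx2))  # counting duplicates
```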
For more details, see data.table's help page with ?setops.