knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This post has referred to a vignette from dplyr
, you can find it in https://dplyr.tidyverse.org/articles/two-table.html. We'll try to display how to join data tables in this vignette. First, load the packages we need and get some data.
library(tidyfst) library(nycflights13) flights2 <- flights %>% select_dt(year,month,day, hour, origin, dest, tailnum, carrier)
Do a left join with a simple:
flights2 %>% left_join_dt(airlines)
Join works the same as dplyr
:
flights2 %>% left_join_dt(weather) flights2 %>% left_join_dt(planes, by = "tailnum") flights2 %>% left_join_dt(airports, c("dest" = "faa")) flights2 %>% left_join_dt(airports, c("origin" = "faa"))
df1 <- data.table(x = c(1, 2), y = 2:1) df2 <- data.table(x = c(1, 3), a = 10, b = "a") df1 %>% inner_join_dt(df2) df1 %>% left_join_dt(df2) df1 %>% right_join_dt(df2) df1 %>% full_join_dt(df2)
If all you have is a data.frame or tibble, you have no need to change the format. Feed the data directly:
df1 <- data.frame(x = c(1, 1, 2), y = 1:3) df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a")) df1 %>% left_join_dt(df2)
The "_dt" suffix should remind you that this is backed up by data.table
and will always return a data.table in the end.
Filtering joins have also been supported in tidyfst
.
flights %>% anti_join_dt(planes, by = "tailnum") %>% count_dt(tailnum, sort = TRUE)
Other examples (semi_join_dt()
and anti_join_dt()
never duplicate; they only ever remove observations.):
df1 <- data.frame(x = c(1, 1, 3, 4), y = 1:4) df2 <- data.frame(x = c(1, 1, 2), z = c("a", "b", "a")) # Four rows to start with: df1 %>% nrow() # And we get four rows after the join df1 %>% inner_join_dt(df2, by = "x") %>% nrow() # But only two rows actually match df1 %>% semi_join_dt(df2, by = "x") %>% nrow()
For set operations, wrap data.table
's function directly, but the functions will automatically turn any data.frame into data.table. Examples are listed as below:
x = iris[c(2,3,3,4),] x2 = iris[2:4,] y = iris[c(3:5),] intersect_dt(x, y) # intersect intersect_dt(x, y, all=TRUE) # intersect all setdiff_dt(x, y) # except setdiff_dt(x, y, all=TRUE) # except all union_dt(x, y) # union union_dt(x, y, all=TRUE) # union all setequal_dt(x, x2, all=FALSE) # setequal setequal_dt(x, x2)
For more details, just find the help from data.table
using ?setops
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.