join: Join tables

inner_joinR Documentation

Join tables

Description

The mutating joins add columns from 'y' to 'x', matching rows based on the keys:

* 'inner_join()': includes all rows in 'x' and 'y'. * 'left_join()': includes all rows in 'x'. * 'right_join()': includes all rows in 'y'. * 'full_join()': includes all rows in 'x' or 'y'.

Filtering joins filter rows from 'x' based on the presence or absence of matches in 'y':

* 'semi_join()' return all rows from 'x' with a match in 'y'. * 'anti_join()' return all rows from 'x' without a match in 'y'.

Usage

inner_join(x, y, by = NULL, on = NULL)

left_join(x, y, by = NULL, on = NULL)

right_join(x, y, by = NULL, on = NULL)

full_join(x, y, by = NULL, on = NULL)

anti_join(x, y, by = NULL, on = NULL)

semi_join(x, y, by = NULL, on = NULL)

Arguments

x

A data.table

y

A data.table

by

(Optional) A character vector of variables to join by.

If 'NULL', the default, '*_join()' will perform a natural join, using all variables in common across 'x' and 'y'. A message lists the variables so that you can check they're correct; suppress the message by supplying 'by' explicitly.

To join by different variables on 'x' and 'y', use a named vector. For example, 'by = c("a" = "b")' will match 'x$a' to 'y$b'.

To join by multiple variables, use a vector with length > 1. For example, 'by = c("a", "b")' will match 'x$a' to 'y$a' and 'x$b' to 'y$b'. Use a named vector to match different variables in 'x' and 'y'. For example, 'by = c("a" = "b", "c" = "d")' will match 'x$a' to 'y$b' and 'x$c' to 'y$d'.

on

(Optional) Indicate which columns in x should be joined with which columns in y. Examples included: 1..by = c("a","b") (this is a must for set_full_join); 2..by = c(x1="y1", x2="y2"); 3..by = c("x1==y1", "x2==y2"); 4..by = c("a", V2="b"); 5..by = .(a, b); 6..by = c("x>=a", "y<=b") or .by = .(x>=a, y<=b).

Value

A data.table

Examples


workers = fread("
    name company
    Nick Acme
    John Ajax
    Daniela Ajax
")

positions = fread("
    name position
    John designer
    Daniela engineer
    Cathie manager
")

workers %>% inner_join(positions)
workers %>% left_join(positions)
workers %>% right_join(positions)
workers %>% full_join(positions)

# filtering joins
workers %>% anti_join(positions)
workers %>% semi_join(positions)

# To suppress the message, supply 'by' argument
workers %>% left_join(positions, by = "name")

# Use a named 'by' if the join variables have different names
positions2 = setNames(positions, c("worker", "position")) # rename first column in 'positions'
workers %>% inner_join(positions2, by = c("name" = "worker"))

# the syntax of 'on' could be a bit different
workers %>% inner_join(positions2,on = "name==worker")



tidyft documentation built on Sept. 30, 2024, 9:40 a.m.