join.tbl_ffdf: Join ffdf tbls.

Description

See join for a description of the general purpose of the functions.

Usage

## S3 method for class 'ffdf'
inner_join(x, y, by = NULL, copy = FALSE, ...)

## S3 method for class 'ffdf'
left_join(x, y, by = NULL, copy = FALSE, ...)

## S3 method for class 'ffdf'
semi_join(x, y, by = NULL, ...)

## S3 method for class 'ffdf'
anti_join(x, y, by = NULL, ...)

Arguments

x,y

The tbls to join.

by

A character vector of variables to join by. If NULL (the default), the join is a natural join that uses all variables with common names across the two tables. A message lists the variables so that you can check they're right.

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
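
The two forms of by can be illustrated on ordinary data frames. This is a minimal sketch that assumes the ffdf methods follow the same by conventions as the dplyr generics; the data frames x, y and y2 are made up for illustration and are not part of the package.

```r
library(dplyr)

# Toy tables (hypothetical); the key column is named differently in x and y
x  <- data.frame(a = c(1, 2, 3), v = c("x1", "x2", "x3"))
y  <- data.frame(b = c(2, 3, 4), w = c("y2", "y3", "y4"))

# Named vector: match x$a against y$b; the key keeps its name from x
joined <- inner_join(x, y, by = c("a" = "b"))

# Natural join: with by = NULL, every column name shared by both tables
# is used as a key, and a message reports which ones were picked
y2 <- data.frame(a = c(2, 3, 4), w = c("y2", "y3", "y4"))
natural <- inner_join(x, y2)
```

Passing by explicitly avoids surprises when the two tables happen to share a column name that is not meant to be a key.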

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

...

Included for compatibility with generic; otherwise ignored.

Examples

if (require(Lahman)){
  data("Batting", package = "Lahman")
  data("Master", package = "Lahman")

  batting_ffdf <- tbl_ffdf(Batting)
  person_ffdf <- tbl_ffdf(Master)

  # Inner join: match batting and person data
  inner_join(batting_ffdf, person_ffdf)

  # Left join: keep batting data even if person missing
  left_join(batting_ffdf, person_ffdf)

  # Semi-join: find batting data for top 4 teams, 2010:2012
  grid <- expand.grid(
    teamID = c("WAS", "ATL", "PHI", "NYA"),
    yearID = 2010:2012)
  top4 <- semi_join(batting_ffdf, grid, copy = TRUE)

  # Anti-join: find batting data without player data
  anti_join(batting_ffdf, person_ffdf)
}
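
The difference between the four joins is easier to see on a small table than on the Lahman data. The sketch below uses plain data frames with dplyr and assumes the ffdf methods return the same rows and columns as the dplyr generics; the tables players and people are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: p3 has no person record, p4 has no batting record
players <- data.frame(playerID = c("p1", "p2", "p3"), HR = c(10, 5, 0))
people  <- data.frame(playerID = c("p1", "p2", "p4"),
                      name = c("Ann", "Bob", "Dee"))

# Inner join: only rows with a match in both tables
inner <- inner_join(players, people, by = "playerID")

# Left join: every row of players; name is NA where no match exists
left <- left_join(players, people, by = "playerID")

# Semi-join: rows of players that have a match, columns of players only
semi <- semi_join(players, people, by = "playerID")

# Anti-join: rows of players that have no match
anti <- anti_join(players, people, by = "playerID")
```

The semi- and anti-joins filter x and never add columns from y, which is why they have no copy argument in the usage above.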

edwindj/ffbase2 documentation built on May 15, 2019, 11:05 p.m.