joinSQLDataFrame: join 'SQLDataFrame' together

Description Usage Arguments Details Value Examples

Description

*_join functions for SQLDataFrame objects. Will preserve the duplicate rows for the input argument 'x'.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
## S3 method for class 'SQLDataFrame'
left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

## S3 method for class 'SQLDataFrame'
inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)

## S3 method for class 'SQLDataFrame'
semi_join(x, y, by = NULL, copy = FALSE, ...)

## S3 method for class 'SQLDataFrame'
anti_join(x, y, by = NULL, copy = FALSE, ...)

Arguments

x

SQLDataFrame objects to join.

y

SQLDataFrame objects to join.

by

A character vector of variables to join by. If ‘NULL’, the default, ‘*_join()’ will do a natural join, using all variables with common names across the two tables. See ?dplyr::join for details.

copy

Only kept for S3 generic/method consistency. Used as "copy = FALSE" internally and not modifiable.

suffix

A character vector of length 2 specify the suffixes to be added if there are non-joined duplicate variables in ‘x’ and ‘y’. Default values are ".x" and ".y".See ?dplyr::join for details.

...

Other arguments passed on to *_join methods. localConn can be passed here for one MySQL connection with write permission. Will be used only when join-ing two SQLDataFrame objects from different MySQL connections (to different MySQL databases), and neither has write permission. The situation is rare and should be avoided. See Details.

Details

The *_join functions support aggregation of SQLDataFrame objects from same or different connection (e.g., cross databases), either with or without write permission.

SQLite database tables are supported by SQLDataFrame package, in the same/cross-database aggregation, and saving.

For MySQL databases, There are different situations:

When the input SQLDataFrame objects connects to same remote MySQL database without write permission (e.g., ensembl), the functions work like dbplyr with the lazy operations and a DataFrame interface. Note that the aggregated SQLDataFrame can not be saved using saveSQLDataFrame.

When the input SQLDataFrame objects connects to different MySQL databases, and neither has write permission, the *_join functions are supported but will be quite time consuming. To avoid this situation, a more efficient way is to save the database table in local MySQL server using saveSQLDataFrame, and then call the *_join functions again.

More frequent situation will be the *_join operation on two SQLDataFrame objects, of which at least one has write permission. Then the cross-database aggregation through SQLDataFrame package will be supported by generating federated table from the non-writable connection in the writable connection. Look for MySQL database manual for more details.

Value

A SQLDataFrame object.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
test.db1 <- system.file("extdata/test.db", package = "SQLDataFrame")
test.db2 <- system.file("extdata/test1.db", package = "SQLDataFrame")
con1 <- DBI::dbConnect(DBI::dbDriver("SQLite"), dbname = test.db1)
con2 <- DBI::dbConnect(DBI::dbDriver("SQLite"), dbname = test.db2)
obj1 <- SQLDataFrame(conn = con1,
                     dbtable = "state",
                     dbkey = c("region", "population"))
obj2 <- SQLDataFrame(conn = con2,
                     dbtable = "state1",
                     dbkey = c("region", "population"))

obj1_sub <- obj1[1:10, 1:2]
obj2_sub <- obj2[8:15, 2:3]

left_join(obj1_sub, obj2_sub)
inner_join(obj1_sub, obj2_sub)
semi_join(obj1_sub, obj2_sub)
anti_join(obj1_sub, obj2_sub)

SQLDataFrame documentation built on Nov. 29, 2020, 2:01 a.m.