merge-methods: Computing a join on two tables

merge-methodR Documentation

Computing a join on two tables

Description

This method is equivalent to a database join on two tables, and the merge can be by common column or row names. It supports the equivalent of inner, left-outer, right-outer, and full-outer join operations. This method is similar to merge.data.frame.

Usage

## S4 method for signature 'db.obj,db.obj'
merge(x, y, by = intersect(names(x), names(y)),
by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
key = x@.key, suffixes = c("_x","_y"), ...)

Arguments

x,y

The signature of the method. Both argument are db.obj objects, and their associated tables will be merged.

by,by.x,by.y

specifications of the columns used for merging. See 'Details'.

all

logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

all.x

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y

logical; analogous to all.x.

key

specifies the primary key of the newly created table.

suffixes

a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which not used for merging (appearing in by etc).

...

arguments to be passed to or from methods.

Details

See merge.data.frame. Note that merge.data.frame supports an incomparables argument, which is not yet supported here.

Value

A db.Rquery object, which expresses the join operation.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

See Also

merge.data.frame a merge operation for two data frames.

Examples

## Not run: 


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create sample databases
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))

books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

delete("books", conn.id = cid)
delete("authors", conn.id = cid)
as.db.data.frame(books, 'books', conn.id = cid, verbose = FALSE)
as.db.data.frame(authors, 'authors', conn.id = cid, verbose = FALSE)

## Cast them as db.data.frame objects
a <- db.data.frame('authors', conn.id = cid, verbose = FALSE)
b <- db.data.frame('books', conn.id = cid, verbose = FALSE)

## Merge them together
m1 <- merge(a, b, by.x = "surname", by.y = "name", all = TRUE)

db.disconnect(cid, verbose = FALSE)

## End(Not run)

greenplum-db/GreenplumR documentation built on Sept. 2, 2023, 8:09 a.m.