merge.ffdf: Merge two ffdf by common columns, or do other versions of database join operations


View source: R/merge.R

Description

Merge two ffdf objects by common columns, or perform other versions of database join operations. This method is similar to merge in the base package, but only supports inner and left outer joins. Note that joining is done with ffmatch or ffdfmatch: only the first matching element in y is added to x, and because ffdfmatch works by paste-ing the key columns together, it may not be suitable if your key contains columns of vmode double.
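The first-match behaviour can be seen with a duplicate key in y. A minimal sketch, assuming the ffbase package is installed and the column names key and val are illustrative:

```r
library(ffbase)

## y contains the key "a" twice; only the first matching row of y is used
x <- as.ffdf(data.frame(key = factor(c("a", "b"))))
y <- as.ffdf(data.frame(key = factor(c("a", "a")), val = c(10, 20)))

m <- merge(x, y, by = "key", all.x = TRUE)
m[,]  ## row "a" receives val from the first match; row "b" gets NA
```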

Usage

## S3 method for class 'ffdf'
merge(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = all,
  all.y = all,
  sort = FALSE,
  suffixes = c(".x", ".y"),
  incomparables = NULL,
  trace = FALSE,
  ...
)

Arguments

x

an ffdf

y

an ffdf

by

specifications of the common columns. Columns can be specified by name, number or by a logical vector.

by.x

specifications of the common columns of the x ffdf, overruling the by parameter

by.y

specifications of the common columns of the y ffdf, overruling the by parameter

all

see merge in R base

all.x

if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y

analogous to all.x: if TRUE, extra rows will be added to the output, one for each row in y that has no matching row in x

sort

logical, currently not used; defaults to FALSE.

suffixes

character(2) specifying the suffixes to be used for making non-by names() unique.

incomparables

values which cannot be matched. See match. Currently not used.

trace

logical indicating whether to show on which chunk the function is computing

...

other options passed on to ffdfindexget

Details

If a left outer join is performed and a row of x has no matching record in y, columns with vmodes 'boolean', 'quad', 'nibble', 'ubyte' and 'ushort' are coerced respectively to vmodes 'logical', 'byte', 'byte', 'short' and 'integer' to allow NA values.
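A minimal sketch of this coercion, assuming ff and ffbase are loaded (the column name count is illustrative):

```r
library(ffbase)

x <- as.ffdf(data.frame(key = factor(c("a", "b"))))
y <- as.ffdf(data.frame(key = factor("a")))
y$count <- ff(1L, length = nrow(y), vmode = "ubyte")
vmode(y$count)  ## "ubyte" cannot hold NA

m <- merge(x, y, by = "key", all.x = TRUE)
vmode(m$count)  ## coerced to "short" so the unmatched row can hold NA
```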

Value

an ffdf

See Also

merge

Examples

authors <- data.frame(
    surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)), stringsAsFactors = TRUE)
books <- data.frame(
    name = c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core"),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"), stringsAsFactors = TRUE)
books <- lapply(1:100, FUN=function(x, books){
	books$price <- rnorm(nrow(books))
	books
}, books=books)
books <- do.call(rbind, books)
authors <- as.ffdf(authors)                
books <- as.ffdf(books)

dim(books)
dim(authors)
## Inner join
oldffbatchbytes <- getOption("ffbatchbytes")
options(ffbatchbytes = 100)
m1 <- merge( books, authors, by.x = "name", by.y = "surname"
           , all.x=FALSE, all.y=FALSE, trace = TRUE)
dim(m1)
unique(paste(m1$name[], m1$nationality[]))
unique(paste(m1$name[], m1$deceased[]))
m2 <- merge( books[,], authors[,], by.x = "name", by.y = "surname"
           , all.x=FALSE, all.y=FALSE, sort = FALSE)
dim(m2)
unique(paste(m2$name[], m2$nationality[]))
unique(paste(m2$name[], m2$deceased[]))
## Left outer join
m1 <- merge( books, authors, by.x = "name", by.y = "surname"
           , all.x=TRUE, all.y=FALSE, trace = TRUE)
class(m1)
dim(m1)
names(books)
names(m1)
unique(paste(m1$name[], m1$nationality[]))
unique(paste(m1$name[], m1$deceased[]))

authors$test <- ff(TRUE, length=nrow(authors), vmode = "logical")
m1 <- merge( books, authors, by.x = "name", by.y = "surname"
           , all.x=TRUE, all.y=FALSE, trace = TRUE)
vmode(m1$test)
table(m1$test[], exclude=c())
options(ffbatchbytes = oldffbatchbytes)

Example output

Loading required package: ff
Loading required package: bit

Attaching package: 'bit'

The following object is masked from 'package:base':

    xor

Attaching package ff
- getOption("fftempdir")=="/work/tmp/tmp/RtmpqeUxEv/ff"

- getOption("ffextension")=="ff"

- getOption("ffdrop")==TRUE

- getOption("fffinonexit")==TRUE

- getOption("ffpagesize")==65536

- getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" if your system stalls on large writes

- getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system

- getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system


Attaching package: 'ff'

The following objects are masked from 'package:utils':

    write.csv, write.csv2

The following objects are masked from 'package:base':

    is.factor, is.ordered

Registered S3 methods overwritten by 'ffbase':
  method   from
  [.ff     ff  
  [.ffdf   ff  
  [<-.ff   ff  
  [<-.ffdf ff  

Attaching package: 'ffbase'

The following objects are masked from 'package:base':

    %in%, table

[1] 700   4
[1] 5 3
2021-05-23 13:24:05, x has 28 chunks, table has 1 chunks
2021-05-23 13:24:05, working on x chunk 1:25
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 26:50
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 51:75
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 76:100
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 101:125
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 126:150
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 151:175
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 176:200
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 201:225
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 226:250
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 251:275
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 276:300
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 301:325
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 326:350
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 351:375
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 376:400
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 401:425
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 426:450
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 451:475
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 476:500
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 501:525
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 526:550
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 551:575
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 576:600
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 601:625
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 626:650
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 651:675
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 676:700
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, found match indexes, now starting to add y to x
[1] 600   6
[1] "Tukey US"           "Venables Australia" "Tierney US"        
[4] "Ripley UK"          "McNeil Australia"  
[1] "Tukey yes"   "Venables no" "Tierney no"  "Ripley no"   "McNeil no"  
[1] 600   6
[1] "Tukey US"           "Venables Australia" "Tierney US"        
[4] "Ripley UK"          "McNeil Australia"  
[1] "Tukey yes"   "Venables no" "Tierney no"  "Ripley no"   "McNeil no"  
2021-05-23 13:24:05, x has 28 chunks, table has 1 chunks
2021-05-23 13:24:05, working on x chunk 1:25
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 26:50
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 51:75
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 76:100
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 101:125
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 126:150
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 151:175
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 176:200
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 201:225
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 226:250
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 251:275
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 276:300
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 301:325
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 326:350
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 351:375
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 376:400
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 401:425
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 426:450
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 451:475
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 476:500
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 501:525
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 526:550
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 551:575
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 576:600
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 601:625
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 626:650
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 651:675
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 676:700
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, found match indexes, now starting to add y to x and coercing if needed
[1] "ffdf"
[1] 700   6
[1] "name"         "title"        "other.author" "price"       
[1] "name"         "title"        "other.author" "price"        "nationality" 
[6] "deceased"    
[1] "Tukey US"           "Venables Australia" "Tierney US"        
[4] "Ripley UK"          "McNeil Australia"   "R Core NA"         
[1] "Tukey yes"   "Venables no" "Tierney no"  "Ripley no"   "McNeil no"  
[6] "R Core NA"  
2021-05-23 13:24:05, x has 28 chunks, table has 1 chunks
2021-05-23 13:24:05, working on x chunk 1:25
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 26:50
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 51:75
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 76:100
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 101:125
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 126:150
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 151:175
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 176:200
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 201:225
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 226:250
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 251:275
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 276:300
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 301:325
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 326:350
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 351:375
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 376:400
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 401:425
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 426:450
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 451:475
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 476:500
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 501:525
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 526:550
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 551:575
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 576:600
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 601:625
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 626:650
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 651:675
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, working on x chunk 676:700
2021-05-23 13:24:05, working on table chunk 1:5
2021-05-23 13:24:05, found match indexes, now starting to add y to x and coercing if needed
[1] "logical"

TRUE <NA> 
 600  100 

ffbase documentation built on Feb. 27, 2021, 5:06 p.m.