ops: Operator Syntax for Joining Keyed Data Frames


Description

These functions implement a concise syntax for joining objects of class c('keyed', 'data.frame'). Below, x denotes the left operand (e1) and y the right operand (e2). + produces an outer join, & produces an inner join, and | gives a left join. * produces a column-stable outer join (same columns, in the same order, as x).

By default, - drops rows in x that have matching rows in y. / drops rows in x that do not have matching rows in y.
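The row-dropping behaviour of - and / can be sketched in base R without the package. This is only an illustration of the semantics described above, not the package's implementation; the data frames, the key vector, and the helper key_string are hypothetical.

```r
# Hypothetical data: x holds observations, y holds doses, both keyed on SUBJ and HOUR.
x <- data.frame(SUBJ = c("a", "a", "b", "c"), HOUR = c(0, 10, 0, 0), DV = c(1.1, 1.2, 2.1, 3.1))
y <- data.frame(SUBJ = c("a", "b"), HOUR = c(0, 0), AMT = c(40, 60))
key <- c("SUBJ", "HOUR")

# Hypothetical helper: collapse the key columns into one string per row.
key_string <- function(d) do.call(paste, c(d[key], sep = "\r"))

matched <- key_string(x) %in% key_string(y)

anti <- x[!matched, , drop = FALSE]  # analogue of x - y: rows of x with no match in y
semi <- x[matched, , drop = FALSE]   # analogue of x / y: rows of x with a match in y
```

Here anti keeps the rows keyed (a, 10) and (c, 0), while semi keeps those keyed (a, 0) and (b, 0).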

^ gives a serial left join: x is joined cumulatively with static variants of y, using left subsets of the key (e.g. subj, subj-time, subj-time-cmt, etc.).
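One possible reading of the serial left join is sketched below in base R. It is an interpretation of the description above, not the package's code: the data are hypothetical (modeled on the a and b frames in the Examples, with conc left out of b so that the frames share only key columns), and the helper is_static is an assumption.

```r
a <- data.frame(subj = rep(c(1, 2, 3), each = 3), time = rep(c(0, 1, 2), 3),
                conc = rep(c(1, 2, 3), 3))
b <- data.frame(subj = rep(c(1, 2, 3), each = 2), time = rep(c(0, 2), 3),
                crcl = rep(c(5, 6, 7), each = 2), pred = rep(c(2, 4), 3))
key <- c("subj", "time")

# Hypothetical helper: TRUE if a column has a single value within every group.
is_static <- function(col, by) all(tapply(col, by, function(v) length(unique(v)) == 1))

# Step 1 (key subset 'subj'): join the columns of b that are constant within subj.
static_cols <- names(b)[vapply(b, is_static, logical(1), by = b$subj)]
static_b <- unique(b[static_cols])
step1 <- merge(a, static_b, by = "subj", all.x = TRUE)

# Step 2 (full key): join the remaining, time-varying columns of b.
rest <- b[c(key, setdiff(names(b), c(static_cols, key)))]
serial <- merge(step1, rest, by = key, all.x = TRUE)

# A plain left join, for comparison.
plain <- merge(a, b, by = key, all.x = TRUE)
```

In the plain left join, crcl is NA wherever b has no record at that time; in the serial sketch, crcl is filled for every row because it was joined on subj alone, while pred remains NA at unmatched times.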

!, a unary operator, returns rows in x that have NA keys or duplicate/duplicated keys.
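The kind of check that ! performs can be approximated in base R as follows. This is a sketch under assumptions: the data are hypothetical, and flagging every occurrence of a repeated key (not only the later ones) is one reading of "duplicate/duplicated".

```r
# Hypothetical data with two rows sharing a key and two rows with NA in a key column.
x <- data.frame(SUBJ = c("a", "a", "b", NA, "c"), HOUR = c(0, 0, 0, 0, NA), DV = c(1, 2, 3, 4, 5))
key <- c("SUBJ", "HOUR")
keys <- x[key]

has_na <- !complete.cases(keys)                                  # any key column is NA
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)   # key occurs more than once

bad <- x[has_na | is_dup, , drop = FALSE]  # rows that are not keyed correctly
```

With these data, bad contains rows 1, 2, 4 and 5; only the row keyed (b, 0) is clean.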

Previous versions supported proxy methods, such as plus.keyed, but these have been removed for simplicity.

Usage

## S3 method for class 'keyed'
e1 + e2
## S3 method for class 'keyed'
e1 * e2
## S3 method for class 'keyed'
e1 & e2
## S3 method for class 'keyed'
e1 - e2
## S3 method for class 'keyed'
e1 / e2
## S3 method for class 'keyed'
e1 | e2
## S3 method for class 'keyed'
e1 ^ e2

Arguments

e1

left argument to Ops

e2

right argument to Ops

Details

A concise syntax for joining data.frames facilitates dynamic assembly of data. This system leverages existing operators and dispatch mechanisms. Under Ops dispatch rules, if both left and right operands resolve to the same method, that method is used. Operator methods are already defined for data.frame, but the existence of class ‘keyed’ creates an opportunity for syntax specification.

Operators have been chosen to coordinate intuition about their effects with existing operator precedence and a general data assembly pattern. + gives an outer join (merge, with all=TRUE). * further restricts to columns on the left (star suggests stable). & gives an inner join (all=FALSE): mnemonically, rows must match in the left AND right operands to contribute to the result. - suggests removal: it is used for methods that drop rows. | gives a left join (merge, all.x=TRUE). Mnemonically, it suggests conditioning (as with formulas): use of rows on the right is conditional on existence of matches on the left. Right joins are currently not implemented, but most can be expressed as left joins by rearrangement. ! suggests “not”: rows are returned that have problematic keys (NA or duplicates), i.e. not keyed correctly.
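The merge calls named above can be written out directly in base R. The sketch below uses plain data.frames and hypothetical data (loosely following the Examples); it shows the merge settings the text associates with each operator, and makes no claim about how the keyed methods are implemented. Joining dose to demo on SUBJ alone is an assumption.

```r
dose <- data.frame(SUBJ = rep(c("a", "b", "c"), each = 2), HOUR = rep(c(0, 20), 3),
                   AMT = rep(c(40, 60, 80), each = 2))
samp <- data.frame(SUBJ = rep(c("a", "b", "c"), each = 4), HOUR = rep(c(0, 10, 20, 30), 3),
                   DV = seq(1.1, by = 0.1, length.out = 12))
demo <- data.frame(SUBJ = c("b", "c", "d", "e"), WT = c(75, 70, 73, 68))
key <- c("SUBJ", "HOUR")

full   <- merge(dose, samp, by = key, all = TRUE)       # analogue of dose + samp
inner  <- merge(dose, samp, by = key, all = FALSE)      # analogue of dose & samp
left   <- merge(dose, demo, by = "SUBJ", all.x = TRUE)  # analogue of dose | demo
stable <- full[names(dose)]                             # analogue of dose * samp
```

The outer join has 12 rows (every sampling time), the inner join has 6 (times present in both), the left join keeps all 6 dose rows with WT missing for subject a, and the column-stable variant keeps the 12 outer-join rows but only the columns of dose.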

Operators +, *, / and - have higher precedence than &, | and !. Within these groups precedence is not uniform: ^ binds most tightly, * and / bind more tightly than + and -, ! binds more tightly than &, and & binds more tightly than |. Operators of equal precedence resolve left to right, except ^, which resolves right to left (see ?Syntax). A common assembly sequence is one or more full joins followed by one or more left joins. Correspondence to the existing order of operations minimizes the need for parenthetical grouping of terms (which is available nonetheless).
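How R groups such an assembly expression can be inspected without evaluating it, using quote(). The object names below are placeholders and are never evaluated, so this sketch runs without the package.

```r
# A full join followed by a left join, written without parentheses.
expr <- quote(dose + samp | demo)

top <- as.character(expr[[1]])  # the outermost call is the left join
lhs <- expr[[2]]                # its left operand is the whole outer join

# Two left joins in sequence group from the left.
chain <- quote(dose | demo | meds)
chain_lhs <- chain[[2]]
```

Here top is "|" and lhs is the call dose + samp, so the expression is read as (dose + samp) | demo; likewise chain_lhs is dose | demo.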

Value

keyed data.frame

Author(s)

Tim Bergsma

References

http://metrumrg.googlecode.com

See Also

Examples

library(metrumrg) # provides as.keyed and as.flag
dose <- data.frame(
	SUBJ = rep(letters[1:3], each = 2), 
	HOUR = rep(c(0,20),3), 
	AMT = rep(c(40,60,80), each = 2) 
) 
dose <- as.keyed(dose,key=c('SUBJ','HOUR'))
samp <- data.frame( 
	SUBJ = rep(letters[1:3], each = 4), 
	HOUR = rep(c(0,10,20,30),3), 
	DV = signif(rnorm(12),2) + 2 
) 
samp <- as.keyed(samp,key=c('SUBJ','HOUR'))
demo <- data.frame( 
	SUBJ = letters[2:5], 
	RACE = c('asian','white','black','other'), 
	SEX = c('female','male','female','male'), 
	WT = c(75, 70, 73, 68) 
)
demo <- as.keyed(demo,key=c('SUBJ'))
meds <- as.keyed(
	data.frame(
		SUBJ=c('a','c'),
		HOUR=c(0,15),
		STOP=c(10,25),
		C3A4=as.flag(c(1,1))
	),
	key=c('SUBJ','HOUR')
)

dose + samp
dose * samp
dose & samp
samp - dose
samp / dose
dose | demo
demo | dose
demo ^ dose

a <- data.frame(
  subj=c(1,1,1,2,2,2,3,3,3),
  time=c(0,1,2,0,1,2,0,1,2),
  conc=c(1,2,3,1,2,3,1,2,3)
)
a <- as.keyed(a,c('subj','time'))

b <- data.frame(
  subj=c(1,1,2,2,3,3),
  time=c(0,2,0,2,0,2),
  conc=c(1,3,1,3,1,3),
  crcl=c(5,5,6,6,7,7),
  pred=c(2,4,2,4,2,4)
)
b <- as.keyed(b,c('subj','time'))
a|b
a^b # note imputation of apparently-constant crcl
	

metrumrg documentation built on May 2, 2019, 5:55 p.m.