stata_join: Join two data frames Stata-style


View source: R/stata_join.R

Description

Joins corresponding observations from x (the master data frame) and y (the using data frame), matching on one or more variables specified with by. stata_join() builds on full_join() from the dplyr package, which must be loaded, and implements joins in a manner that mimics Stata's standard merge command.

The join type, one of "1:1", "m:1", "1:m" or "m:m", must be specified. For "1:1", "m:1" and "1:m" joins, an error is raised if by does not uniquely identify observations in the relevant data frame: both data frames for "1:1", the using data frame for "m:1", and the master data frame for "1:m". A new variable, merge, is created and added to the returned data frame. It records whether each observation is the product of a match: 1 = master data only, 2 = using data only, 3 = match. A summary table of this new variable is also returned.
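
The coding of the merge variable described above can be illustrated with plain dplyr. This is a minimal sketch under stated assumptions, not the package's actual implementation: the toy data frames, the helper columns in_master and in_using, and the use of case_when() are all introduced here for illustration only.

```r
library(dplyr)

# Toy data (hypothetical): ids 2 and 3 appear in both data frames
master <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
using  <- data.frame(id = c(2, 3, 4), b = c(10, 20, 30))

# Flag the origin of each row, full-join, then derive the indicator
joined <- full_join(
  mutate(master, in_master = TRUE),
  mutate(using, in_using = TRUE),
  by = "id"
) %>%
  mutate(
    merge = case_when(
      !is.na(in_master) & is.na(in_using) ~ 1L,  # master data only
      is.na(in_master) & !is.na(in_using) ~ 2L,  # using data only
      TRUE                                ~ 3L   # match
    )
  ) %>%
  select(-in_master, -in_using)

# Summary table of the indicator
table(joined$merge)
```

In this toy case, id 1 is coded 1, ids 2 and 3 are coded 3, and id 4 is coded 2.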

Usage

stata_join(x, y, type, by, keepusing = NULL)

Arguments

x

master data frame.

y

using data frame.

type

one of "1:1", "m:1", "1:m" or "m:m". This argument is required; no default is specified.

by

a character vector of variables to join by. This argument is required; no default is specified. Joining by different variables on x and y is not currently supported.

keepusing

a character vector of variables from the using data frame to keep in the joined data frame. If NULL, the default, all variables are kept.
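
To make the roles of by and keepusing concrete, the sketch below shows one way the uniqueness requirement and the keepusing subsetting could be expressed in base R and dplyr. It is an illustration under assumptions rather than the package's own code: the helper is_unique_key() and the toy data frame are hypothetical.

```r
library(dplyr)

# Toy using data frame (hypothetical): id 2 appears twice
using <- data.frame(
  id = c(1, 2, 2),
  b  = c(10, 20, 30),
  c  = c("p", "q", "r")
)
by <- "id"
keepusing <- "b"

# Hypothetical helper: TRUE when `by` uniquely identifies the rows of df
is_unique_key <- function(df, by) anyDuplicated(df[by]) == 0

# FALSE here, so `using` could not be the "1" side of a join
is_unique_key(using, by)

# Keep only the join keys plus the requested variables from the using data
using_trimmed <- select(using, all_of(c(by, keepusing)))
names(using_trimmed)
```

A check of this kind is what distinguishes "m:1" from "m:m": the same call on a data frame with distinct ids would return TRUE.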

Examples

# Example 1 -------------------------------------------------------------------

library(dplyr)
library(nycflights13)

data(flights, airlines)
flights
airlines

merged_df <- stata_join(flights, airlines, type = "m:1", by = "carrier")


# Example 2 -------------------------------------------------------------------

library(dplyr)

USArrests <- mutate(USArrests, USState = rownames(USArrests))

state.x77 <- state.x77 %>%
  as.data.frame() %>%
  mutate(USState = rownames(state.x77)) %>%
  select(USState, everything())

merged_df <- stata_join(state.x77,
                        USArrests,
                        type = "1:1",
                        by = "USState",
                        keepusing = c("Assault", "Rape"))

james-e-thomas/statajoin documentation built on Aug. 20, 2020, 7:24 a.m.