multi_join: Join Multiple Data Frames In One Go

View source: R/multi_join.R

multi_join R Documentation

Join Multiple Data Frames In One Go

Description

Join two or more data frames in a single operation. multi_join() supports several join methods and can join on differently named key variables.

Usage

multi_join(
  data_frames,
  on,
  how = "left",
  keep_indicators = FALSE,
  monitor = FALSE
)

Arguments

data_frames

A list of data frames to join. The second and all subsequent data frames are joined onto the first one.

on

The key variables on which the data frames are joined. If a character vector is provided, the function assumes that all of these variables are present in every data frame. To join on different variable names, provide a list of character vectors, one per data frame.

how

A character vector of join method names. Available methods are: left, right, inner, full, outer, left_inner and right_inner. As the examples show, a single method can be given for all joins, or one method per join.

keep_indicators

FALSE by default. If TRUE, an indicator variable is created for each data frame, showing whether that data frame provides values.

monitor

FALSE by default. If TRUE, outputs two charts that visualize the function's time consumption.

Details

multi_join() is modeled on the Merge statement of the 'SAS' Data Step. Merge can join multiple data sets at once with a very basic syntax.

With Merge, the user provides the dataset names and the variables on which they should be joined. Once the full join is complete, the user can decide which parts of the join should remain in the final dataset.
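The "full join first, then decide what to keep" idea can be sketched in base R. This is only an illustration of the concept under stated assumptions, not the way multi_join() is implemented; the data frames and the indicator columns in_left and in_right are hypothetical names introduced for this example.

```r
# Base R sketch: full join first, then decide which part to keep.
# in_left / in_right are hypothetical indicator columns, not qol output.
left  <- data.frame(key = c(1, 2), a = c("a1", "a2"))
right <- data.frame(key = c(2, 3), b = c("b2", "b3"))

# Flag every row with its origin before joining.
left$in_left   <- TRUE
right$in_right <- TRUE

# Full join: keep every key from both sides.
full <- merge(left, right, by = "key", all = TRUE)

# Rows missing from one side carry NA in that side's flag; set them to FALSE.
full$in_left[is.na(full$in_left)]   <- FALSE
full$in_right[is.na(full$in_right)] <- FALSE

# Choose the part of the full join that should remain.
left_part  <- full[full$in_left, ]                  # same rows as a left join
inner_part <- full[full$in_left & full$in_right, ]  # same rows as an inner join
```

Filtering on the indicator columns after the full join yields the rows that the corresponding join method would return directly.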

multi_join() tries to keep this simplicity while giving the user the power to perform several joins at once. In addition to what Merge can do, it adopts the Proc SQL ability to join datasets on differently named variables.
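As a rough illustration of joining several data frames onto the first one with differently named keys, the following base R sketch chains merge() calls. The helper join_all() and the example data are hypothetical and only cover the left-join case; they do not reflect the internals of multi_join().

```r
# Base R sketch: left-join a list of data frames onto the first one,
# each with its own key names. join_all() is a hypothetical helper,
# not part of qol.
join_all <- function(data_frames, on) {
    result <- data_frames[[1]]

    for (i in seq_along(data_frames)[-1]) {
        result <- merge(result, data_frames[[i]],
                        by.x  = on[[1]],  # key names in the first data frame
                        by.y  = on[[i]],  # key names in the i-th data frame
                        all.x = TRUE)     # keep all rows of the left side
    }

    result
}

df_a <- data.frame(key1 = c(1, 2), a = c("a", "a"))
df_b <- data.frame(var1 = c(2, 3), b = c("b", "b"))
df_c <- data.frame(any  = c(1, 2), c = c("c", "c"))

joined <- join_all(list(df_a, df_b, df_c),
                   on = list("key1", "var1", "any"))
```

The result keeps the key names of the first data frame. Note that merge() sorts the result by the key columns by default, so the row order can differ from the input.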

Value

Returns a single data frame with joined variables from all given data frames.

Examples

# Example data frames
df1 <- data.frame(key = c(1, 1, 1, 2, 2, 2),
                  a   = c("a", "a", "a", "a", "a", "a"))

df2 <- data.frame(key = c(2, 3),
                  b   = c("b", "b"))

# See all different joins in action
join_methods <- c("left", "right", "inner", "full", "outer", "left_inner", "right_inner")
joined_data  <- list()

# Loop over the method positions and store each result in the list
for (i in seq_along(join_methods)){
    joined_data[[i]] <- multi_join(list(df1, df2),
                                   on  = "key",
                                   how = join_methods[[i]])
}

# Left join on more than one key
df1b <- data.frame(key1 = c(1, 1, 1, 2, 2, 2),
                   key2 = c("a", "a", "a", "a", "a", "a"),
                   a    = c("a", "a", "a", "a", "a", "a"))

df2b <- data.frame(key1 = c(2, 3),
                   key2 = c("a", "a"),
                   b    = c("b", "b"))

left_joined <- multi_join(list(df1b, df2b), on = c("key1", "key2"))

# Join more than two data frames
df3 <- data.frame(key = c(1, 2),
                  c   = c("c", "c"))

multiple_joined <- multi_join(list(df1, df2, df3), on = "key")

# You can also use different methods for each join
multiple_joined2 <- multi_join(list(df1, df3, df2),
                               on  = "key",
                               how = c("left", "right"))

# Joining on different variable names
df1c <- data.frame(key1 = c(1, 1, 1, 2, 2, 2),
                   key2 = c("a", "a", "a", "a", "a", "a"),
                   a    = c("a", "a", "a", "a", "a", "a"))

df2c <- data.frame(var1 = c(2, 3),
                   var2 = c("a", "a"),
                   b    = c("b", "b"))

df3c <- data.frame(any  = c(1, 2),
                   name = c("a", "a"),
                   c    = c("c", "c"))

multiple_joined3 <- multi_join(list(df1c, df2c, df3c),
                               on = list(df1c = c("key1", "key2"),
                                         df2c = c("var1", "var2"),
                                         df3c = c("any", "name")))


qol documentation built on Dec. 14, 2025, 1:06 a.m.