combineDf: Combine data frames with different fields using a crosswalk...


View source: R/combineDf.r

Description

This function combines multiple data frames, possibly with different column names, into a single data frame. Usually merge will be faster and easier to implement if the columns to be merged on have the same names, and rbind will always be faster and much easier if the column names match exactly.
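The trade-off described above can be illustrated with base R alone. This is a minimal sketch using made-up data frames (a, b, and b2 are hypothetical, not part of the package): rbind stacks frames directly when their column names match, raises an error when they do not, and otherwise requires renaming by hand.

```r
# Two data frames with identical column names: rbind() stacks them directly.
a <- data.frame(id = 1:2, species = c("Quercus alba", "Acer rubrum"))
b <- data.frame(id = 3:4, species = c("Pinus taeda", "Ulmus americana"))
stacked <- rbind(a, b)

# The same data, but the second frame uses a different column name.
b2 <- data.frame(id = 3:4, scientificName = c("Pinus taeda", "Ulmus americana"))

# rbind() refuses to stack data frames whose column names do not match...
mismatch <- tryCatch(rbind(a, b2), error = function(e) "error")

# ...so the column has to be renamed by hand before stacking.
names(b2)[names(b2) == "scientificName"] <- "species"
renamed <- rbind(a, b2)
```

With many data frames and many differing names, this manual renaming is the step a crosswalk table is meant to replace.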

Usage

combineDf(..., crosswalk, sep = "; ", use = NULL, classes = NULL,
  verbose = FALSE)

Arguments

...

A list of data frames. If omitted, the paths and file names of the data frames can instead be specified in the first column of crosswalk.

crosswalk

Data frame. Column names are the fields desired in the output data frame. Each row corresponds to a different data frame to join. If ... is not used, the first column must contain the paths and file names of the CSV, RDS, or RData files holding the data frames to join. In every other column, each cell contains the name of the column in that row's data frame that supplies the values for the crosswalk column. For example, if the final output is to have a column named "species", and "data frame #1" has a column named "Species" while "data frame #2" has a column named "scientificName", then the "species" column of crosswalk will contain "Species" in its first row and "scientificName" in its second. More complex joining can be done using the following in cells of crosswalk:

  • _ at start of value: indicates the value in the crosswalk table will be read as text and repeated in the output in each row (minus the initial "_"). For example, "_inspected" will repeat the string "inspected" in every row of the output corresponding to the respective data frame.

  • 'c(~~~)': Pastes together the listed fields of the source data frame named in ..., using the string specified in sep ("~~~" represents the column names in the respective data frame). Note that the entire string must be enclosed in single or double quotes, as in 'c()' or "c()", and the column names inside c() must be delimited by the other kind of quote (single if c() is enclosed in double quotes, and vice versa).

  • NA: Repeats NA.
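The cell rules listed above can be sketched in base R. This is not the package's implementation; resolveCell is a hypothetical helper, written only to show how a single crosswalk cell could be turned into one output column under the rules as documented here.

```r
# Hypothetical helper (not part of the package): turn one crosswalk cell
# into one output column for the source data frame `df`.
resolveCell <- function(df, cell, sep = "; ") {
  n <- nrow(df)
  if (is.na(cell)) {
    # NA: repeat NA in every row
    rep(NA_character_, n)
  } else if (startsWith(cell, "_")) {
    # Leading underscore: repeat the literal text, minus the "_"
    rep(substring(cell, 2), n)
  } else if (grepl("^c\\(.*\\)$", cell)) {
    # 'c(~~~)': evaluate the string to get column names, then paste those columns
    cols <- eval(parse(text = cell))
    do.call(paste, c(unname(as.list(df[cols])), sep = sep))
  } else {
    # Plain value: the name of a column in the source data frame
    as.character(df[[cell]])
  }
}

df1 <- data.frame(x1 = 1:3, x3 = letters[1:3], x4 = LETTERS[1:3],
                  stringsAsFactors = FALSE)

literal <- resolveCell(df1, "_valid")          # "valid" repeated in every row
pasted  <- resolveCell(df1, 'c("x3", "x4")')   # "a; A" "b; B" "c; C"
blank   <- resolveCell(df1, NA)                # NA repeated in every row
plain   <- resolveCell(df1, "x1")              # values of column x1, as character
```

The sketch returns character vectors throughout, in line with the documented default of treating all output columns as character when classes is NULL.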

sep

Character. The string to put between fields combined with the c(~~~) format in crosswalk.

use

Logical, character, or NULL. If ... is used, this can be a list of logical elements (TRUE or FALSE); alternatively, it can be the name of a column of crosswalk containing logical values. Either way, the values indicate whether each data frame is to be collated. If NULL (the default), all data frames are used.
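As a rough illustration of the three forms this argument can take, the base-R sketch below selects data frames from a list. The names frames, cw, and selectFrames are hypothetical, and the logic is an assumption about the documented behaviour rather than the package's own code.

```r
# Hypothetical inputs: three source data frames and a crosswalk with a
# logical column ("keep") flagging which of them to collate.
frames <- list(
  data.frame(x1 = 1:2),
  data.frame(y1 = 3:4),
  data.frame(z1 = 5:6)
)
cw <- data.frame(a = c("x1", "y1", "z1"), keep = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Hypothetical helper: resolve `use` to one logical value per data frame.
selectFrames <- function(frames, crosswalk, use = NULL) {
  keep <- if (is.null(use)) {
    rep(TRUE, length(frames))   # NULL: use every data frame
  } else if (is.character(use)) {
    crosswalk[[use]]            # name of a logical column in the crosswalk
  } else {
    unlist(use)                 # list (or vector) of TRUE/FALSE values
  }
  frames[keep]
}

allFrames <- selectFrames(frames, cw)                            # all three
byColumn  <- selectFrames(frames, cw, use = "keep")              # first and third
byList    <- selectFrames(frames, cw, use = list(FALSE, TRUE, TRUE))  # last two
```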

classes

Character or character list. Specifies the classes (e.g., numeric, character) to be assigned to each column in the output table. If NULL, all columns are assumed to be of class character. If just one value is given, all columns are set to that class. If a list, it must have one element per column of crosswalk, giving the class of each column.
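One way to picture the class assignment is the base-R sketch below, which coerces each column of an all-character data frame with methods::as(). The object names are hypothetical, and the package may perform the conversion differently.

```r
# Hypothetical combined output in which every column is still character.
out <- data.frame(a = c("1", "2"), b = c("x", "y"), stringsAsFactors = FALSE)

# One class per column; a single value would be recycled across all columns.
classes <- c("numeric", "character")
if (length(classes) == 1) classes <- rep(classes, ncol(out))

# Coerce each column to its requested class.
for (i in seq_along(out)) {
  out[[i]] <- methods::as(out[[i]], classes[i])
}

colClasses <- vapply(out, function(col) class(col)[1], character(1))
```

After the loop, column a holds numbers and column b remains character.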

verbose

Logical, if TRUE prints extra information during execution. Useful for debugging the crosswalk table.

Value

A data frame.

See Also

merge, rbind

Examples

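library(omnibus)

# Two source data frames whose column names differ
df1 <- data.frame(x1=1:5, x2='valid', x3=letters[1:5], x4=LETTERS[1:5], x5='stuff')
df2 <- data.frame(y1=11:15, y3=rev(letters)[1:5])

# One row per source data frame (row 1: df1, row 2: df2);
# one column per field in the output (a, b, c, d)
crosswalk <- data.frame(
  a=c('x1', 'y1'),              # plain column names
  b=c('x2', '_valid'),          # df2: repeat the literal string "valid"
  c=c('c("x3", "x4")', 'y3'),   # df1: paste x3 and x4 together using sep
  d=c('x5', NA)                 # df2: fill with NA
)

out <- combineDf(df1, df2, crosswalk=crosswalk)
out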

adamlilith/omnibus documentation built on Nov. 21, 2018, 11:01 a.m.