combineDf: Combine data frames with different fields using a crosswalk...
In adamlilith/omnibus: Helper Tools for Managing Data, Dates, Missing Values, and Text

combineDf

R Documentation

Combine data frames with different fields using a crosswalk table

Description

This function combines multiple "source" data frames, possibly with different column names, into a single "destination" data frame. Usually merge will be faster and easier to implement if the columns to be merged on have the same names, and rbind will always be faster and much easier if the column names and data types match exactly.

The key tool in this function is a "crosswalk" table (a data.frame) that tells the function which fields in each source data frame match to the final fields in the destination data frame. Values in source data frame fields can be used as-is, combined across fields, or have functions applied to them before they are put into the destination data frame. If a source data frame doe snot gave a field that matches the destination field, a default value (including NA) can be assigned to all cells for that source data frame.

The data frames to be combined can be provided in ... or as file names in the first column of the crosswalk table. These can be either CSV files (extension ".csv"), TAB files (extension ".tab"), "Rdata" files (read using load and with a ".rda" or ".rdata" extension), or "RDS" files (read using readRDS and with a ".rds" extension). The file type will be intuited from the extension, and its case does not matter. Note that if an object in an Rdata file has the same name as an object in this function (i.e., any of the arguments plus any objects internal to the function), this may cause a conflict. To help obviate this issue, all internal objects are named with a period at the end (e.g., "crossCell." and "countDf.").

All cells in each source data frame will have leading and trailing white spaces removed before combining.

Usage

combineDf(
  ...,
  crosswalk,
  collapse = "; ",
  useColumns = NULL,
  excludeColumns = NULL,
  useFrames = NULL,
  classes = NULL,
  verbose = FALSE
)

Arguments

`...`	Data frames to combine. These must be listed in the order that they appear in the `crosswalk` table.
`crosswalk`	A `data.frame`. Column names are fields in the destination data frame. Each row corresponds to a different data frame to join. If `...` is not used then the first column must have the paths and file names to CSV, TAB, Rdata, or RDS files representing data frames to join. All objects will be coerced to `data.frame`s. Other than this column, the elements of each cell contain the name of the column in each source data frame that coincides with the column name in the `crosswalk` table. For example, suppose the destination data frame is to have a column by the name of "species" with names of species within it. If the first source data frame has a column named "Species" and the second source data frame has a column named "scientificName", then the first value in `crosswalk` under its "species" column will be "Species" and the second "scientificName". This will take all the values in the "Species" field of the first source data frame and all the values in the "scientificName" field in the second source data frame and put them both into the "species" field of the destination frame. More complex operations can be done using the following in cells of `crosswalk`: Filling all cells with a single value: If the value in the crosswalk cell begins with "`%fill%`", then the value that follows it will be repeated in destination data frame in each row. For example, `%fill% inspected` will repeat the string "inspected" in every row of the output corresponding to the respective source data frame (any spaces immediately after `%fill%` are ignored). Concatenating (pasting) columns together: To combine multiple fields, begin crosswalk cell with "`%cat%`", then list the fields to combine (with or without commas separating them). For example, if the crosswalk cell has `%cat% field1 field2 field3` then the resulting column in the destination data frame will have values from `field1`, `field2`, and `field3` pasted together. See also argument `collapse`. Applying a function: You can manipulate values by including functions in the crosswalk cell. The crosswalk cell should begin with "`%fun%`", then be followed an expression to evaluate. Expressions should generally not use the "`<-`" operator (or the equals operator used in the same way). For example: `%fun% ifelse(as.numeric(field1) >20, NA, as.numeric(field1))` will create a column that is `NA` whenever values in `field1` are >20, and the value in `field1` otherwise. Note that for mathematical operations, it is almost always necessary to use `as.numeric` around column names representing numbers since fields are read in as characters.
`collapse`	Character, specifies the string to put between fields combined with the `%cat%` operator in the `crosswalk` table.
`useColumns`, `excludeColumns`	Logical, character vector, integer vector, or `NULL`. Indicates which columns in the crosswalk table are to be used or not to be used. These can be given as a `TRUE`/`FALSE` vector, a vector of column names, or a vector of column index values (integers). You can only specify `useColumns` or `excludeColumns` at a time (one or both must be `NULL`). If both are `NULL` (default), then all columns in the crosswalk will be used.
`useFrames`	Logical, character, or `NULL`. Indicates if a particular source data frame should be used. This should be a `TRUE`/`FALSE` vector or the name of a column in the crosswalk table that has `TRUE`/`FALSE` values. If this is the name of a column, the column will be removed from the columns in `useColumns`. If `NULL` (default). then all data frames in the crosswalk will be used.
`classes`	Character or character list, specifies the classes (e.g., character, logical, numeric, integer) to be assigned to each column in the output table. If `NULL`, all classes will be assumed to be character. If just one value is listed, all columns will be set to this class. If a list, it must be the same length as the number of columns in `crosswalk` and specify the class of each column. If it has names, then the names must correspond to the columns in `crosswalk` and will be used to assign the data type to the corresponding columns.
`verbose`	Logical, if `TRUE` prints extra information during execution. Useful for debugging the `crosswalk` table.

Value

A data frame.

Examples

df1 <- data.frame(x1=1:5, x2=FALSE, x3=letters[1:5], x4=LETTERS[1:5],
       x5='stuff', x6=11:15)
df2 <- data.frame(y1=11:15, y2=rev(letters)[1:5], y3=runif(5))

crosswalk <- data.frame(
  a = c('x1', 'y1'),
  b = c('x2', '%fill% TRUE'),
  c = c('%cat% x3 x4', 'y2'),
  d = c('x5', '%fill% NA'),
 e = c('%fun% as.numeric(x6) > 12', '%fun% round(as.numeric(y3), 2)')
)

combined <- combineDf(df1, df2, crosswalk=crosswalk)
combined

adamlilith/omnibus documentation built on June 10, 2025, 10:33 p.m.

adamlilith/omnibus index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

adamlilith/omnibus
Helper Tools for Managing Data, Dates, Missing Values, and Text

combineDf: Combine data frames with different fields using a crosswalk...
In adamlilith/omnibus: Helper Tools for Managing Data, Dates, Missing Values, and Text

Combine data frames with different fields using a crosswalk table

Description

Usage

Arguments

Value

See Also

Examples

Related to combineDf in adamlilith/omnibus...

R Package Documentation

Browse R Packages

We want your feedback!

adamlilith/omnibus Helper Tools for Managing Data, Dates, Missing Values, and Text

combineDf: Combine data frames with different fields using a crosswalk... In adamlilith/omnibus: Helper Tools for Managing Data, Dates, Missing Values, and Text

Combine data frames with different fields using a crosswalk table

Description

Usage

Arguments

Value

See Also

Examples

Related to combineDf in adamlilith/omnibus...

R Package Documentation

Browse R Packages

We want your feedback!

adamlilith/omnibus
Helper Tools for Managing Data, Dates, Missing Values, and Text

combineDf: Combine data frames with different fields using a crosswalk...
In adamlilith/omnibus: Helper Tools for Managing Data, Dates, Missing Values, and Text