compare_df_cols: Generate a comparison of data.frames (or similar objects)...


View source: R/compare_df_cols.R

Description

Generate a comparison of data.frames (or similar objects) that indicates if they will successfully bind together by rows.

Usage

compare_df_cols(
  ...,
  return = c("all", "match", "mismatch"),
  bind_method = c("bind_rows", "rbind"),
  strict_description = FALSE
)

Arguments

...

A combination of data.frames, tibbles, and lists of data.frames/tibbles. Arguments may optionally be named: a named argument's output column takes that name, while an unnamed argument's output column is named after the data.frame expression as it was passed (see the Examples section).

return

Which columns to include in the summary: "all" columns, only "match"ing columns, or only "mismatch"ing columns.

bind_method

What method of binding should be used to determine matches? With "bind_rows", columns missing from a data.frame are considered a match (as in dplyr::bind_rows()); with "rbind", columns missing from a data.frame are considered a mismatch (as in base::rbind()).
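The effect of bind_method is easiest to see when only mismatches are requested. The following is a minimal sketch, assuming the janitor package is installed; the object names df_a and df_b are illustrative and not part of the package.

```r
library(janitor)

# Two data.frames with no columns in common (illustrative names)
df_a <- data.frame(A = 1)
df_b <- data.frame(B = 2)

# "bind_rows": a column missing from one input still counts as a match
mismatch_bind_rows <- compare_df_cols(
  df_a, df_b,
  return = "mismatch",
  bind_method = "bind_rows"
)

# "rbind": a column missing from one input counts as a mismatch
mismatch_rbind <- compare_df_cols(
  df_a, df_b,
  return = "mismatch",
  bind_method = "rbind"
)

# Compare how many columns each method reports as mismatching
nrow(mismatch_bind_rows)
nrow(mismatch_rbind)
```

Requesting return = "match" instead would give the complementary set of rows for each method.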

strict_description

Passed to describe_class. Also, see the Details section.

Details

Due to the returned "column_name" column, no input data.frame may be named "column_name".

The strict_description argument is most typically used to understand if factor levels match or are bindable. Factors are typically bindable, but the behavior of what happens when they bind differs based on the binding method ("bind_rows" or "rbind"). Even when strict_description is FALSE, data.frames may still bind because some classes (like factors and characters) can bind even if they appear to differ.
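As a rough illustration of the factor case described above, the sketch below compares two data.frames whose factor columns have different levels, once with each setting of strict_description. It assumes the janitor package is installed; df_x, df_y, and the grp column are made-up names, and the exact class descriptions printed depend on describe_class.

```r
library(janitor)

# Factor columns with the same name but different levels (illustrative data)
df_x <- data.frame(grp = factor(c("a", "b")))
df_y <- data.frame(grp = factor(c("b", "c")))

# Default, non-strict class description
loose <- compare_df_cols(df_x, df_y, strict_description = FALSE)

# Strict class description, passed through to describe_class
strict <- compare_df_cols(df_x, df_y, strict_description = TRUE)

# Inspect both summaries side by side
print(loose)
print(strict)
```

Comparing the two printed summaries shows whether the stricter description changes how the grp column is reported for each input.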

Value

A data.frame whose first column, "column_name", lists the column names found across the input data.frames, followed by one column per input data.frame (named after that input). If more than one input has the same name, the output columns receive suffixes from the sequential use of base::merge(), so the naming may differ from what is expected. The values in the per-data.frame columns are descriptions of the class of the data in each column (generated by describe_class); a value is NA when that input lacks the column.
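The shape of the returned value can be checked directly. This is a small sketch, assuming the janitor package is installed, that reuses the named inputs dfA and dfB from the Examples section; res is an illustrative variable name.

```r
library(janitor)

res <- compare_df_cols(dfA = data.frame(A = 1), dfB = data.frame(B = 2))

# One "column_name" column plus one column per input
names(res)

# The column names found across all inputs
res$column_name

# NA marks a column that is absent from that input
is.na(res$dfA)
is.na(res$dfB)
```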

See Also

Other Data frame type comparison: compare_df_cols_same(), describe_class()

Examples

compare_df_cols(data.frame(A=1), data.frame(B=2))
# user-defined names
compare_df_cols(dfA=data.frame(A=1), dfB=data.frame(B=2))
# a combination of list and data.frame input
compare_df_cols(listA=list(dfA=data.frame(A=1), dfB=data.frame(B=2)), data.frame(A=3))

Example output

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test

  column_name data.frame(A = 1) data.frame(B = 2)
1           A           numeric              <NA>
2           B              <NA>           numeric
  column_name     dfA     dfB
1           A numeric    <NA>
2           B    <NA> numeric
  column_name     dfA     dfB data.frame(A = 3)
1           A numeric    <NA>           numeric
2           B    <NA> numeric              <NA>

janitor documentation built on Jan. 5, 2021, 9:07 a.m.