collate_columns: Collate Columns Based on Content

View source: R/collate_columns.R

collate_columns    R Documentation

Description

After compose_cells, this function rearranges and renames attribute columns so that they are properly aligned, based on the content of the columns.

Usage

collate_columns(
  composed_data,
  combine_threshold = 1,
  rest_cols = Inf,
  retain_other_cols = FALSE,
  retain_cell_address = FALSE
)

Arguments

composed_data

output of compose_cells (preferably without further processing)

combine_threshold

a numeric threshold between 0 and 1 for content-based collation of columns. (Default 1)

rest_cols

number of remaining columns to keep, beyond the columns joined under combine_threshold. (Default Inf)

retain_other_cols

whether to keep other intermediate (and possibly not so important) columns. (Default FALSE)

retain_cell_address

whether to keep cell-address columns such as row, col and data_block. These may be required for traceback. (Default FALSE)

Details

  • Dependency on stringdist: If stringdist is installed, it is used to enhance the approximate string matching. The outcome may therefore differ depending on whether stringdist is installed.

  • Possibility of randomness: If an attribute column contains many distinct values, a representative sample of that column is drawn. Call set.seed beforehand if reproducibility is a concern.
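
Both caveats above can be checked before running a collation. The following is a minimal base-R sketch, not the package's internal logic: draw_sample is a hypothetical stand-in for a sampling step, used only to show that fixing the seed makes a random draw repeatable, and requireNamespace reports whether the optional stringdist package is available.

```r
# Check whether the optional stringdist package is installed.
# requireNamespace() returns TRUE/FALSE without attaching the package.
has_stringdist <- requireNamespace("stringdist", quietly = TRUE)
message("stringdist available: ", has_stringdist)

# Hypothetical stand-in for a sampling step (NOT tidycells internals):
# draw up to n distinct values from a column with many distinct entries.
draw_sample <- function(values, n = 5) {
  distinct <- unique(values)
  sample(distinct, size = min(n, length(distinct)))
}

values <- paste0("item_", 1:100)

# Same seed before each draw, so both draws are the same.
set.seed(123)
first_draw <- draw_sample(values)

set.seed(123)
second_draw <- draw_sample(values)

identical(first_draw, second_draw)  # TRUE
```

Recording the value of has_stringdist alongside the results may help explain differences between runs on machines with different installed packages.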

Value

A column collated data.frame
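
Examples

The sketch below shows one way collate_columns might be called at the end of a tidycells pipeline. The preceding steps (as_cell_df with take_col_names, numeric_values_classifier, analyze_cells) are assumptions about the usual workflow and are not described on this page, so check them against the installed version of the package. The block is skipped when tidycells is not installed.

```r
collated <- NULL  # stays NULL when tidycells is not installed

if (requireNamespace("tidycells", quietly = TRUE)) {
  library(tidycells)

  # Assumed upstream steps: convert a small data.frame to cell form,
  # classify values vs attributes, then analyze the cells.
  cd <- as_cell_df(head(iris), take_col_names = TRUE)
  cd <- numeric_values_classifier(cd)
  ca <- analyze_cells(cd)

  # compose_cells produces the input expected by collate_columns.
  composed <- compose_cells(ca)

  # Fix the seed, since collation may sample columns with many distinct values.
  set.seed(1)
  collated <- collate_columns(
    composed,
    combine_threshold = 1,       # default shown in Usage
    retain_cell_address = TRUE   # keep row, col, data_block for traceback
  )

  print(collated)
}
```

Setting retain_cell_address = TRUE here is only for illustration; the default FALSE drops the cell-address columns.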


r-rudra/tidycells documentation built on July 19, 2022, 5:10 a.m.