fix_all_match_sets: Fix match (list of source-match sets).


Description

Uses a list of source-match collections to update values throughout a dataframe.

Usage

fix_all_match_sets(match_set_list, term_df, clean_col, flat_col, start_reg,
  end_reg, split_reg)

Arguments

match_set_list

A list whose element names are the flat terms that will be retained, and whose sub-elements are the match terms to be replaced in the target columns.

term_df

The dataframe to be updated.

clean_col

String with the name of the column with "clean" term values.

flat_col

String with the name of the column with "flat" term values.

start_reg, end_reg

Strings providing the regular expressions that can be used to identify the start and end of a term.

split_reg

String with regex identifying the pattern to split on when fields are complex.
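
As an illustration of the argument shapes described above, the inputs might be constructed as follows. The terms, the column names (clean_term, flat_term), and the regular expressions are all invented for this example and are not taken from the package; they only show one plausible layout.

```r
# Hypothetical inputs; every value here is an illustrative assumption.
# Names are the "flat" source terms to keep; each element holds the
# match terms that should be replaced by that source term.
match_set_list <- list(
  "acme corp" = c("acme corporation", "acme co"),
  "globex"    = c("globex inc")
)

# One reader-friendly ("clean") column and one matching ("flat") column.
# The second row is a complex field: two terms separated by "; ".
term_df <- data.frame(
  clean_term = c("Acme Corporation", "Globex Inc; Acme Co"),
  flat_term  = c("acme corporation", "globex inc; acme co"),
  stringsAsFactors = FALSE
)

# Assumed regexes: a term starts at the beginning of the field or after
# "; ", and ends at the end of the field or before ";".
start_reg <- "(^|; )"
end_reg   <- "($|;)"
split_reg <- "; "

# Splitting a complex field on split_reg yields its individual terms.
split_terms <- strsplit(term_df$flat_term[2], split_reg)[[1]]
```

With inputs shaped like these, the column names would be passed as strings, for example clean_col = "clean_term" and flat_col = "flat_term".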

Details

This takes a collection of source-match lists and updates all the "flat" and "clean" fields containing the match terms with the respective source terms. This is designed to support interaction with complex fields where the elements are identified by the provided regex. In these cases, the matching element of the field will be replaced and the rest of the field left untouched.

NOTE: "flat" refers to the version of the term used during the matching process. "clean" refers to the reader-friendly version of the term.
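
The replacement behaviour described in this section can be approximated with a minimal base R sketch. This is not the package's implementation: replace_match_terms and its sep argument are hypothetical, it handles a single character column only, and it splits each field and compares whole elements rather than using start_reg and end_reg.

```r
# Minimal sketch (an assumption, not the package's code): replace match
# terms with their source term inside possibly multi-valued fields.
replace_match_terms <- function(values, match_set_list, split_reg, sep) {
  # Build a lookup from each match term to the source term replacing it.
  lookup <- unlist(lapply(names(match_set_list), function(src) {
    matches <- match_set_list[[src]]
    stats::setNames(rep(src, length(matches)), matches)
  }))
  vapply(values, function(field) {
    if (is.na(field)) return(NA_character_)
    # Split the complex field into its individual elements.
    parts <- strsplit(field, split_reg)[[1]]
    # Replace only the elements that are known match terms.
    hit <- parts %in% names(lookup)
    parts[hit] <- lookup[parts[hit]]
    # Rejoin; sep is a hypothetical literal separator, needed because
    # split_reg is a regex and cannot be reused to rebuild the field.
    paste(parts, collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

# Example with invented terms.
flat <- c("acme corporation", "globex inc; acme co", "initech")
sets <- list(
  "acme corp" = c("acme corporation", "acme co"),
  "globex"    = "globex inc"
)
result <- replace_match_terms(flat, sets, split_reg = "; ", sep = "; ")
# result is c("acme corp", "globex; acme corp", "initech")
```

In the second field both elements are match terms and are replaced independently, while "initech" matches nothing and is left untouched.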

Value

Returns an updated version of the passed-in dataframe.


datavores/vgsample documentation built on May 14, 2019, 8:59 p.m.