cust_dup_identify: Create a new (deduplicated) customer ID


View source: R/finalize.R


Description

Many states require some additional deduplication of customer IDs. This function replaces cust_id with a deduplicated version and stores the original value in cust_id_raw. For each customer with one or more duplicates, the lowest customer ID among the matching rows is used as the output cust_id.
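The rule described above can be illustrated with a minimal base R sketch. This is not the salicprep source: the function name dedup_sketch, the argument name dup_vars, and the example columns (first, last, dob) are assumptions made for illustration only. It assumes duplicates are rows that match exactly on every deduplication variable.

```r
# Hypothetical sketch (not the package implementation): collapse exact
# duplicates to the lowest cust_id within each group of matching rows.
dedup_sketch <- function(cust, dup_vars) {
  # Keep the original ID so the change can be traced later
  cust$cust_id_raw <- cust$cust_id
  # Build one grouping key per row from the deduplication variables.
  # Note: paste() turns NA into "NA", so rows with missing values in the
  # same positions will be treated as matching each other.
  key <- do.call(paste, c(unname(as.list(cust[dup_vars])), sep = "|"))
  # For every row, take the minimum original ID within its key group
  cust$cust_id <- ave(cust$cust_id_raw, key, FUN = min)
  cust
}

# Made-up example data: rows 1 and 3 describe the same person
cust <- data.frame(
  cust_id = c(1, 2, 3, 4),
  first = c("ann", "bob", "ann", "bob"),
  last = c("lee", "ray", "lee", "kim"),
  dob = c("1980-01-01", "1975-05-05", "1980-01-01", "1975-05-05"),
  stringsAsFactors = FALSE
)
out <- dedup_sketch(cust, c("first", "last", "dob"))
```

In this example row 3 takes the ID of row 1, while cust_id_raw still holds the original value 3.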





Arguments

input customer table


set of variables to be used for deduplication


Details

In theory, customer deduplication could be improved with fuzzy matching. That would be more computationally expensive, though, and would require a preprocessing step to limit the set of potential matches. It may also be overkill for our needs here.
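To make the fuzzy matching idea concrete, here is a rough base R sketch of one possible approach; it is not part of salicprep. It assumes a blocking step on an exact variable (dob here) as the preprocessing that limits potential matches, and uses utils::adist (Levenshtein edit distance) to compare names within each block. The function name fuzzy_pairs_sketch, its arguments, and the example columns are all hypothetical.

```r
# Hypothetical sketch: block on an exact key, then fuzzy-compare names
# only within each block so the number of pairwise comparisons stays small.
fuzzy_pairs_sketch <- function(cust, block_var = "dob", max_dist = 1) {
  # Rows with a missing block_var are dropped by split()
  blocks <- split(cust, cust[[block_var]])
  pairs <- lapply(blocks, function(b) {
    if (nrow(b) < 2) return(NULL)
    full_name <- paste(b$first, b$last)
    d <- utils::adist(full_name)  # matrix of edit distances between names
    # Keep each unordered pair once (upper triangle) if it is close enough
    idx <- which(d <= max_dist & upper.tri(d), arr.ind = TRUE)
    if (nrow(idx) == 0) return(NULL)
    data.frame(id_a = b$cust_id[idx[, "row"]], id_b = b$cust_id[idx[, "col"]])
  })
  # Returns NULL when no candidate pairs are found
  do.call(rbind, unname(pairs))
}

# Made-up example data: "ann lee" and "anne lee" share a date of birth
cust_fuzzy <- data.frame(
  cust_id = c(1, 2, 3, 4),
  first = c("ann", "bob", "anne", "bob"),
  last = c("lee", "ray", "lee", "kim"),
  dob = c("1980-01-01", "1975-05-05", "1980-01-01", "1975-05-05"),
  stringsAsFactors = FALSE
)
candidates <- fuzzy_pairs_sketch(cust_fuzzy)
```

The result is a table of candidate ID pairs that would still need review or a merge rule. Without blocking, every row would have to be compared with every other row, which is the computational cost noted above.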

See Also

Other finalize production data: cust_dup_demo_plot, cust_dup_demo, cust_dup_pct, cust_dup_pull, cust_dup_year, res_id

southwick-associates/salicprep documentation built on Dec. 18, 2019, 6:45 a.m.