adjust_vrmatch: Adjust for Duplicates and False Negatives for All Snapshots


View source: R/adjust_vrmatch.R

Description

This function applies 'adjust_fn' and 'adjust_dups' to all specified snapshots, and exports the adjusted matches, the corresponding changes, and summaries of these changes.

Usage

adjust_vrmatch(dedup_ids = c("lVoterUniqueID", "sAffNumber"),
  fn_ids = c("lVoterUniqueID", "sAffNumber"), adj_prefix = "adj_",
  adj_suffix = "", date_df = NULL, start = "2018-04-26",
  end = "2021-01-01", path = "7z",
  pattern = "^(?=.*Cntywd_)(?!.*Hist)", file_type_snapshot = ".txt",
  file_type_cleaned = ".Rda", format = "%m%d%y",
  recursive = FALSE, period = 1, file_prefix = "Cntywd_",
  path_changes = "changes", path_reports = "reports",
  path_matches = "matches", vars_change = NULL,
  date_label = "date_label", nrow = "nrow", path_clean = "clean_df",
  clean_prefix = "df_cleaned_", clean_suffix = "")

Arguments

dedup_ids

The ID variables to correct for duplicates. Default is c("lVoterUniqueID", "sAffNumber").

fn_ids

The ID variables to correct for false negatives. Default is c("lVoterUniqueID", "sAffNumber").

adj_prefix

File prefix for saving adjusted objects for all matches, changes, and reports. Defaults to "adj_". If both this and adj_suffix are set to empty strings, the existing pre-adjustment outputs will be overwritten, so use that setting with caution.
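
The overwrite risk can be illustrated with a small sketch. The helper `adjusted_name()`, the base name `match_042618`, and the assumption that the prefix and suffix are pasted around a base name before the `.Rda` extension are all hypothetical; the actual naming scheme inside the package may differ.

```r
# Hypothetical helper, for illustration only: the real naming scheme used
# by adjust_vrmatch() is not documented on this page and may differ.
adjusted_name <- function(base, prefix = "adj_", suffix = "", ext = ".Rda") {
  paste0(prefix, base, suffix, ext)
}

# Hypothetical name of an existing, unadjusted output file.
original <- "match_042618.Rda"

# With the "adj_" prefix from the Usage signature, the names differ.
with_prefix <- adjusted_name("match_042618")

# With both prefix and suffix empty, the adjusted file would take the
# same name as the original and therefore replace it.
collides <- adjusted_name("match_042618", prefix = "", suffix = "") == original
```

Under these assumptions, keeping at least one of the two arguments non-empty is what keeps adjusted and unadjusted outputs side by side.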

adj_suffix

File suffix for saving adjusted objects. Defaults to empty string.

date_df

List of snapshots. Defaults to NULL, in which case the function will detect all snapshots available.

start

The start date of the first snapshot. Defaults to April 26, 2018.

end

The end date of the last snapshot. Defaults to Jan 1, 2021.

path

Path where all snapshots are stored. Defaults to subfolder 7z.

pattern

Regular expression of the file pattern to find. Defaults to a particular pattern of OCROV files.
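
As a rough illustration of what the default pattern selects, the sketch below runs it against a few made-up file names. The file names are assumptions, and so is the use of `grepl()`; how the package applies the pattern internally is not shown on this page. The lookahead syntax requires `perl = TRUE` in base R.

```r
# Default pattern from the Usage signature: the name must contain
# "Cntywd_" and must not contain "Hist".
pattern <- "^(?=.*Cntywd_)(?!.*Hist)"

# Hypothetical file names, for illustration only.
files <- c("Cntywd_042618.txt", "Cntywd_Hist_042618.txt", "Other_042618.txt")

# Lookaheads are a PCRE feature, so perl = TRUE is needed here.
is_snapshot <- grepl(pattern, files, perl = TRUE)

# Keep only the files that match.
files[is_snapshot]
```

Only the first name passes: the second is excluded by the negative lookahead on "Hist", and the third lacks "Cntywd_".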

file_type_snapshot

File type for 'snapshot_list()'. Defaults to .txt.

file_type_cleaned

File type for 'clean_import()'. Defaults to .Rda.

format

Format of the date in the snapshot file names. Defaults to "%m%d%y".
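
To show how the default format reads a date, the sketch below parses the date portion of a made-up file name. The file name and the extraction step with `sub()` are assumptions for illustration; the package may extract the date differently.

```r
# Hypothetical snapshot file name built from the default file_prefix
# and file_type_snapshot.
file_name <- "Cntywd_042618.txt"

# Pull out the six-digit date portion (illustrative extraction only).
date_string <- sub("^Cntywd_([0-9]{6})\\.txt$", "\\1", file_name)

# "%m%d%y" reads two-digit month, day, and year in that order.
snapshot_date <- as.Date(date_string, format = "%m%d%y")
```

Here "042618" is read as April 26, 2018, which matches the default start date.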

recursive

Whether to find files recursively. Defaults to FALSE.

period

Period or interval between snapshots: daily, weekly, and so on. Defaults to 1 (equivalent to "day"). Any valid input for the by argument of base seq.Date is allowed.
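
The effect of different period values can be previewed with base seq() on Date objects. This is only a sketch of the by argument itself; the short end date is chosen for illustration, and how the function combines the resulting dates with the available files is not shown here.

```r
# Start date from the Usage signature; a short, illustrative end date.
start <- as.Date("2018-04-26")
end <- as.Date("2018-05-10")

# period = 1 corresponds to one snapshot per day.
daily <- seq(start, end, by = 1)

# period = "week" corresponds to one snapshot every seven days.
weekly <- seq(start, end, by = "week")
```

With these dates, the daily sequence has 15 elements and the weekly sequence has three: April 26, May 3, and May 10, 2018.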

file_prefix

Prefix of the snapshot file names. Defaults to "Cntywd_".

path_changes

Path where the extracted changes are output to. Defaults to "changes".

path_reports

Path where the summarized changes are output to. Defaults to "reports".

path_matches

Path where the match outcomes are output to. Defaults to "matches".

vars_change

Variables to track changes of. Defaults to NULL, which will then track all variables.

date_label

Labels for dates (i.e., snapshot IDs), in 'date_df'. Defaults to "date_label".

nrow

Name of the list element that will contain the number of rows of the input list dataframes. Defaults to "nrow".

path_clean

Path to the cleaned snapshots. Defaults to "clean_df".

clean_prefix

File prefixes for cleaned snapshots. Defaults to "df_cleaned_".

clean_suffix

File suffixes for cleaned snapshots. Defaults to empty string.

Details

The function should not be pre-applied if the user intends to do a performance evaluation.

Value

A named list of dataframes similar to the vrmatch output, but with perfect duplicates on the matching variables cleaned.


sysilviakim/voterdiffR documentation built on June 22, 2020, 6:51 p.m.