Description
This function performs probabilistic record linkage (PRL) between every pair of consecutive snapshots of the voter file supplied by the user. By default, for computational reasons, records that match exactly on all fields between two snapshots are excluded from the record linkage. When several pairs of snapshots are matched, the function processes them in a loop rather than with more sophisticated tools such as purrr::map, because loading and wrangling all of them simultaneously often crashes the machine.
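To make the default exact-match exclusion and the pair-by-pair loop concrete, here is a minimal base-R sketch. It is not the package's implementation: the helpers `exact_key()` and `drop_exact_matches()`, the toy snapshots, and their column names are hypothetical, and the snapshots are held in memory for self-containment. Records that are identical on every linkage field in two consecutive snapshots are set aside, and only the remainder would go on to the PRL step.

```r
# Hypothetical sketch, not vrmatch()'s internal code.
# Collapse the chosen fields into one string key per row; "\r" is used as a
# separator that is unlikely to occur inside the data.
exact_key <- function(df, vars) {
  do.call(paste, c(unname(as.list(df[vars])), sep = "\r"))
}

# Keep only the rows of each snapshot that have no exact match
# (on all of `vars`) in the other snapshot.
drop_exact_matches <- function(a, b, vars) {
  key_a <- exact_key(a, vars)
  key_b <- exact_key(b, vars)
  list(a = a[!key_a %in% key_b, , drop = FALSE],
       b = b[!key_b %in% key_a, , drop = FALSE])
}

# Toy snapshots (hypothetical data), labelled by date.
snapshots <- list(
  "2020-01" = data.frame(last = c("Kim", "Lee", "Park"),
                         birth_year = c(1980, 1975, 1990)),
  "2020-07" = data.frame(last = c("Kim", "Lee", "Parks"),
                         birth_year = c(1980, 1976, 1990)),
  "2021-01" = data.frame(last = c("Kim", "Lee"),
                         birth_year = c(1980, 1976))
)
labels <- names(snapshots)

# Plain loop over consecutive pairs: one pair is handled per iteration.
remaining <- vector("list", length(labels) - 1)
for (i in seq_len(length(labels) - 1)) {
  pair <- drop_exact_matches(snapshots[[i]], snapshots[[i + 1]],
                             c("last", "birth_year"))
  remaining[[i]] <- pair
  # A PRL step (e.g. fastLink) would run on pair$a and pair$b here.
}
```

In the first pair only the unchanged "Kim" record is set aside; "Lee" (new birth year) and "Park"/"Parks" (spelling change) are left for probabilistic linkage.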
Usage
vrmatch(date_df, exact_exclude = TRUE, sample_exact = FALSE,
sample_id = FALSE, sample_size = NULL, sample_perc = NULL,
block = FALSE, path_clean = "clean_df", path_changes = "changes",
path_reports = "reports", path_matches = "matches",
clean_prefix = "df_cleaned_", clean_suffix = "",
exist_files = FALSE, varnames, varnames_str, varnames_num = NULL,
varnames_id = NULL, partial.match = NULL, varnames_block = NULL,
vars_change = NULL, n.cores = NULL, file_type = ".Rda",
date_label = "date_label", nrow = "nrow", seed = 123, ...)
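The defaults in the usage line suggest where the cleaned snapshots live and how their file names are built. The sketch below assumes, without confirmation from this page, that each file name is `clean_prefix` + date label + `clean_suffix` + `file_type` inside `path_clean`, and that the labels sit in a `date_label` column of `date_df`; `snapshot_file()` is a hypothetical helper. It also lists the consecutive pairs that would be linked.

```r
# Hypothetical helper (not exported by the package). The
# prefix + label + suffix + extension naming pattern is an assumption.
snapshot_file <- function(label, path_clean = "clean_df",
                          clean_prefix = "df_cleaned_", clean_suffix = "",
                          file_type = ".Rda") {
  file.path(path_clean, paste0(clean_prefix, label, clean_suffix, file_type))
}

# Assumed layout of date_df: one date label per snapshot.
date_df <- data.frame(date_label = c("2020-01", "2020-07", "2021-01"))
files <- snapshot_file(date_df$date_label)

# Consecutive pairs that would be linked: (2020-01, 2020-07), (2020-07, 2021-01).
pairs <- data.frame(from = head(date_df$date_label, -1),
                    to   = tail(date_df$date_label, -1))
```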
Arguments
date_df: Data frame listing the snapshots.
exact_exclude: Whether to exclude full exact matches (records identical on all fields) between snapshots from the probabilistic record linkage. Defaults to TRUE.
sample_exact: Whether to add a random sample of full exact matches, to correct for the underlying population's value distribution in each field. Defaults to FALSE.
sample_id: Whether to add a random sample of ID matches (records with matching IDs but some changed fields), to correct for the underlying population's value distribution in each field. Defaults to FALSE.
sample_size: Size of the random sample to add. Defaults to NULL.
sample_perc: Sampling proportion of the random sample to add. Defaults to NULL. If both 'sample_size' and 'sample_perc' are NULL, 'sample_perc' is set to 0.01 (1%). If both are supplied, 'sample_perc' is chosen over 'sample_size'.
block: Whether to employ blocking. Defaults to FALSE.
path_clean: Path to the cleaned snapshots. Defaults to "clean_df".
path_changes: Path to which the extracted changes are written. Defaults to "changes".
path_reports: Path to which the summarized changes are written. Defaults to "reports".
path_matches: Path to which the match outcomes are written. Defaults to "matches".
clean_prefix: File-name prefix of the cleaned snapshots. Defaults to "df_cleaned_".
clean_suffix: File-name suffix of the cleaned snapshots. Defaults to an empty string.
exist_files: Whether match outcomes from a previous run already exist. Defaults to FALSE.
varnames: Variables on which to perform probabilistic record linkage.
varnames_str: String variables used for matching.
varnames_num: Numeric variables used for matching. Defaults to NULL, in which case setdiff(varnames, varnames_str) is used.
varnames_id: Voter ID variables, if any exist, to be excluded from PRL when IDs match.
partial.match: Variables to be partially matched. Defaults to all of varnames_str.
varnames_block: Nested list of variables, or combinations of variables, defining the blocking passes.
vars_change: Variables whose changes are tracked. Defaults to NULL, in which case all variables are tracked.
n.cores: Number of cores over which to parallelize the matching. Defaults to half the available threads.
file_type: Input file type. Defaults to ".Rda".
date_label: Labels for the dates (i.e., snapshot IDs) in 'date_df'. Defaults to "date_label".
nrow: Name of the list element that will contain the number of rows of the input data frames.
seed: Random seed. Defaults to 123.
...: Other arguments passed to fastLink.
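The defaults described above for `varnames_num`, `partial.match`, `sample_size`/`sample_perc`, and `n.cores` can be summarized in a small resolver. This is a hedged sketch: `resolve_defaults()` is a hypothetical name, not an exported function, and counting threads with `parallel::detectCores()` is an assumption about how "half the available threads" is computed.

```r
# Hypothetical sketch of the documented argument defaults.
resolve_defaults <- function(varnames, varnames_str, varnames_num = NULL,
                             partial.match = NULL, sample_size = NULL,
                             sample_perc = NULL, n.cores = NULL) {
  # varnames_num defaults to the linkage variables that are not strings.
  if (is.null(varnames_num)) varnames_num <- setdiff(varnames, varnames_str)
  # partial.match defaults to all string variables.
  if (is.null(partial.match)) partial.match <- varnames_str
  # With neither sample_size nor sample_perc given, draw a 1% sample.
  if (is.null(sample_size) && is.null(sample_perc)) sample_perc <- 0.01
  # When both are given, sample_perc is chosen over sample_size.
  if (!is.null(sample_perc)) sample_size <- NULL
  # n.cores defaults to half the detected threads, but at least 1.
  if (is.null(n.cores)) {
    threads <- parallel::detectCores()
    n.cores <- if (is.na(threads)) 1L else max(1L, threads %/% 2L)
  }
  list(varnames_num = varnames_num, partial.match = partial.match,
       sample_size = sample_size, sample_perc = sample_perc,
       n.cores = n.cores)
}

resolved <- resolve_defaults(
  varnames = c("first_name", "last_name", "birth_year", "zip"),
  varnames_str = c("first_name", "last_name")
)
# resolved$varnames_num is c("birth_year", "zip"); resolved$sample_perc is 0.01.
```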
Value
A nested list of matched data frames, fastLink output, and arguments.