clean_snapshot: Clean Snapshots and Store Them


View source: R/clean_snapshot.R

Description

This function takes the requested snapshots, imports them, cleans them, and exports them as Rda and/or fst files for later use by the vrmatch function.

Usage

clean_snapshot(
  date_df = NULL,
  start = "2018-04-26",
  end = "2021-01-01",
  path = "7z",
  pattern = "^(?=.*Cntywd_)(?!.*Hist)",
  file_type = ".txt",
  path_clean = "clean_df",
  clean_prefix = "df_cleaned_",
  clean_suffix = "",
  save_type = c("rda", "fst"),
  format = "%m%d%y",
  recursive = FALSE,
  period = 1,
  file_prefix = "Cntywd_",
  varnames = NULL,
  date = NULL,
  date_order = "mdy",
  num = NULL,
  first = "szNameFirst",
  voter_prefix = "sVoterTitle",
  gender = "sGender",
  email = "szEmailAddress",
  email_exc = c("abc@example.com"),
  phone = "szPhone",
  phone_exc = "___-____",
  ...
)

Arguments

date_df

List of snapshots. Defaults to NULL, in which case the function will detect all snapshots available.

start

The start date of the first snapshot. Defaults to April 26, 2018.

end

The end date of the last snapshot. Defaults to Jan 1, 2021.

path

Path where all snapshots are stored. Defaults to subfolder 7z.

pattern

Regular expression of the file pattern to find. Defaults to a particular pattern of OCROV files.
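
The default pattern combines a positive and a negative lookahead. As a minimal sketch of what it matches, the base R snippet below applies it to a few hypothetical file names (the names are assumptions for illustration, not files shipped with the package). Lookaheads need Perl-compatible matching, so grepl is called with perl = TRUE; how clean_snapshot applies the pattern internally is not shown here.

```r
# Default pattern: the name must contain "Cntywd_" and must not contain "Hist".
pattern <- "^(?=.*Cntywd_)(?!.*Hist)"

# Hypothetical file names, for illustration only.
files <- c("Cntywd_042618.txt", "Cntywd_Hist_042618.txt", "Other_042618.txt")

# perl = TRUE enables lookahead support in base R regular expressions.
matched <- grepl(pattern, files, perl = TRUE)
files[matched]
```

Only the first name passes: the second is rejected by the negative lookahead and the third by the positive one.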

file_type

File type. Defaults to .txt.

path_clean

Path where cleaned snapshots are stored. Defaults to "clean_df".

clean_prefix

File prefix for cleaned snapshots. This replaces the existing file prefix. Defaults to "df_cleaned_".

clean_suffix

File suffix for cleaned snapshots. Defaults to an empty string.

save_type

How to export the cleaned dataframe. Defaults to both "rda" and "fst".

format

Format of the date in the snapshot file names. Defaults to "%m%d%y".
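
To illustrate the "%m%d%y" format, the base R sketch below turns a date into the stamp that would appear in a file name and parses it back. It assumes, for illustration only, that a snapshot file name is file_prefix followed by the date stamp and file_type; the actual naming logic inside clean_snapshot may differ.

```r
# Format a date as month, day, two-digit year.
snapshot_date <- as.Date("2018-04-26")
stamp <- format(snapshot_date, "%m%d%y")

# Assumed naming scheme: file_prefix + date stamp + file_type.
file_name <- paste0("Cntywd_", stamp, ".txt")

# Parse the stamp back into a Date with the same format string.
parsed <- as.Date(stamp, format = "%m%d%y")
```

Here stamp is "042618" and parsed equals the original date.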

recursive

Whether to find files recursively. Defaults to FALSE.

period

Interval between snapshots, such as daily or weekly. Defaults to 1 (equivalent to "day"). Any valid value for the by argument of base seq.Date is allowed.
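
Since period accepts whatever the by argument of seq.Date accepts, a short base R sketch shows how different values change the resulting sequence of dates. The start and end dates below are chosen for illustration and are not the function defaults.

```r
# Illustrative date range (not the function defaults).
start <- as.Date("2018-04-26")
end <- as.Date("2018-05-24")

# by = 1 steps one day at a time, the same as by = "day".
daily <- seq(start, end, by = 1)
by_day <- seq(start, end, by = "day")

# by = "week" steps seven days at a time.
weekly <- seq(start, end, by = "week")

length(daily)
length(weekly)
```

The daily sequence covers 29 dates and the weekly sequence covers 5.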

file_prefix

File name prefix. Defaults to Cntywd_.

varnames

All variables to be cleaned. Defaults to NULL.

date

Date variables. Defaults to NULL.

date_order

Order of the date components when the date variables are stored as strings. Defaults to "mdy".

num

Numeric variables. Defaults to NULL.

first

Variable containing first names. Defaults to "szNameFirst".

voter_prefix

Variable containing self-reported personal prefixes. Defaults to "sVoterTitle".

gender

Variable containing original gender entry. Defaults to "sGender".

email

Name of the email address field. Defaults to "szEmailAddress".

email_exc

Email addresses that are to be cleaned. Defaults to a vector containing only "abc@example.com".

phone

Name of the phone number field. Defaults to "szPhone".

phone_exc

Phone numbers that are to be cleaned. Defaults to "___-____".
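
The documentation does not spell out what "cleaned" means for email_exc and phone_exc. The base R sketch below shows one plausible reading, under the loud assumption that listed placeholder values are replaced with NA; it is not taken from the package source, and the toy data frame is made up for illustration.

```r
# Toy data frame using the default field names (values are made up).
df <- data.frame(
  szEmailAddress = c("voter@example.org", "abc@example.com"),
  szPhone = c("555-0100", "___-____"),
  stringsAsFactors = FALSE
)

email_exc <- c("abc@example.com")
phone_exc <- "___-____"

# Assumption: placeholder entries are treated as missing values.
df$szEmailAddress[df$szEmailAddress %in% email_exc] <- NA
df$szPhone[df$szPhone %in% phone_exc] <- NA
```

After this step, the second row has NA in both columns while the first row is unchanged.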

...

Other arguments to be passed to snapshot_import.

Value

Output dataframe with cleaned contacts.


sysilviakim/voterdiffR documentation built on June 22, 2020, 6:51 p.m.