View source: R/gen_linkage_file.R
damage_gold_standard damages the gold_standard file to produce a linkage file. The damage actions are instructed by the error flags in syn_error_occurrence. These actions are:
missing: assign 'NA' to the flagged data point;
del: randomly delete one character from the flagged data point;
trans_char: randomly transpose two neighbouring characters on the flagged data point;
trans_date: randomly transpose the day and the month of a date on the flagged data point;
insert: randomly insert one character into the flagged data point;
typo: randomly assign a typo error to the flagged data point;
ocr: randomly assign an OCR error to the flagged data point;
pho: randomly assign a phonetic error to the flagged data point;
variant: randomly assign a name variant to the flagged data point.
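To make the character-level actions concrete, here is a minimal base R sketch of del, trans_char, insert and trans_date. The helper names (delete_char, transpose_chars, insert_char, transpose_date) are hypothetical and are not the package's internal functions; the "dd/mm/yyyy" date format and the use of lowercase letters for insertion are also assumptions made for illustration.

```r
# Hypothetical helpers that mimic four of the damage actions on a single string.
# They are illustrative only and do not reproduce the package's implementation.

# del: remove one randomly chosen character
delete_char <- function(x) {
  n <- nchar(x)
  if (n < 1) return(x)
  i <- sample.int(n, 1)
  paste0(substr(x, 1, i - 1), substr(x, i + 1, n))
}

# trans_char: swap one randomly chosen pair of neighbouring characters
transpose_chars <- function(x) {
  chars <- strsplit(x, "")[[1]]
  n <- length(chars)
  if (n < 2) return(x)
  i <- sample.int(n - 1, 1)
  chars[c(i, i + 1)] <- chars[c(i + 1, i)]
  paste(chars, collapse = "")
}

# insert: add one random lowercase letter at a random position
insert_char <- function(x) {
  n <- nchar(x)
  keep <- sample.int(n + 1, 1) - 1  # number of characters kept before the new one
  paste0(substr(x, 1, keep), sample(letters, 1), substr(x, keep + 1, n))
}

# trans_date: swap day and month, assuming a "dd/mm/yyyy" string
transpose_date <- function(x) {
  sub("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", "\\2/\\1/\\3", x)
}

set.seed(1)
delete_char("margaret")       # one character shorter
transpose_chars("margaret")   # same letters, two neighbours swapped
insert_char("margaret")       # one character longer
transpose_date("03/11/1985")  # "11/03/1985"
```

The random helpers give different results on each call unless a seed is set, which is why only the deterministic date swap shows an expected value.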
damage_gold_standard(gold_standard, syn_error_occurrence)
gold_standard: A data frame of the gold standard dataset, see
syn_error_occurrence: A data frame of one-hot encoded error flags, see
A list of two data frames: i) the linkage_file, which has the same dimensions as the gold_standard but with some of the variables damaged; ii) the error_log, which records the damages made to the linkage file.
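The relationship between the flags and the two returned data frames can be sketched with a toy example. Everything below is an assumption made for illustration: the helper apply_missing, the flag column name age_missing_flag, and the layout of the log are hypothetical and may differ from what the package actually produces. Only the missing action is shown.

```r
# Toy stand-ins for the two inputs; names and layout are assumptions.
gold <- data.frame(firstname = c("anna", "ben", "cara"),
                   age = c(34, 51, 27),
                   stringsAsFactors = FALSE)
flags <- data.frame(age_missing_flag = c(0, 1, 0))  # one-hot: row 2 is flagged

# Hypothetical helper: apply the "missing" action to one variable and
# return the damaged copy together with a log of what was changed.
apply_missing <- function(gold, flags, variable) {
  flag_col <- paste0(variable, "_missing_flag")
  hit <- which(flags[[flag_col]] == 1)
  damaged <- gold
  damaged[hit, variable] <- NA
  log <- data.frame(row = hit,
                    variable = rep(variable, length(hit)),
                    error = rep("missing", length(hit)),
                    stringsAsFactors = FALSE)
  list(linkage_file = damaged, error_log = log)
}

result <- apply_missing(gold, flags, "age")
dim(result$linkage_file)  # same dimensions as gold
result$linkage_file$age   # 34 NA 27
result$error_log          # one row: row 2, variable "age", error "missing"
```

The damaged copy keeps every row and column of the input, and the log holds one entry per flagged data point, which mirrors the two-element list described above.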
# Add random error flags to the first 50 rows of the adult dataset
adult_with_flag <- add_random_error(adult[1:50,], prob = c(0.97, 0.03), "age_missing")
adult_with_flag <- add_random_error(adult_with_flag, prob = c(0.65, 0.35), "firstname_variant")
# Split the flagged data into training and testing sets
adult_with_flag <- split_data(adult_with_flag, 70)
# Constraints the generated data must satisfy
bn_evidence <- "age >=18 & capital_gain>=0 & capital_loss >=0 &
                hours_per_week>=0 & hours_per_week<=100"
# Learn a Bayesian network from the training set and generate synthetic data
bn_learn <- gen_bn_learn(adult_with_flag$training_set, "hc", bn_evidence)
dataset_smaller_version <- bn_learn$gen_data
# Drop the flag columns, then add a first name to build the gold standard
syn_dependent <- dataset_smaller_version[, !grepl("flag", colnames(dataset_smaller_version))]
gold_standard <- add_variable(syn_dependent, "firstname", country = "uk",
                              gender_dependency = TRUE, age_dependency = TRUE)
# Infer the error flags, then damage the gold standard accordingly
syn_error_occurrence <- bn_flag_inference(dataset_smaller_version, bn_learn$fit_model)
linkage_file <- damage_gold_standard(gold_standard, syn_error_occurrence)