acquire_error_flag: Add a column of error flags given two data frames.

View source: R/acquire_error.R

Description

acquire_error_flag adds a column of error flags to df1, based on the differences between two data frames as reported by compare_two_df.

Usage

acquire_error_flag(df1, diffs.table, var_name, error_type)

Arguments

df1

Data frame 1.

diffs.table

A data frame of differences between two data frames, as returned by compare_two_df.

var_name

A string giving the name of the variable to check for errors.

error_type

A string naming the error type to check. In each case the flag is 1 if the condition holds and 0 otherwise:

  1. missing: the value of var_name is NA in df2;

  2. del: the value of var_name in df2 equals the value in df1 with one letter deleted (see get_transformation_del);

  3. trans_char: the value of var_name in df2 equals the value in df1 with the positions of two of its letters transposed (see get_transformation_trans_char);

  4. trans_date: the value of var_name in df2 equals the value in df1 with day and month transposed (see get_transformation_trans_date);

  5. insert: the value of var_name in df2 equals the value in df1 with one additional letter inserted (see get_transformation_insert);

  6. typo: the value of var_name in df2 equals the value in df1 with a typing error (see get_transformation_typo);

  7. ocr: the value of var_name in df2 equals the value in df1 with an OCR error (see get_transformation_ocr);

  8. pho: the value of var_name in df2 equals the value in df1 with a phonetic error (see get_transformation_pho);

  9. variant: the value of var_name in df2 equals a variant of the value in df1 (see get_transformation_name_variant).
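To make the 0/1 flag semantics concrete, the following base-R sketch reproduces three of the simpler checks (missing, del, trans_char) on a pair of hand-made vectors. It is an illustration only, not the sdglinkage implementation: the helper names is_single_deletion and is_transposition, and the sample data, are hypothetical and do not appear in the package.

```r
# Illustrative sketch only; NOT how sdglinkage computes its flags.
# All helper names and data below are hypothetical.

# TRUE if `damaged` equals `original` with exactly one character deleted.
is_single_deletion <- function(original, damaged) {
  if (is.na(original) || is.na(damaged)) return(FALSE)
  n <- nchar(original)
  if (n == 0 || nchar(damaged) != n - 1) return(FALSE)
  # Build every string obtainable by dropping one character of `original`.
  candidates <- vapply(
    seq_len(n),
    function(i) paste0(substr(original, 1, i - 1), substr(original, i + 1, n)),
    character(1)
  )
  damaged %in% candidates
}

# TRUE if `damaged` equals `original` with two characters swapped.
is_transposition <- function(original, damaged) {
  if (is.na(original) || is.na(damaged)) return(FALSE)
  if (nchar(original) != nchar(damaged)) return(FALSE)
  a <- strsplit(original, "")[[1]]
  b <- strsplit(damaged, "")[[1]]
  diff_pos <- which(a != b)
  # Exactly two positions differ, and they hold each other's characters.
  length(diff_pos) == 2 && all(a[diff_pos] == rev(b[diff_pos]))
}

original <- c("smith", "jones", "brown", "taylor")
damaged  <- c("smth",  NA,      "borwn", "taylor")

flags <- data.frame(
  original   = original,
  damaged    = damaged,
  missing    = as.integer(is.na(damaged)),
  del        = as.integer(mapply(is_single_deletion, original, damaged)),
  trans_char = as.integer(mapply(is_transposition, original, damaged))
)
print(flags)
```

Each row gets at most one flag set here: "smth" is a deletion, NA is missing, "borwn" is a transposition, and the unchanged "taylor" is flagged for nothing.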

Value

Returns df1 as a data frame with an additional error flag column called var_name.
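As a rough picture of the shape of the result, the base-R sketch below appends a 0/1 "missing" flag column to a small data frame by aligning a second data frame on a shared ID. It bypasses diffs.table entirely, since its structure is not described on this page. The data and the column name firstname_missing_flag are assumptions made for illustration; the name the package actually gives the column may differ.

```r
# Illustrative sketch only; data and the flag column name are hypothetical.
df1 <- data.frame(
  nhsid = c("A1", "A2", "A3"),
  firstname = c("anna", "ben", "cara"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  nhsid = c("A3", "A1", "A2"),
  firstname_variant = c("cara", NA, "benn"),
  stringsAsFactors = FALSE
)

# Align df2 rows to df1 rows using the shared ID.
idx <- match(df1$nhsid, df2$nhsid)

# Same rows and columns as df1, plus one integer flag column.
df1_with_flag <- df1
df1_with_flag$firstname_missing_flag <- as.integer(is.na(df2$firstname_variant[idx]))
print(df1_with_flag)
```

The original columns are left untouched, so several flag columns can be added one after another, as the repeated calls in the Examples section do.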

Examples

library(sdglinkage)

# Build a 20-row data frame with an ID, names, and placeholder variant columns.
df <- data.frame(firstname_variant = character(20), lastname_variant = character(20))
df <- add_variable(df, "nhsid")
df <- add_variable(df, "firstname", country = "uk", gender_dependency = FALSE,
                   age_dependency = FALSE)
df <- add_variable(df, "lastname", country = "uk", gender_dependency = FALSE,
                   age_dependency = FALSE)
df$firstname_variant <- as.character(df$firstname_variant)
df$lastname_variant <- as.character(df$lastname_variant)

# Fill each variant column with the first comma-separated variant of the name.
for (i in seq_len(nrow(df))) {
  df$firstname_variant[i] <- strsplit(get_transformation_name_variant(df$firstname[i]), ',')[[1]][1]
  df$lastname_variant[i] <- strsplit(get_transformation_name_variant(df$lastname[i]), ',')[[1]][1]
}

# Split into the original (df1) and damaged (df2) data frames.
df1 <- df[c('nhsid', 'firstname', 'lastname')]
df2 <- df[c('nhsid', 'firstname_variant', 'lastname_variant')]
df2[1:3, 'firstname_variant'] <- NA  # make the first three first names missing

# Compare the paired variables, matching rows on nhsid.
vars <- list(c('firstname', 'firstname_variant'), c('lastname', 'lastname_variant'))
diffs.table <- compare_two_df(df1, df2, vars, 'nhsid')

# Add one flag column per error type for firstname.
df1_with_flags <- acquire_error_flag(df1, diffs.table, 'firstname', 'missing')
df1_with_flags <- acquire_error_flag(df1_with_flags, diffs.table, 'firstname', 'variant')
df1_with_flags <- acquire_error_flag(df1_with_flags, diffs.table, 'firstname', 'pho')
df1_with_flags <- acquire_error_flag(df1_with_flags, diffs.table, 'firstname', 'ocr')

sdglinkage documentation built on April 27, 2020, 5:09 p.m.