add_dependent_error: Add two dependent error flags to a data frame.


View source: R/gen_gold_standard.R

Description

add_dependent_error adds two columns of dependent binary error flags (each 0 or 1) to a data frame.

Usage

add_dependent_error(
  dataset,
  error_names,
  prior_probs = c(0.5, 0.5),
  cond_probs = c(0.95, 0.05, 0.85, 0.15)
)

Arguments

dataset

A data frame of the dataset.

error_names

A string giving the two variable names and the error type, in the form 'variable 1_variable 2_error type'. The error of variable 2 depends on the error of variable 1. The error type can be one of: 'missing', 'insert', 'variant', 'typo', 'pho', 'ocr', 'trans_date' or 'trans_char'.
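As an illustration of the 'variable 1_variable 2_error type' form, the sketch below splits such a string into its three parts. The helper parse_error_name is hypothetical and is not part of sdglinkage; it assumes the two variable names contain no underscores, so that error types such as 'trans_date' can be rejoined from the remaining pieces. How the package itself parses the string is not shown on this page.

```r
# Hypothetical helper (not an sdglinkage function): split an error_names
# string into variable 1, variable 2 and the error type.
# Assumption: neither variable name contains an underscore.
parse_error_name <- function(error_name) {
  parts <- strsplit(error_name, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) >= 3)
  list(
    variable_1 = parts[1],
    variable_2 = parts[2],
    # Rejoin the rest so that 'trans_date' and 'trans_char' stay intact.
    error_type = paste(parts[-(1:2)], collapse = "_")
  )
}

simple <- parse_error_name("race_sex_typo")
compound <- parse_error_name("age_sex_trans_date")
```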

prior_probs

A vector of two numerical probabilities: the first is the prior probability of variable 1 being 0 (no error), and the second is the prior probability of variable 1 being 1 (error).

cond_probs

A vector of four numerical probabilities: the first two are the probabilities of variable 2 being 0 and 1 given that variable 1 is 0, and the last two are the probabilities of variable 2 being 0 and 1 given that variable 1 is 1.
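To make the roles of prior_probs and cond_probs concrete, here is a minimal sketch of drawing two dependent 0/1 flags from them. The function sample_dependent_flags and its column names flag1 and flag2 are assumptions for illustration only; this is not the package's implementation, which may sample the flags differently.

```r
# Hypothetical helper (not an sdglinkage function): draw n pairs of
# dependent binary flags from a prior and a conditional table.
sample_dependent_flags <- function(n,
                                   prior_probs = c(0.5, 0.5),
                                   cond_probs = c(0.95, 0.05, 0.85, 0.15)) {
  stopifnot(length(prior_probs) == 2, length(cond_probs) == 4)
  # Flag 1 comes from the prior: P(flag1 = 1) = prior_probs[2].
  flag1 <- rbinom(n, size = 1, prob = prior_probs[2])
  # P(flag2 = 1) depends on flag1:
  # cond_probs[2] when flag1 is 0, cond_probs[4] when flag1 is 1.
  p_flag2 <- ifelse(flag1 == 0, cond_probs[2], cond_probs[4])
  flag2 <- rbinom(n, size = 1, prob = p_flag2)
  data.frame(flag1 = flag1, flag2 = flag2)
}

set.seed(1)
flags <- sample_dependent_flags(1000)

# Marginal probability of an error in variable 2 under the defaults:
# 0.5 * 0.05 + 0.5 * 0.15 = 0.10
marginal_p2 <- sum(c(0.5, 0.5) * c(0.05, 0.15))
```

With the default values, variable 2 is three times as likely to be flagged when variable 1 is flagged (0.15 versus 0.05), which is what makes the two columns dependent.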

Value

A data frame of the dataset with two additional columns of dependent, binary-encoded error flags.

Examples

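library(sdglinkage)  # assumes the `adult` data frame is available once the package is loaded

# Default probabilities: typo flags on race and sex, where the sex flag depends on the race flag.
adult_with_flag <- add_dependent_error(adult[1:100, ], "race_sex_typo")

# Custom probabilities: missing-value flags on age and sex, where the sex flag depends on the age flag.
adult_with_flag <- add_dependent_error(adult[1:100, ], "age_sex_missing",
                                       prior_probs = c(0.99, 0.01),
                                       cond_probs = c(0.95, 0.05, 0.4, 0.6))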

sdglinkage documentation built on April 27, 2020, 5:09 p.m.