Correction scenarios

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dcmodify)
library(dcmodifydb)

Introduction

The goal of dcmodifydb is to apply modification rules specified with dcmodify to a database table, allowing for documented, reproducible data cleaning adjustments in a database.

This document provides examples of common error correction scenarios for which dcmodify can be used.
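As a rough illustration of that workflow, the sketch below applies one rule to a table in an in-memory SQLite database. The table name `person`, its columns and values, and the use of DBI, RSQLite and dplyr (with dbplyr) to set up the connection are assumptions made for this example only.

```r
library(dcmodify)
library(dcmodifydb)

# Assumption: an in-memory SQLite database with a small, made-up table "person"
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "person",
                  data.frame(age = c(10, 40), income = c(500, 3000)))
person <- dplyr::tbl(con, "person")

# One correction rule, written in the syntax used throughout this document
m <- modifier(if (age < 12) income = 0)

# Print the SQL statements that the rule is translated into
dump_sql(m, person)

# Apply the rule to a copy of the table and fetch the result into R
person_mod <- modify(person, m, copy = TRUE)
result <- dplyr::collect(person_mod)

DBI::dbDisconnect(con)
```

Working on a copy (`copy = TRUE`) keeps the original table untouched while the rules are still being developed.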

Document rules

A good practice is to store the intent and description of each rule, so that future (co)workers on the dataset can evaluate why each rule is there. It is therefore recommended to store rules in YAML format and to document each rule accordingly.
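A minimal sketch of such a documented rule file is shown below. The rule name, label and description are invented for illustration, and the file is written to a temporary location only to keep the example self-contained; in practice the YAML file would live alongside the project.

```r
library(dcmodify)

# Hypothetical rule file: name, label and description are made up for illustration
rules_yaml <- "
rules:
- expr: if (age < 12) income = 0
  name: M1
  label: 'Child income'
  description: |
    Children under 12 cannot have an income,
    so any reported income is set to zero.
"
rules_file <- tempfile(fileext = ".yml")
writeLines(rules_yaml, rules_file)

# Read the documented rules from the file
m <- modifier(.file = rules_file)

# Apply them to a small, made-up data set
d <- data.frame(age = c(10, 40), income = c(500, 3000))
out <- modify(d, m)
```

Keeping the description next to the expression means the reason for a correction travels with the rule itself.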

Error scenarios

Replacing common mistakes

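# Simple rule: two-digit years below 25 are assumed to belong to the 2000s
m <- modifier( if (year < 25) year = year + 2000)

# More complete alternative (overwrites m): for years below 100,
# values above 25 go to the 1900s and all others to the 2000s
m <- modifier( if (year < 100){
  if (year > 25) {
    year = year + 1900
  } else {
    year = year + 2000
  }
})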
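To see what such a rule does, it can be applied to a small data frame with `modify`. The sketch below uses the simple single-condition rule and made-up values; only the two-digit years below 25 should change.

```r
library(dcmodify)

# Made-up data: a mix of two-digit and four-digit years
d <- data.frame(year = c(5, 19, 1987, 2019))

# Two-digit years below 25 are moved to the 2000s
m <- modifier(if (year < 25) year = year + 2000)

d_fixed <- modify(d, m)
d_fixed$year
```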

Replacing missing values

A value is not measured (NA), but can be deduced to be of a certain value, e.g. zero. Suppose a health questionnaire asks whether a person smokes and, if so, how many cigarettes per day. Typically, if the answer to the first question is "no", the second question is not asked and is thus "unmeasured", but it can be deduced to be 0.

d <- data.frame(smokes=c(TRUE, FALSE), cigarettes=c(10,NA))
knitr::kable(d)
# The rule to apply: if (smokes == FALSE) cigarettes = 0
m <- modifier(if (smokes ==  FALSE) {
  cigarettes = 0
})
dc <- modify(d, m)
knitr::kable(dc)

Recoding

SQL translation

Syntax

if rule

m <- modifier( if (age < 12) income = 0)

The following statements are equivalent. It is wise to choose a syntax that is familiar to the people who specify the correction rules.

```r
if (age < 12) income = 0    # R noobs
if (age < 12) income <- 0   # a bit more R-y
income[age < 12] <- 0       # very R-y
```

### `else`

Each `if` rule may be followed by an `else` or `else if`.

```r
m <- modifier(if (age > 67) {retired = TRUE} else {retired = FALSE})
```

The following statements are equivalent. It is wise to choose a syntax that is familiar to the people who specify the correction rules.

```r
if (age > 67) {retired = TRUE} else {retired = FALSE}     # R noobs
if (age > 67) {retired <- TRUE} else {retired <- FALSE}   # R-y
retired <- if (age > 67) TRUE else FALSE                  # very R-y
retired <- age > 67                                       # very R-y
```

### multiple assignments

```r
m <- modifier(
  if (age > 67) {
    retired = TRUE
    salary = 0
  }
)
```

else if
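A minimal sketch of an `else if` chain, applied to a plain data frame with dcmodify. The column `age_group` and its categories are hypothetical and chosen only for this example; the thresholds are likewise made up.

```r
library(dcmodify)

# Hypothetical column age_group, created up front so the rule has a target
d <- data.frame(age = c(8, 35, 70), age_group = NA_character_)

# Each record ends up in exactly one of the three branches
m <- modifier(
  if (age < 18) {
    age_group = "child"
  } else if (age > 67) {
    age_group = "retired"
  } else {
    age_group = "adult"
  }
)

d_grouped <- modify(d, m)
d_grouped
```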




dcmodifydb documentation built on June 17, 2022, 5:05 p.m.