knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dcmodify) library(dcmodifydb)
The goal of dcmodifydb
is to apply modification rules specified with dcmodify
on a database table, allowing for documented, reproducable data cleaning adjustments
in a database.
This document provides examples and common error corrections scenario's for which
dcmodify
can be used.
A good practice is to store the intent and description of each rule: this enables
future (co)workers on the dataset to evaluate the use of each rule.
It is therefore recommended to store rules in a yaml
format and document each
rule accordingly.
m <- modifier( if (year < 25) year = year + 2000)
m <- modifier( if (year < 100){ if (year > 25) { year = year + 1900 } else { year = year + 2000 } })
A value is not measured (NA
), but can be deduced to be of a certain value, e.g. zero. Suppose a health questionair contains questions if a person smokes and if so how many cigarretes per day.
Typically if the answer to the first question is "no", the second question is not asked an thus "unmeasured", but can be deduced to be 0.
d <- data.frame(smokes=c(TRUE, FALSE), cigarettes=c(10,NA)) knitr::kable(d)
if (smokes == FALSE) cigarretes = 0
m <- modifier(if (smokes == FALSE) { cigarretes = 0 }) dc <- modify(d, m) knitr::kable(dc)
if
rulem <- modifier( if (age < 12) income = 0)
The following statements are equivalent. It is wise to choose a syntax that is familiar to the persons specifying the correction rules.
```{eval=FALSE} if (age < 12) income = 0 # R noobs if (age < 12) income <- 0 # a bit more R-y income[age < 12] <- 0 # very R-y
### `else` Each `if` rule may be followed with an `else` or `else if` ```r m <- modifier(if (age > 67) {retired = TRUE} else {retired = FALSE})
The following statements are equivalent. It is wise to choose a syntax that is familiar to the persons specifying the correction rules.
```{eval=FALSE} if (age > 67) {retired = TRUE} else {retired = FALSE} # R noobs if (age > 67) {retired <- TRUE} else {retired <- FALSE} # R-y retired <- if (age > 67) TRUE else FALSE # very R-y retired <- age > 67 # very R-y
### multiple assignments ```r m <- modifier( if (age > 67) { retired = TRUE salary = 0 } )
else if
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.