dcmodifydb

The goal of dcmodifydb is to apply modification rules specified with dcmodify to a database table, allowing for documented, reproducible data cleaning adjustments in a database.

dcmodify separates intent from execution: a user specifies the what, why and how of an automatic data change, and uses dcmodifydb to execute it on a tbl database table.
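
As a rough illustration of that separation, the sketch below defines a single rule with dcmodify's modifier() and runs it on an in-memory data frame. The rule, the data, and the variable names are made up for this illustration. With dcmodifydb loaded, the same modify() call is applied to a database table instead, as in the Example section.

```r
library(dcmodify)

# Intent: a rule describing what should change (hypothetical rule).
m <- modifier(if (age > 130) age <- NA)

# Execution: apply the rule to a small in-memory data frame.
persons <- data.frame(age = c(11, 150, 25))
cleaned <- modify(persons, m)
cleaned$age
```

The rule object m carries no reference to the data, so it can be stored, documented, and reused on other tables that have an age column.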

Installation

The development version from GitHub can be installed with:

# install.packages("devtools")
devtools::install_github("data-cleaning/dcmodifydb")

Example


Documented rules

library(DBI)
library(dcmodify)
library(dcmodifydb)
con <- dbConnect(RSQLite::SQLite())

You can store the modification rules in a YAML file, here "example.yml".
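
The contents of example.yml are not reproduced here. As a sketch of what such a file might look like, the code below writes a small rules file in the rules/expr/name/label/description layout that dcmodify reads. The three rules themselves are illustrative assumptions, not necessarily the ones shipped with the package.

```r
# Hypothetical rule set, written to "example.yml" so that it can be
# loaded with modifier(.file = "example.yml").
rules_yaml <- "
rules:
- expr: if (age > 130) age <- 130L
  name: M1
  label: Maximum age
  description: Human age is limited, so cap it at 130.
- expr: is.na(age) <- age < 0
  name: M2
  label: Unknown age
  description: A negative age is set to missing.
- expr: income[age < 12] <- 0
  name: M3
  label: No child labor
  description: Children should not have an income.
"
writeLines(rules_yaml, "example.yml")
```

Each rule keeps its name, label and description next to the expression, which is what makes the cleaning steps documented rather than buried in a script.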

Let's load the rules and apply them to a data set:

```r
m <- modifier(.file = "example.yml")
print(m)

# set up the data
csv <- "age, income
  11,   2000
 150,    300
  25,   2000
 -10,   2000
"
income <- read.csv(text = csv, strip.white = TRUE)
dbWriteTable(con, "income", income)
tbl_income <- dplyr::tbl(con, "income")

# this is the table in the database
tbl_income

# and now after modification
modify(tbl_income, m, copy = FALSE)
```

The generated SQL can be written to a file with dump_sql:

dump_sql(m, tbl_income, file = "modify.sql")

modify.sql:

```r
dump_sql(m, tbl_income)
```

```r
dbDisconnect(con)
```

Note: Modification rules can be written to YAML with as_yaml and export_yaml.

dcmodify::export_yaml(m, "cleaning_steps.yml")
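
A minimal sketch of exporting rules and reading them back, assuming a rule set defined inline (the rule and the temporary file name are made up for illustration):

```r
library(dcmodify)

# Hypothetical rule, defined inline rather than read from a file.
m <- modifier(if (age > 130) age <- NA)

# Export the rules to a YAML file and load them again from that file.
yaml_file <- tempfile(fileext = ".yml")
export_yaml(m, yaml_file)
m2 <- modifier(.file = yaml_file)
print(m2)
```

Keeping the exported file under version control gives a reviewable record of the cleaning steps applied to the data.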


data-cleaning/dcmodifydb documentation built on June 23, 2022, 10:34 p.m.