vignettes/abstracts/erum-2020.md

Storing all data related to a problem in a single table or data frame ("the dataset") can result in many repetitive values. Separation into multiple tables helps data quality but requires "merge" or "join" operations. {dm} is a new package that fills a gap in the R ecosystem: it makes working with multiple tables just as easy as working with a single table.

A "data model" consists of tables (both the definition and the data), and primary and foreign keys. The {dm} package combines these concepts with data manipulation powered by the tidyverse: entire data models are handled in a single entity, a "dm" object.

Three principal use cases for {dm} can be identified:

  1. When you consume a data model, {dm} helps access and manipulate a dataset consisting of multiple tables (database or local data frames) through a consistent interface.

  2. When you use a third-party dataset, {dm} helps normalizing the data to remove redundancies as part of the cleaning process.

  3. To create a relational data model, you can prepare the data using R and familiar tools and seamlessly export to a database.

The presentation revolves around these use cases and shows a few applications. The {dm} package is available on GitHub and will be submitted to CRAN in early February.



krlmlr/dm documentation built on April 19, 2024, 5:23 p.m.