vignettes/abstracts/user-2019.md

Complex datasets are often organized as several tables, interlinked through common values in columns. Working with these datasets requires an understanding of the relationships between tables. Occasionally, reorganization such as normalization, decomposition, or aggregation may be necessary, in particular if the data comes as a single denormalized table. Defining a rigorous data model helps, but we found surprisingly few tools in the R ecosystem that deal with this problem. One example is the datamodelr package on GitHub, which focuses on describing data models and drawing diagrams.

The dm package offers tools to prepare, specify, and validate a data model (e.g., decomposing tables, defining and checking keys and cardinality). It focuses on both data and metadata and works with any dplyr-compatible data source; in particular, in-memory data frames and SQL databases are supported. With a validated data model, join operations are well-defined and can be generated automatically. An interface to datamodelr for drawing the data model diagram is provided. A Shiny module for browsing and filtering the tables of the data model or a submodel, respecting the relationships between the tables, makes it easy to build custom apps for interactive work with data spread across multiple tables.
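To make the ideas of decomposition, key checking, and automatic joins concrete, here is a minimal, language-neutral sketch in plain Python. It is not the dm API and does not reflect how dm is implemented; the table contents and the helper names (`decompose`, `is_primary_key`, `missing_foreign_keys`, `join`) are hypothetical and chosen only for illustration.

```python
# Hypothetical denormalized table: the carrier name is repeated on every flight.
flights_wide = [
    {"flight": 1, "carrier": "UA", "carrier_name": "United Air Lines"},
    {"flight": 2, "carrier": "AA", "carrier_name": "American Airlines"},
    {"flight": 3, "carrier": "UA", "carrier_name": "United Air Lines"},
]


def decompose(rows, key, attrs):
    """Split rows into a parent table (one row per key value, carrying
    `attrs`) and a child table (the original rows without `attrs`)."""
    parent = {}
    for row in rows:
        parent.setdefault(row[key], {key: row[key], **{a: row[a] for a in attrs}})
    child = [{k: v for k, v in row.items() if k not in attrs} for row in rows]
    return list(parent.values()), child


def is_primary_key(rows, column):
    """True if every value in `column` is present and unique."""
    values = [row.get(column) for row in rows]
    return None not in values and len(set(values)) == len(values)


def missing_foreign_keys(child, fk, parent, pk):
    """Return child key values that have no match in the parent table."""
    parent_keys = {row[pk] for row in parent}
    return sorted({row[fk] for row in child} - parent_keys)


def join(child, fk, parent, pk):
    """Join child to parent; unambiguous once the key checks have passed."""
    lookup = {row[pk]: row for row in parent}
    return [{**row, **lookup[row[fk]]} for row in child]


airlines, flights = decompose(flights_wide, "carrier", ["carrier_name"])

# Validate the model: `carrier` identifies airlines, and every flight refers
# to an existing airline.
assert is_primary_key(airlines, "carrier")
assert missing_foreign_keys(flights, "carrier", airlines, "carrier") == []

# With the keys validated, the join reconstructs the original wide table.
assert join(flights, "carrier", airlines, "carrier") == flights_wide
```

The point of the sketch is the order of operations: once the primary key and the foreign key reference have been checked, the join needs no further input from the user, which is what allows it to be generated from the data model.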



krlmlr/dm documentation built on April 19, 2024, 5:23 p.m.