Derive Data rules


Derive Data Rules

An important task and skill in data cleaning is to get to "know" your data. This process takes considerable time and often the resulting knowledge is fragmented. Fragmented in the individual observations that are different from implicit expectations, leading to anecdotical knowledge, but also fragmented across members of a data science team.

An alternative approach is to develop an explicit set of data validation rules. Rules that state when data is valid for your analysis. This approach is part of the validate package suite.

Many of the validate related packages do not consider the process of creating the rule sets, they simply assume the rules are available and can be used to check, correct or impute your data.

deriverules is about creating data validation rule set for new data sets: it aims to help to bootstrap the rule finding process.

Working ideas

data-cleaning/deriverules documentation built on May 17, 2019, 7:03 p.m.