Rudimentary checks to flag potentially problematic values in a dataset
Functionality
inspect_all()
: applies all checks listed below
find_duplicates()
looks for duplicates in columns that should be unique
find_duplicates_uuid()
looks for a function containing "uuid" in the name and looks for duplicates
find_outliers()
looks for outliers in numerical columns
find_other_responses()
looks for values in potential "if other, specify.." type of columns
Output The output has the same structure for all functions of this package: A data frame with the following columns: value variable has_issue issue_type
index
: the index of the row of the original data in which the issue occured. (NA if applies to multiple rows)
value
the suspicious value
variable
The column containing the suspicious value
has_issue
logical TRUE/FALSE: currently always true, can be ignored.
issue_type
a description/name of the potential issue
Limitations
These checks are under no circumstances sufficient or complete: more context and data specific checks are always necessary.
Any data protection related checks are rudimentary at best and are under no circumstances sufficient to ensure data protection in any way
Although tempting, this package should never be used to automatically remove any flagged values without double checking them manually. This would seriously skew your dataset, and make uncertainty estimates invalid. Potential issues must be investigated on a case by case basis.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.