toyData: Small example data to show the features of dataMaid


Description

An artificial dataset, intended for presenting the key features of dataMaid, which is a toolset for identifying potential errors in a dataset.

Usage


Format

A data.frame with 15 rows and 6 variables.

pill

A factor variable with two levels ("red" and "blue") and a few (correctly coded) missing observations. This represents the colour of a pill.

events

A numeric variable with one obvious outlier value (82), two miscoded missing values (999 and NaN) and a few correctly coded missing values. Represents the number of previous events.

region

A factor variable where two of the levels ("other" and "OTHER") are the same word with different case settings. Moreover, the variable includes a Stata-style miscoded missing value ("."). Used to represent geographical regions or treatment centers.

change

A numeric variable (random draws from a standard normal distribution). Represents a change in a measured variable.

id

A factor variable with unique codes for each observation (a character string with a number between 1 and 15), i.e. a key variable.

spotifysong

A factor variable that has the same level ("Irrelevant") for all observations, i.e. an empty variable. Represents the latest song played on Spotify.
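
The kinds of problems described above can be illustrated with a few base R checks. The following is only a minimal sketch: the data frame `demo` is a hypothetical stand-in whose values are made up for illustration (it is not the actual content of toyData), and the checks are plain base R, not the methods dataMaid itself uses.

```r
# Hypothetical stand-in data (NOT the real toyData values)
demo <- data.frame(
  events      = c(1, 2, 999, NaN, 82, NA, 3, 4),
  region      = factor(c("a", "b", ".", "other", "OTHER", "a", "b", "a")),
  id          = factor(as.character(1:8)),
  spotifysong = factor(rep("Irrelevant", 8))
)

# Miscoded missing values in a numeric variable: NaN and the sentinel 999
suspect_missing <- demo$events[is.nan(demo$events) | demo$events %in% 999]

# Factor levels that become identical once case is ignored
lv <- levels(demo$region)
low <- tolower(lv)
case_clash <- lv[duplicated(low) | duplicated(low, fromLast = TRUE)]

# Stata-style missing code used as a factor level
stata_missing <- lv[lv == "."]

# Key variable: no missing values and no repeated values
is_key <- !anyNA(demo$id) && anyDuplicated(demo$id) == 0

# Empty variable: the same value for every observation
is_constant <- length(unique(demo$spotifysong)) == 1
```

Each object holds the result of one check, so printing `case_clash` or `suspect_missing` shows which values would deserve a closer look.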

Source

Artificial data

References

Petersen AH, Ekstrøm CT (2019). “dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R.” _Journal of Statistical Software_, *90*(6), 1-38. doi: 10.18637/jss.v090.i06.

Examples

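
One possible way to load and inspect the data is sketched below. It assumes the dataMaid package is installed and is not necessarily the example shipped with the package. The guard with `requireNamespace()` is added here only so that the snippet does nothing when the package is missing.

```r
# Run only when dataMaid is available on this machine
if (requireNamespace("dataMaid", quietly = TRUE)) {
  library(dataMaid)

  # Load the example dataset and look at its structure
  data(toyData)
  str(toyData)

  # Screen one variable with the package's default checks
  print(check(toyData$events))

  # A full report for the whole dataset writes files to the working
  # directory, so it is left commented out here
  # makeDataReport(toyData)
}
```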

dataMaid documentation built on Oct. 8, 2021, 9:08 a.m.