scores (R Documentation)
A set of functions for assessing various aspects of data quality, including a comprehensive dataset score as well as individual scores for specific data quality dimensions such as date consistency, duplicates, recency, frequency, time, coding, comments, sources, missing values, and variables.
According to the literature, data quality can be assessed by checking for consistency, completeness, accuracy, timeliness, and uniqueness of the data. Consistency means that the data is logically coherent, completeness means that all required data is present, accuracy means that the data is correct and reliable, timeliness means that the data is up-to-date, and uniqueness means that there are no duplicate records.
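To illustrate two of these dimensions, the following base-R sketch measures completeness as the share of non-missing cells and uniqueness as the share of rows that are not exact duplicates of an earlier row. The helpers `completeness_share()` and `uniqueness_share()` and the `toy` data frame are hypothetical and are not part of this package. The package's own scoring functions may weight or define these dimensions differently.

```r
# Hypothetical helpers (not this package's API): simple proportions
# illustrating the completeness and uniqueness dimensions.

# Completeness: share of cells in the data frame that are not NA
completeness_share <- function(df) {
  mean(!is.na(df))
}

# Uniqueness: share of rows that are not exact duplicates of an earlier row
uniqueness_share <- function(df) {
  1 - mean(duplicated(df))
}

# Hypothetical toy data: 4 rows x 2 columns, with two missing dates
# and one fully duplicated row
toy <- data.frame(
  ID    = c("a", "b", "b", "c"),
  Begin = c("2001-01-01", NA, NA, "2003-05-01"),
  stringsAsFactors = FALSE
)

completeness_share(toy)  # 6 of 8 cells are non-missing: 0.75
uniqueness_share(toy)    # row 3 duplicates row 2: 0.75
```

Both measures are proportions in [0, 1], where higher values indicate better quality on that dimension.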
```r
score_dataset(df)
score_obs_no(df)
score_var_no(df)
score_completeness(df)
score_date_consistency(df)
score_date_scope(df)
score_obs_info(df, id_col = "ID")
score_coding(df)
score_comments(df)
score_var_info(df)
```
| Argument | Description |
|---|---|
| df | A data frame to be scored. |
| id_col | The name of the column containing IDs. Default is "ID". |
These functions help assess the quality of the data in a data frame. Each function checks one specific aspect of the data and returns either a score or a message describing the quality of that aspect. The functions include:
- score_date_consistency: Proportion of date pairs that are invalid, that is, where the End date falls on or before the Begin date (End <= Begin).
- score_duplicates: Proportion of duplicate IDs.
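The two metrics listed above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's implementation: the helpers `share_invalid_dates()` and `share_duplicate_ids()` are hypothetical, they assume `Begin` and `End` columns that `as.Date()` can parse, and they assume an ID column named by `id_col` (defaulting to "ID", mirroring the argument documented above). How the package handles missing dates, or whether it counts every repeated ID or only the extra copies, is not stated here; this sketch ignores incomplete date pairs and counts only the extra copies.

```r
# Hypothetical helpers illustrating the two metrics; not the package's code.

# Share of date pairs where End is on or before Begin (End <= Begin).
# Rows with a missing Begin or End are ignored in this sketch.
share_invalid_dates <- function(df) {
  begin <- as.Date(df$Begin)
  end   <- as.Date(df$End)
  ok    <- !is.na(begin) & !is.na(end)
  if (!any(ok)) return(NA_real_)
  mean(end[ok] <= begin[ok])
}

# Share of rows whose ID repeats an ID seen earlier in the column.
share_duplicate_ids <- function(df, id_col = "ID") {
  mean(duplicated(df[[id_col]]))
}

# Hypothetical toy data with assumed Begin/End column names
toy <- data.frame(
  ID    = c("a", "b", "b", "c"),
  Begin = c("2001-01-01", "2002-03-01", "2002-03-01", "2004-01-01"),
  End   = c("2001-12-31", "2002-03-01", "2002-03-01", "2003-01-01"),
  stringsAsFactors = FALSE
)

share_invalid_dates(toy)  # rows 2 and 3 (End == Begin) and row 4 (End < Begin): 0.75
share_duplicate_ids(toy)  # the second "b" repeats an earlier ID: 0.25
```

Note that, following the definition above, a pair with End equal to Begin also counts as invalid.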
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5-34.
```r
score_dataset(emperors)
score_obs_no(emperors)
score_var_no(emperors)
score_completeness(emperors)
score_date_consistency(emperors)
score_date_scope(emperors)
score_obs_info(emperors)
score_var_info(emperors)
```