
dqLib

dqLib is an R package for data quality assessment and reporting. dqLib provides methods for calculating data quality metrics and generating reports on detected data quality issues, especially in CORD-MI.

Acknowledgement: This work was done within the “Collaboration on Rare Diseases” of the Medical Informatics Initiative (CORD-MI), funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01ZZ1911R (FKZ-01ZZ1911R).

Data Quality Metrics and Reports

| Dimension | Data Quality Indicator Name |
| ------------- | ------------- |
| completeness | item completeness rate, value completeness rate, orphaCoding completeness rate |
| plausibility | orphaCoding plausibility rate, range plausibility rate |
| uniqueness | RD case unambiguity rate, RD case dissimilarity rate |
| concordance | concordance of RD cases, concordance of tracer cases |

| No. | Data Quality Parameter Name | Description |
|-----|--------------------------- | ------------|
| P1 | missing data items | number of missing data items per year |
| P2 | mandatory data items | number of mandatory items per year |
| P3 | missing data values | number of missing data values per year |
| P4 | available data values | number of available data values per year |
| P5 | missing orphacodes | number of missing orphacodes per year |
| P6 | tracer diagnoses | number of tracer RD diagnoses per year |
| P7 | implausible links | number of implausible code links per year |
| P8 | checked for outliers | number of data values checked for outliers per year |
| P9 | outliers | number of detected outliers per year |
| P10 | ambiguous RD cases | number of ambiguous RD cases per year |
| P11 | RD cases | number of RD cases per year |
| P13 | duplicated RD cases | number of duplicated RD cases per year |
| P14 | tracer cases | number of tracer RD cases per year |
| P15 | inpatient cases | number of inpatient cases per year |
| P16 | RD cases rel. frequency | relative frequency of inpatient RD cases per year |
| P17 | tracer cases rel. frequency | relative frequency of inpatient tracer RD cases per year |
| P18 | available cases | number of available cases per year |
| P19 | available patients | number of available patients per year |
| P20 | orphacodes | number of available orphacodes per year |
| P21 | orpha-coded cases | number of available orpha-coded cases per year |
| P22 | unambiguous RD cases | number of unambiguous RD cases per year |
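To illustrate how parameters such as P1 and P2 could feed an indicator, the following base R sketch derives a completeness-style rate from yearly counts. The helper name `rate_from_counts`, the formula (percentage of mandatory items that are not missing), and the counts themselves are assumptions made for illustration; they are not taken from the dqLib API or from real data.

```r
# Hypothetical helper, NOT part of dqLib: percentage of items that are present.
rate_from_counts <- function(missing, total) {
  stopifnot(
    length(missing) == length(total),
    all(total > 0),
    all(missing >= 0),
    all(missing <= total)
  )
  round(100 * (total - missing) / total, 2)
}

# Made-up yearly counts standing in for P1 (missing data items)
# and P2 (mandatory data items).
params <- data.frame(
  year = c(2020, 2021),
  missing_items = c(5, 0),
  mandatory_items = c(50, 40)
)

# Assumed definition: share of mandatory items that are available, in percent.
params$item_completeness_rate <- rate_from_counts(
  params$missing_items,
  params$mandatory_items
)
print(params)
```

The same helper could be applied to other count pairs in the table (for example P3 and P4), provided the denominator is chosen to match the indicator's actual definition in the package.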

Installation

You can install dqLib from a local folder with:

devtools::install_local("./dqLib")

You can also install it directly from GitHub with:

devtools::install_github("https://github.com/medizininformatik-initiative/dqLib")
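Both commands require the devtools package. A minimal sketch of a wrapper that installs devtools when it is missing and then picks one of the two routes is shown below. The function `install_dqlib` and its arguments are hypothetical and not part of dqLib; the repository slug is assumed to be equivalent to the URL given above.

```r
# Hypothetical convenience wrapper, NOT part of dqLib.
install_dqlib <- function(from = c("github", "local"), path = "./dqLib") {
  from <- match.arg(from)

  # Install devtools first if it is not available.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }

  if (from == "local") {
    # Install from a local checkout of the package sources.
    devtools::install_local(path)
  } else {
    # Install from the GitHub repository named in this README.
    devtools::install_github("medizininformatik-initiative/dqLib")
  }
}

# Usage (not run here, because it needs network access or a local checkout):
# install_dqlib("github")
# library(dqLib)
```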

Example

Here are examples for data quality analysis and reporting using this package:

- cordDQCheck.R for generating data quality reports in CORD-MI.
- Here you can see the resulting files.

Note

The default data quality dimensions are completeness, plausibility, uniqueness and concordance. However, this framework allows the user to select the desired quality dimensions and indicators, as well as to generate user-defined DQ reports.
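The idea of selecting dimensions can be sketched in base R as a lookup against the indicator table in this README. The data frame `indicators` and the helper `select_indicators` are hypothetical and only illustrate the concept; they do not show how dqLib itself exposes this selection.

```r
# Indicator catalogue copied from the dimension table in this README.
indicators <- data.frame(
  dimension = c(
    rep("completeness", 3), rep("plausibility", 2),
    rep("uniqueness", 2), rep("concordance", 2)
  ),
  indicator = c(
    "item completeness rate", "value completeness rate",
    "orphaCoding completeness rate",
    "orphaCoding plausibility rate", "range plausibility rate",
    "RD case unambiguity rate", "RD case dissimilarity rate",
    "concordance of RD cases", "concordance of tracer cases"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of dqLib: return the indicator names that
# belong to the requested dimensions, failing on unknown dimension names.
select_indicators <- function(catalogue, dimensions) {
  unknown <- setdiff(dimensions, unique(catalogue$dimension))
  if (length(unknown) > 0) {
    stop("Unknown dimension(s): ", paste(unknown, collapse = ", "))
  }
  catalogue[catalogue$dimension %in% dimensions, "indicator"]
}

selected <- select_indicators(indicators, c("completeness", "uniqueness"))
print(selected)
```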

To cite dqLib, please use the following BibTeX entry:

@software{Tahar_dqLib,
author = {Tahar, Kais},
title = {{dqLib}},
url = {https://github.com/KaisTahar/dqLib},
year = {2021}
}

See also: CORD-MI



medizininformatik-initiative/dqLib documentation built on Oct. 29, 2022, 11:18 p.m.