Using the datimvalidation package

This package is provided for use by the PEPFAR Data Exchange community to support more stringent validation of data payloads intended for import into DATIM. It allows you to validate a data payload against the business logic of DATIM prior to attempting to import it into the system. This approach is useful, as it allows you to interactively correct mistakes in the payload before submitting the data for import.

These scripts, to a large extent, emulate the logic of the server. Basic functions are exposed to allow you to determine whether your data contains invalid data elements, incorrect disaggregations, or inactive mechanisms. Each function is documented in a separate section of the package documentation.

DATIM is based on the DHIS2 software and is therefore capable of importing different types of data, including the CSV, JSON, XML, and ADX formats. Users wishing to use the data import capabilities of DATIM should therefore familiarize themselves with the various formats which DHIS2 supports and the syntax of each. The DHIS2 development team has provided extensive technical documentation on the various formats here.

In addition to the general DHIS2 documentation, a series of DATIM-specific resources is available here. These guides provide more detailed information on the process of importing data into DATIM, including code lists for indicators and disaggregations.

Getting started

In order to get started, you will need to have installed the R packages "datimvalidation" and "datimutils".

You will need an active internet connection and an active DATIM user account in order to utilize the datimvalidation package. Metadata will be retrieved from the DATIM server, using your user name and password, and stored in a local cache. Once the objects are cached, the package can be used offline until the cache is invalidated. By default, cached objects are stored for a week and then invalidated.

The "datimvalidation" package relies on "datimutils" to handle authentication with DATIM. There are several different ways which you can authenticate. Please refer to the "datimutils" package for details on setting up a means to connect your local R environment to the DATIM server.

options(scipen = 999) # Turn off any scientific notation
require(datimutils)
require(datimvalidation)
loginToDATIM(config_path = "/home/littebobbytables/.secrets/datim.json")
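The configuration file referenced above is a simple JSON document containing the server URL and your credentials. A minimal sketch of its typical structure is shown below; the base URL, user name, and password are illustrative only, and you should consult the "datimutils" documentation for the authoritative format.

{
  "dhis": {
    "baseurl": "https://www.datim.org/",
    "username": "bobby",
    "password": "XXXXXXXXXX"
  }
}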

You should not store your credentials in a plain-text file if your machine is not encrypted. A better solution is to utilize the keyring functionality of datimutils. Consult the documentation of that package for how to use it.

Parsing your data

Next, we will use the d2Parser function, which is a general purpose function to load the different formats of data which DATIM accepts. This function should work with CSV, JSON, and XML files, which have been formatted according to the technical specifications of DHIS2.

You will also need to know the identification scheme of the file. The easiest way is to open the file in a text editor, and see what id scheme the payload is using. This could be 'code','id' or 'shortName'. Consult the specific function documentation for more information on id schemes.
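As an illustration, the first rows of a CSV payload in the DHIS2 data value set format might look like the following, where the data element is referenced by its code and the organisation unit and category option combos by their eleven-character UIDs. All identifiers below are made up for illustration; note that codes containing commas must be quoted.

dataelement,period,orgunit,categoryoptioncombo,attributeoptioncombo,value
"HTS_TST (N, DSD, Age/Sex/Result)",2023Q1,ab12Cd34eF5,gH67iJ89kL0,mN12oP34qR5,100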

d <- d2Parser(filename = filename,
              type = "csv",
              datastream = "MER",
              dataElementIdScheme = "code",
              orgUnitIdScheme = "id",
              idScheme = "id")

Note, this operation could take several minutes, so please be patient. If the parsing of your data was successful, you should not receive any errors.

Working with the validation object

The d2Parser function produces an object composed of various lists. We will not describe the full structure of this object here; in the examples which follow, we assume that it was named d, as in the previous example.
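To get a feel for its contents, you can inspect the object directly in R. The exact components vary between versions of the package, so the snippet below is simply a way to explore the object, not a specification of it.

# Explore the top-level structure of the parsed object
names(d)
str(d, max.level = 1)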

Mechanism validation

Now that we have loaded our data payload into the object d, we can proceed to validate it. The checkMechanismValidity function will check that all mechanisms present in the payload are currently active in DATIM. Data values must be associated with a period whose start date and end date fall within the COP planning periods for which the mechanism is active.

#Check for valid mechanisms and simply print those which are not valid
checkMechanismValidity(d)

Data element and disaggregation validation

Depending on the type of payload you are submitting, there may be multiple data sets which are bundled together. There are strict controls in DATIM regarding the submission of data elements to given organisation units. For instance, community data should not be submitted for facility sites, and vice versa. It can be difficult to determine for a given data payload whether certain data should be submitted for a given organisation unit; the checkDataElementOrgunitValidity function, used in the complete example below, can assist with this check.

Next, we can check the indicators and disaggregations for validity against the current dataset form definitions. A data frame of any invalid indicator/disagg combinations will be returned. The 'datasets' parameter should be either the exact name of the dataset, or a grouping name like "MER Results".

#Check data element/disagg combinations against the dataset form definitions
checkDataElementDisaggValidity(d)
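Since a data frame is returned, you can also capture the result and save any problematic combinations for review; a minimal sketch (the output file name is arbitrary):

# Save any invalid data element/disagg combinations for review
disagg_issues <- checkDataElementDisaggValidity(d)
if (!is.null(disagg_issues) && NROW(disagg_issues) > 0) {
  write.csv(disagg_issues, "disagg_issues.csv", row.names = FALSE)
}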

Data element/period validation

Certain MER data elements are collected only in certain quarters. Quarterly data elements are collected each quarter, SAPR data elements are collected in quarters 2 and 4, and APR data elements are collected only in quarter 4. In order to ensure that no data values are submitted for the wrong periods, you can check the "cadence" of the data elements to be sure that only the correct data elements are submitted for a particular time period.

#Check for valid data element-period combinations and print those which are not valid
checkDataElementCadence(d)

Value type validation

DATIM has strict controls on the type of data which can be imported into each data element. For instance, it is not possible to import a data value specified as "11/02/2015" as a date, because DATIM expects dates to be specified in "YYYY-MM-DD" format. In this case, the data would need to be transmitted as "2015-11-02". Using the checkValueTypeCompliance function, it is possible to validate a given data payload against the value types which have been specified for each data element. As an example, we can simply invoke checkValueTypeCompliance with our parsed data (d). Only data which does not meet the specified value type will be returned.
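If your source data uses a different date format, base R can reformat it prior to export. For example, to convert the US-style date above (month/day/year) into the ISO format DATIM expects:

# Convert a month/day/year date to the required YYYY-MM-DD format
format(as.Date("11/02/2015", format = "%m/%d/%Y"), "%Y-%m-%d")
# [1] "2015-11-02"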

#Check value type compliance and print any values which do not comply
checkValueTypeCompliance(d)

Validation rule validation

We can perform a check of the validation rules as defined in DATIM on the data payload.

The validateData function is capable of performing the validation operation in parallel. To activate this, you must inform R of the number of parallel sessions which may be spawned. Usually, the number of cores on your system is a good starting point. You can modify the registerDoMC command shown below based on the number of cores you have access to. Consult the plyr documentation for specifics on parallel operations.
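A portable way to determine how many cores are available is the detectCores function from the parallel package, which ships with base R:

# Determine the number of available cores as a starting point
n_cores <- parallel::detectCores()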

For users of the datimvalidation package on Linux, parallel execution may decrease the amount of time it takes to perform the validation, which can be a very lengthy process depending on the amount of data and the speed of your machine.

Use the following command to inform R that you would like to spawn four parallel processes. This operation is only supported on the Linux operating system.

doMC::registerDoMC(cores = 4) # or however many cores you have access to
parallel <- TRUE

In order to run the validation rule check, simply invoke the following command.

#Run the validation rules and assign any violations to the vr_violations object
vr_violations <- validateData(data = d,
                              return_violations_only = TRUE,
                              parallel = parallel)

A list of validation rule violations will be returned and stored in the vr_violations object.
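You can then inspect or export the violations in the usual way; a short sketch, assuming the returned object behaves like a data frame:

# Review the first few violations, if any were found
if (NROW(vr_violations) > 0) {
  head(vr_violations)
}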

A complete example

A basic script which combines all of the validation checks is supplied below.

You should, of course, adjust the specifics of the script, such as where your credentials are stored. If your security policies do not allow you to store credentials on your machine, consult the "datimutils" documentation for interactive or keyring-based authentication. You can change the datafile variable to point to your import file, and then run the script in your R environment.

If any of the functions return errors, warnings, or non-empty data frames, you should investigate and resolve the issues prior to submitting your data file for import into DATIM.

require(datimutils)
require(datimvalidation)
loginToDATIM(config_path = "/home/littebobbytables/.secrets/datim.json")
datafile <- "/home/littebobbytables/mydata.csv"
d <- d2Parser(filename = datafile, type = "csv")
checkDataElementOrgunitValidity(data = d,
                                organisationUnit = "f5RoebaDLMx",
                                datasets = c("zUoy5hk8r0q", "KWRj80vEfHU"))
checkValueTypeCompliance(d)
checkNegativeValues(d)
validateData(data = d,
             datasets = c("zUoy5hk8r0q", "KWRj80vEfHU"),
             return_violations_only = TRUE,
             parallel = FALSE)

