Output Structure

If you run the write_reports or write_reports_local function with nexamples set to anything larger than 0, there are three types of .xlsx files that will be written:

I will discuss each of these three documents in turn.

State Summary

This document has four different tabs: required nulls, optional nulls, invalids, and visit-arrival lag. Required and optional refers to if these fields are required to be reported or optional.

The exact specifics of what is needed will likely vary by jurisdiction; for this reason, I made sure to annotate the code to make it clear what I'm doing at each point of the functions.

Required and Optional Nulls

We used the PHIN messaging guide from the CDC to determine what was required or optional, but also relied on our own needs at Kansas Department of Health and Environment.

Here is an example of what the nulls tabs look like:



Columns A and B have the facility's name and ID number, respectively (I changed them to protect facility privacy for this document). Column C points out what type of measure is being used: the raw count of nulls? Or the percentage that are null? Columns D and on are the fields that are being checked. For instance, Column F shows us that Facility E (ID 5) did not report the age in five visits, which accounts for only about a tenth of one percent of the total visits. Rows 1 and 2 show the state counts and percents, respectively. So cell E2 shows us that—across the state—about half of the visits never included admit reason information. This is the setup for both the required and optional null tabs. The first three rows and first three columns are frozen to make it easier to navigate the page.

Invalids

Invalids were determined using the PHIN VADS value sets, but also sometimes relied on our own judgment. For instance, we considered certain fields like age and temperature to be invalid if age and temperature units are missing. The precise details of these can be found by looking at the documentation for the functions ending with _invalid.

Here is an example of what the invalids tab looks like:



The layout is the same as the null tabs, except now it it displays when fields are invalid. You may see that there are some gaps: This is to be expected. I exclude all nulls (that is, nulls are not considered invalid, since there is already a function that measures nulls), so if all examples of this field was null, then there is nothing to count! This means a denominator of zero for the percentages; a blank is just put here.

Visit-Arrival Lag

For each facility, this tab displays the average time between (a) when the patient first visited the facility and (b) when the first record for this visit arrived at the BioSense Platform. here is an example of what this tab looks like:



This shows us that, for example, Facility I (ID 9) has an average length of 18.54 hours of lag between when the patient visited and when the record arrived (cell C10). Note that the metric is always in hours.

Facility Summary

The function will return one .xlsx file for each facility that is in the data you pulled. This will have four tabs: facility information, required nulls, optional nulls, and invalids.

Facility Information

Here is an example of what the tab looks like, with information about the facility masked with "XXXXXXX":



The rows each list different summary information about the facility. Column A will list the HL7 segments that this information is taken from (the guide for these HL7 segments are found at the CDC's NSSP data dictionary), while Column B lists the field name, and Column C lists the value. Note that the dates for messages represented in this file are at the bottom; the range for both when patients visited and when the records arrived to the BioSense Platform are listed.

Nulls and Invalids

Here is an example of what each of the nulls and invalid tabs will look like (I used invalids as an example here):



The HL7 segment that these fields are taken from is in Column A, Field checked is in Column B, that specific facility's counts and percents are in Columns C and D, and the statewide percent is in Column E. The invalids will include specific cases, too, where the "field" name isn't exactly the field name that comes up in the BioSense Platform. For instance, "FPID_MRN_Mismatch" is the number of times that the first patient ID did not match the medical record number. Similarly, "Missing_Death_Given_Discharge_Disposition" is the number of times that the discharge disposition said a patient died, but no death indicator was given.

Facility Examples

If you set nexamples as greater than zero, you will also get an .xlsx file for every single facility. This file has two tabs: invalids and nulls. I did not post a screenshot here, as there is generally a lot of visit-specific information in the columns it returns.

For the invalids tab, Column A contains the patient-visit ID number (C_BioSense_ID), Column B contains the field that is invalid, and Column C contains the entry that is invalid. If you want to know why it is invalid, the documentation for all of the *_invalid functions talk about what specifically is valid or invalid. Again, we use the PHIN VADS value sets and what we wanted for KDHE as determinants of what is invalid. This is likely to change by jurisdiction; I sacrificed some speed in the code for readability, so it should be doable to edit the code as need be for your department's specific needs.

The rest of the columns (E through Q) gives you information needed to find the record and visit in question, in case the facility needs to track down what happened where in the data pipeline.

The nulls tab is almost the same. Columns A and B are the visit ID and null field, respectively; the rest of the columns (C to P) are the same columns that give you information needed to find the record and visit in question.

Any questions about interpretability can be addressed to markhwhiteii@gmail.com. You can also report bugs and issues on GitHub.



SophiaLC/IHI_DQ documentation built on Oct. 23, 2020, 2:35 a.m.