Introduction/Background

The main goal of hospEpi is to provide useful functions for epidemiological analyses in a hospital setting. Although disease is often thought of as something brought into a hospital from outside, some infections (hospital-acquired infections) are contracted during a patient's stay, so hospitals need teams that are ready to analyze and stop the spread before more patients are infected.

The motivation behind this package came during my time as an intern for the Indiana University Health Infection Prevention team, which is made up of healthcare workers who act as hospital epidemiologists (among many other things). My supervisor, Josh Sadowski, is one of the data analysts on the team, so many of these functions can aid him in the future (and hopefully many others, too!). For example, the package makes it easy to study patient location history/network data and disease-exposure/disease-risk factor data.

There are already several other epidemiology R packages, such as epiR and epitools. While those are very useful for epidemiologists, this package works a little differently, as it focuses on tools for analyses in a hospital setting. The functions could certainly be adapted for use elsewhere, but they are likely to be most useful in a hospital. The package also addresses a few challenges that make it stand out as its own project. Those are:

So, while there are other packages for epidemiological analyses, this package adds functionality that might be helpful. Although it was initially created for working with hospital data, it could easily be used for other epidemiological analyses. It is not an all-encompassing package, so other epidemiology packages are worth checking out, but it does provide a couple of useful tools that are not provided elsewhere.

How to Use hospEpi

After installing the package, you can load it like this:

library(hospEpi)

As of now, there are two main use cases for this package: analyzing patient location/hospital network data and analyzing disease-exposure/disease-risk factor data. Below is a walk-through of the process of doing each.

Working With Patient Location/Hospital Network Data

A good way to start analyzing hospital network data with this package is to work through the examples using the example dataset that comes with it. Doing so will show whether you need to manipulate your own data before using the functions. The example data can be loaded like so:

hn_data <- hosp_network_data
head(hn_data)

To use many of the functions, the data needs columns for each patient's starting room and next room, or starting unit and next unit. Looking at the data above, we do not have those columns yet, so let's use the cleaning function to get everything in the right format.

cleaned_hn_data <- clean_hosp_network(data = hn_data, uniqueID = UniqueEncountID, startDate = BeginDate, 
                                      endDate = EndDate, unitName = UnitName, 
                                      roomNum = RoomNumber)

head(cleaned_hn_data[,4:9])
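
To make the idea of "next room" and "next unit" columns concrete, here is a minimal base R sketch of one way such columns could be derived from a location history. It is not how clean_hosp_network is implemented; the toy values and the helper lead_within are made up for illustration, and only the column names mirror the example above.

```r
# Toy location history: one row per stay. Values are invented for illustration.
moves <- data.frame(
  UniqueEncountID = c(1, 1, 1, 2, 2),
  BeginDate = as.Date(c("2022-01-01", "2022-01-03", "2022-01-06",
                        "2022-01-02", "2022-01-04")),
  UnitName = c("ED", "ICU", "MedSurg", "ED", "MedSurg"),
  RoomNumber = c("E1", "I4", "M2", "E3", "M7"),
  stringsAsFactors = FALSE
)

# Put each encounter's stays in chronological order.
moves <- moves[order(moves$UniqueEncountID, moves$BeginDate), ]

# Hypothetical helper: shift a vector up by one position within each group,
# so every stay sees the location of the following stay (NA for the last one).
lead_within <- function(x, group) {
  ave(x, group, FUN = function(v) c(v[-1], NA))
}

moves$next_unit <- lead_within(moves$UnitName, moves$UniqueEncountID)
moves$next_room <- lead_within(moves$RoomNumber, moves$UniqueEncountID)

moves[, c("UniqueEncountID", "UnitName", "next_unit", "RoomNumber", "next_room")]
```

Each row then describes one move (from a location to the next one), which is the shape a network graph needs. The last stay of every encounter has no next location, so it gets NA.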

Now that we have clean data, we can create an object of class hosp_network, which is what the plotting and summary functions work on. To create the object, use the function below.

hn_object <- hosp_network(x = cleaned_hn_data, fromUnit = UnitName, toUnit = next_unit,
                          fromRoom = RoomNumber, toRoom = next_room)
class(hn_object)

If you have only one of the two types of data (room or unit), you can simply leave the other out, as shown here. The rest of the functions only require that at least one of the two is present:

hn_object2 <- hosp_network(x = cleaned_hn_data, fromUnit = UnitName, toUnit = next_unit)

Now you can move on to plotting the data. This package creates network graphs using the igraph package, which lets you see, for example, how your patients have moved throughout the hospital. Two examples are:

plot(hn_object, by = "room", type = "simple")
plot(hn_object, by = "unit", type = "hub score", vertex.color = "red", vertex.shape = "square")

The plots might be hard to read right now, but that is only because they are drawn next to each other and so have less space. In the example on the left, the data is plotted by room and all points are the same size. In the example on the right, the data is plotted by unit and the size of each point is based on its hub score, a network statistic from the HITS algorithm that is larger for nodes pointing to many well-connected nodes. The extra arguments, vertex.color and vertex.shape, are passed on to plot.igraph.
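
For readers curious about what a hub score measures, here is a small base R sketch that computes hub and authority scores from an adjacency matrix. It follows the usual definition (hub scores are the principal eigenvector of A %*% t(A), authority scores that of t(A) %*% A) and scales each so the largest value is 1. The transfer data and the helper principal_vector are made up, and igraph's own routines may differ in numerical details.

```r
# Made-up unit-to-unit transfers: A[i, j] = number of moves from unit i to unit j.
units <- c("ED", "ICU", "MedSurg", "Rehab")
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(units, units))
A["ED", "ICU"] <- 1
A["ED", "MedSurg"] <- 1
A["ICU", "MedSurg"] <- 1
A["MedSurg", "Rehab"] <- 1

# Hypothetical helper: principal eigenvector of a symmetric matrix,
# made non-negative and scaled so that its largest entry is 1.
principal_vector <- function(M) {
  e <- eigen(M, symmetric = TRUE)   # eigenvalues come back in decreasing order
  v <- abs(e$vectors[, 1])
  v / max(v)
}

hub <- principal_vector(A %*% t(A))        # high for units that send patients onward
authority <- principal_vector(t(A) %*% A)  # high for units that receive patients
names(hub) <- names(authority) <- units

round(rbind(hub, authority), 3)
```

In this toy network the ED has the highest hub score because it sends patients to the two units that receive the most transfers, while MedSurg has the highest authority score because it receives patients from the strongest hubs.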

Plotting this type of data could be helpful if you want to visualize how patients have moved throughout the hospital. Perhaps patients have recently been contracting a specific disease in the hospital, and you want to find if there are any common rooms or units among them. If that is the case, these plots can help to visualize that.
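
A simple tabulation can complement the plots when looking for shared locations. The sketch below uses base R and made-up data (the stays data frame and the cases vector are hypothetical and not part of hospEpi) to count how many infected patients passed through each room.

```r
# Made-up stays: one row per patient per room visited.
stays <- data.frame(
  patient = c("A", "A", "B", "B", "C", "C", "D"),
  room    = c("101", "204", "204", "310", "204", "101", "310"),
  stringsAsFactors = FALSE
)

# Hypothetical list of patients who contracted the infection.
cases <- c("A", "B", "C")

# Keep one row per patient-room pair so repeat visits are not double counted.
case_stays <- unique(stays[stays$patient %in% cases, c("patient", "room")])

# Number of distinct infected patients per room, most common first.
room_counts <- sort(table(case_stays$room), decreasing = TRUE)
room_counts
```

A room that appears for most or all of the infected patients is a natural place to start an investigation, although a shared room is not proof of transmission on its own.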

You can also summarize your data.

hn_summ <- summary(hn_object, by = "room")
hn_summ[5:6]

This provides many different network statistics, which might be helpful if you are interested in knowing more about your network. You can also summarize by unit instead of room, and you can add arguments that are passed on to hub_score and authority_score.

hn_summ2 <- summary(hn_object, by = "unit", scale = FALSE)
hn_summ2[5:6]

While network statistics might not be useful to everyone, they can provide numerical information about your network.

Overall, this half of the package provides useful functions for working with patient location history/patient network data.

Working With Disease-Exposure/Disease-Risk Factor Data

To begin analyzing disease-exposure/disease-risk factor data, it might be helpful to look at the example dataset that comes with the package and go through the below examples with it. The dataset can be loaded by doing the following:

de_data <- disease_expose_data
head(de_data)

To use most of the functions, your data must be made up of binary columns (0s and 1s). As seen above, the example data is not in that format yet, so a cleaning function is provided to convert it. It may be useful for your own data as well.

cleaned_de_data <- clean_disease_expose(data = de_data, disease = "disease", 
                                        noDisease = "No", 
                                        exposures = c("exposure1", "exposure2", "exposure3"))
head(cleaned_de_data)

As seen above, all columns are now binary variables, and we are ready to move on to the rest of the functions. Note that although all columns are binary, you might not want every column created by the cleaner function. For example, you might only want one of the two exposure2_... columns, as they are just opposites of each other. For now, we will just keep them all, but you can subset your data on your own in the future.
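
To illustrate why two indicator columns for a two-level exposure are opposites of each other, here is a base R sketch of this kind of binary encoding. It is not the implementation of clean_disease_expose; the data values and the exact column suffixes are assumptions made for the example.

```r
# Made-up raw data with text categories.
raw <- data.frame(
  disease   = c("Yes", "No", "Yes", "No"),
  exposure1 = c("Low", "High", "Medium", "Low"),
  exposure2 = c("Yes", "No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Disease indicator: 0 when the value matches the "no disease" label, 1 otherwise.
binary <- data.frame(disease = as.integer(raw$disease != "No"))

# One 0/1 indicator column per level of each exposure.
for (col in c("exposure1", "exposure2")) {
  for (lvl in sort(unique(raw[[col]]))) {
    binary[[paste0(col, "_", lvl)]] <- as.integer(raw[[col]] == lvl)
  }
}

# The two exposure2 indicators are complements, so one of them is redundant.
all(binary$exposure2_No == 1 - binary$exposure2_Yes)

# Drop the redundant column before further analysis.
binary_small <- binary[, names(binary) != "exposure2_No"]
names(binary_small)
```

For an exposure with more than two levels, such as exposure1 here, every indicator carries its own comparison (that level versus all others), so it can make sense to keep them all.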

The next step is creating an object of class disease_expose, which will allow you to plot and summarize your data very easily. The best way to do this is by calling the helper function to create the object.

de_object <- disease_expose(cleaned_de_data)

This pulls up a Shiny gadget and allows you to select your disease column and any exposure/risk factor columns you want to include from your data. After providing your input, an object of class disease_expose will be created, barring any errors (most likely that there are columns that are not binary). If Shiny gadgets are not your thing and you would rather manually type in everything you want, you can do that with the constructor below.

de_object <- new_disease_expose(cleaned_de_data, disease = 1, exposures = 2:8)
class(de_object)

Now that you have an object of class disease_expose, you can move on to plotting and summarizing the data, simply by calling plot and summary and including the object. After the object is created, it is best to not edit it in any way; it should already be in a good format, so no changes are necessary.

Here is how to plot the data:

#subset used so viewing is better on document
plot(de_object[1:5])

As you can see, it produces bar charts for each disease-exposure combination in your data. If you want a little more customization, you can also add arguments that can be used in geom_bar. One that might look good is position = 'dodge'. Making these plots allows you to visualize how the diseased and non-diseased individuals are distributed across exposure status.

#subset used so viewing is better on document
plot(de_object[1:5], position = 'dodge')

In addition to plotting the data, you can also summarize the data like so:

summary(de_object)

As seen above, you now have many different statistics available for each disease-exposure combination in your data, such as odds ratios, incidence in the exposed and unexposed, and confidence intervals for various statistics. Having these statistics available so quickly is helpful when checking whether any exposures are associated with the disease.
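
For reference, the sketch below shows how an odds ratio, the incidence in each exposure group, and a Wald 95% confidence interval can be computed by hand from binary data in base R. The data are made up, and the summary method may use different formulas or interval methods, so treat this as an illustration of the statistics rather than a description of the package internals.

```r
# Made-up binary data: 1 = yes, 0 = no.
disease  <- c(1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0)
exposure <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1)

# Cells of the 2x2 table.
n11 <- sum(exposure == 1 & disease == 1)  # exposed, diseased
n10 <- sum(exposure == 1 & disease == 0)  # exposed, not diseased
n01 <- sum(exposure == 0 & disease == 1)  # unexposed, diseased
n00 <- sum(exposure == 0 & disease == 0)  # unexposed, not diseased

# Incidence (risk) in each exposure group.
risk_exposed   <- n11 / (n11 + n10)
risk_unexposed <- n01 / (n01 + n00)

# Odds ratio with a Wald confidence interval computed on the log scale.
odds_ratio <- (n11 * n00) / (n10 * n01)
se_log_or  <- sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

c(odds_ratio = odds_ratio, lower = ci[1], upper = ci[2])
```

An odds ratio above 1 suggests that the disease is more common among the exposed, but with small counts like these the interval is wide, so the interval matters as much as the point estimate. The Wald formula also breaks down when any cell of the table is zero.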

While the disease_expose functions are not overly sophisticated, they do provide some simple and quick tools to analyze disease-exposure data that you might have.

In short, the functions cover two use cases: visualizing and summarizing patient location history/networks, and visualizing and summarizing disease-exposure/disease-risk factor data. Both could be useful in a hospital setting.

Future Work and Plans

Multiple ideas to improve the package in the future are already being drawn up. Those are:



npeters1322/hospEpi documentation built on April 30, 2022, 6:12 p.m.