txvis - Data Visualization for Treament Patterns

The txvis package for R is intended to provide users with the ability to generate high quality, customizable figures for both inital exploratory data analysis, and for presentation. The package uses a txvis data object that is composed of both treatment sequence data and treatment related events. The package provides print and summary methods for the txvis object, and includes a number of options for plotting.

Basic use

Installing the Package

Many packages are now available from GitHub. Currently txvis is not supported through the Comprehensive R Network (CRAN), and so the installation procedure is somewhat different than for other common packages. To install txvis you must use the devtools package:

library(devtools)

devtools::install_github('johnwilsonICON/txvis')

Loading Data

The package itself comes with a data object, treat that provides the opportunity to test the package. treat is intended to be a sample of treatment data a client might provide, and does not reflect the required data format within the package. Data is formatted by the function create_txvis.

Most treatment data includes:

This is the minimum information supported by the package. The data model can be extended to include one or more treatment classes, hierarcies for the treatment types. These are user defined, so, for example, the hierarchy may be: warfarin -> anticoagulant

The plotting functions in txvis expect a txvis object. The txvis object is a list with two elements, a treatment and event object. Each of these are optional, but most plotting cannot occur without treatment data.

library(txvis)

head(treat)

In the supplied data, a set of patients with unique encoding have up to 8 unique treatments. The treat dates are text string, but represent numeric data/time objects. This data is meant to represent data that might be obtained from a clinical trial, which could require cleaning prior to analysis.

head(events)

For individual patients a number of events have also been encoded. These events are associated with either point events, or events that occur over multiple days. The txvis package makes use of a number of other visualization packages, but is intended to wrap the pre-processing. To do this requires the use of a special txvis object. We simplify the process of generating the object using the create_txvis function:

# Using the existing data.  Event data is optional, but treatment data is required.
# Here the date encoding for the startand end date is slightly peculiar:
head(treat$start)

# So we use the `create_txvis` flag `date_format` to ensure that the formatting is correct:

hlth_data <- create_txvis(patient        = treat$pat_id, 
                          treatment      = treat$treatment,
                          start          = treat$start,
                          end            = treat$end,
                          date_format    = "%d%B%Y",
                          ev_patient     = events$pat_id,
                          events         = events$event,
                          event_date     = events$start,
                          event_end_date = events$end)

The package was designed to accept multiple date formats since researchers encode & manage dates in multiple formats. The encoding follows the formatting presented in strptime and as such can easily accomodate many standard and non-standard formats.

With the newly created txvis object it becomes possible to plot one of the many options in the package. The package also provides print and summary methods to easily provide an overview of the data structure and contents:

print(hlth_data)

shows the head and tail of the treatment data, and provides a brief summary.

The summary method:

summary(hlth_data)

provides more extensive insight into the object data and individual treatment sequences. Ultimately the txvis object is a list, assigned the class c('txvis', 'list'). Each of the events and treat objects are simply data.frames, and as such can be manipulated with any of R's tools for manipulating data.

Creating an Alluvial plot:

The alluvial plot wraps the alluvial function from the alluvial package, which is available from GitHub. The first time the function is run it will prompt you to download the package if you haven't already.

# Given that data has already been loaded into `hlth_data`

tx_alluvial(hlth_data)

Plotting sequence data by patient

It may be the case that we are interested in seeing how patient data squences change over time. Using the ggplot2 package, we can show treatment data over time using tx_indiv

tx_indiv(hlth_data, events = FALSE)

Here we see that the data is plotted out, with ten patients sampled randomly from the set of patients. We can sample more patients and add the set of events to the mapping:

tx_indiv(hlth_data, nsample = 50, events = TRUE)

Or we can align the plots so that they all appear to start at time 0, and provide some extra customization. Note that to provide extra customization we need to load the ggplot2 package.

library(ggplot2)

tx_indiv(hlth_data, 
         nsample = 50, 
         events = TRUE, 
         align = TRUE) +
  scale_x_continuous(expand = c(0,0)) +
  xlab("Days Since Treatment Start") +
  ylab("Patient") +
  theme_bw() +
  theme(axis.text.y = element_blank())

With each method it is possible to visualize treatment outcomes, and emphasise particular aspects of the treatment sequencing. For example, understanding transitions from one treatment to another can be understood using tx_transmat, a visualization for transition matrices. The method examines treatment sequences rather than dates or date ranges. The sequences of interest can be passed using the sequences parameter. By default the first two sequences are examined, although any two sequences may be chosen.

tx_transmat(hlth_data, sequences = c(1,2))

Here we can easily see that the most frequent treatment shift is from Treatment Tx1 to Tx3. Treatments Tx4 - 8 are infrequently administered in the early phase of the treatment sequence. We could arrange multiple sequence comparisons together:

# Show transition matrices for the first four sequences.
par(mfrow = c(2,2))
tx_transmat(hlth_data, sequences = c(1,2))
tx_transmat(hlth_data, sequences = c(2,3))
tx_transmat(hlth_data, sequences = c(3,4))

We can see now that the main sequence of treatments do vary, but that in general Tx1-3 dominate the treatment regime.

Recent work porting the d3.js JavaScript package to R has allowed us to leverage verious plotting devices that are HTML ready and interctive. This includes sunburst plots:

tx_sunburst(hlth_data)

and d3 style alluvial plots:

tx_d3alluvial(hlth_data)


johnwilsonICON/TxVis documentation built on May 19, 2019, 5:18 p.m.