knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) load(file="../data/maude.rda") load(file="../data/sales.rda") library(mds)
Why?
Medical device event data are messy.
Common challenges include:
How?
The mds
package provides a standardized framework to address these challenges:
R
files for auditability, documentation, and reproducibilityPurpose of This Vignette
mds
mds
functions: deviceevent(), exposure(), define_analyses(), time_series()Note on Statistical Algorithms
mds
data and analysis standards allow for seamless application of various statistical trending algorithms via the mdsstat
package (under development).
Our example dataset maude
was queried from the FDA MAUDE API and contains 535 reported events on bone cement in 2017. Furthermore, a simulated exposure dataset sales
was generated to provide denominator data for our bone cement events.
library(mds) dim(maude) dim(sales)
head(maude, 3)
knitr::kable(head(maude, 3))
head(sales, 3)
knitr::kable(head(sales, 3))
The general workflow to go from data to trending over time is as follows:
deviceevent()
to standardize device-event data.exposure()
to standardize exposure data (optional).define_analyses()
to enumerate possible analysis combinations.time_series()
to generate counts (and/or rates) by time based on your defined analyses.# Step 1 - Device Events de <- deviceevent( maude, time="date_received", device_hierarchy=c("device_name", "device_class"), event_hierarchy=c("event_type", "medical_specialty_description"), key="report_number", covariates="region", descriptors="_all_") # Step 2 - Exposures (Optional step) ex <- exposure( sales, time="sales_month", device_hierarchy="device_name", match_levels="region", count="sales_volume") # Step 3 - Define Analyses da <- define_analyses( de, device_level="device_name", exposure=ex, covariates="region") # Step 4 - Time Series ts <- time_series( da, deviceevents=de, exposure=ex)
You may:
de
, ex
), analyses (da
), and time series (ts
) for documentationsummary()
and define_analyses_dataframe()
plot()
your time series (plotting options)mdsstat
package)summary(da)
dadf <- define_analyses_dataframe(da)
head(dadf, 3)
knitr::kable(head(dadf, 3))
plot(ts[[1]]) plot(ts[[4]], "rate", type='l')
deviceevent()
to Standardize Device-Event Data {#de}Basic Usage
de <- deviceevent(maude, "date_received", c("device_name", "device_class"), c("event_type", "medical_specialty_description"))
head(de, 3)
knitr::kable(head(de, 3))
Advanced Usage
de <- deviceevent( maude, time="date_received", device_hierarchy=c("device_name", "device_class"), event_hierarchy=c("event_type", "medical_specialty_description"), key="report_number", covariates="region", descriptors="_all_")
head(de, 3)
knitr::kable(head(de, 3))
data_frame
: is the input device-event data frame. All remaining arguments refer to variables within this data frame.
time
: is the name of the variable containing the time of the event. The input format of the time will be flexibly converted into Date
format.
device_hierarchy
: organizes all your device variables into a hierarchy. The hierarchical concept reflects how devices are often nested into progressively more general groups. Set the first variable as the lowest device level that you would like to trend at. mds
remembers this hierarchy and allows trending at multiple levels as you specify.
event_hierarchy
: organizes all your event variables into a hierarchy. Like devices, event variables should be categorical in nature. Free text descriptions should not be listed here, but rather in the descriptors
argument. The hierarchical concept reflects how events are often nested into progressively more general groups. Set the first variable as the lowest event level that you would like to trend at. mds
remembers this hierarchy and allows trending at multiple levels as you specify. If your data does not have an event variable, you will need to create a dummy variable.
key
: is a unique identifier for each unique event in data_frame
. If your data pipeline carries over a key variable, it is recommended to specify it here. The key
allows downstream aggregated analysis to be able to "look up" individual constituent events.
covariates
: are a special group of variables that may be analyzed within device. For instance, declaring covariates="Region"
will allow analysis of regions within device. These variables should be categorical in nature.
descriptors
: are additional variables that should be retained for the purpose of describing individual events in downstream analysis.
implant_days
: contains the age in days of an implantable device at the time of the event.
exposure()
to Standardize Exposure Data {#ex}Exposure data is meant to support device-event data. As such, the general expectation is that variable values match between exposure and device-event data. For example, 10 exposures for ev3 Solitaire
in France
will be matched exactly to ev3 Solitaire
events in France
, and not to events for EV3 SOLITAIRE
in FRANCE
.
Basic Usage
ex <- exposure(sales, "sales_month", "device_name")
head(ex, 3)
knitr::kable(head(ex, 3))
Advanced Usage
ex <- exposure( sales, time="sales_month", device_hierarchy="device_name", match_levels="region", count="sales_volume")
head(ex, 3)
knitr::kable(head(ex, 3))
Note: Although not required, count
will commonly be used as well.
data_frame
: is the input exposure data frame. All remaining arguments refer to variables within this data frame.
time
: is the name of the variable containing the time of the exposure. The input format of the time will be flexibly converted into Date
format. If exposure will be used, it is critical to have sufficient time granularity. For example, if analysis will be done monthly, exposure data must be no less granular than monthly. mds
does not make assumptions about filling in holes in time!
device_hierarchy
: contains all exposure device variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data device_hierarchy
parameter.
event_hierarchy
: contains all exposure event variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data event_hierarchy
parameter. Exposures at an event level is not common.
count
: is the most commonly specified optional parameter. It contains the number of exposures. If not specified, the number of rows will be used as a proxy for count.
key
: is a unique identifier for each unique exposure in data_frame
. If your data pipeline carries over a key variable, it is recommended to specify it here. The key
allows downstream aggregated analysis to be able to "look up" individual constituent exposure records.
match_levels
: are variables aside from time, device, and event that specify an exposure. A common "match level" is country, if your exposure data is specific by country.
define_analyses()
to Enumerate Analysis Combinations {#da}After standardizing device-event data using deviceevent()
and, optionally, exposure data using exposure()
, the next step is to discover what types of analyses are possible. This is separated from actually doing the analysis (counting, calculations, statistics, etc.) because:
Basic Usage
da <- define_analyses(de, "device_name")
Note that define_analyses()
returns a list of individual analyses. Each individual analysis contains a set of instructions. You can view an analysis by submitting da[[1]]
, da[[2]]
, etc., but a less cumbersome overview is possible using summary()
and define_analyses_dataframe()
.
r options(warn=-1)
summary(da)
r options(warn=0)
head(define_analyses_dataframe(da), 3)
knitr::kable(head(define_analyses_dataframe(da), 3))
Advanced Usage
da <- define_analyses( de, device_level="device_name", exposure=ex, covariates="region")
summary(da)
head(define_analyses_dataframe(da), 3)
knitr::kable(head(define_analyses_dataframe(da), 3))
deviceevents
: is a standardized device-event data frame. (class()
should contain "mds_de"
)
device_level
: is the source device variable to analyze by. Access these variable names by submitting attributes(de)$device_hierarchy
.
event_level
: is the source event variable to analyze by. Access these variable names by submitting attributes(de)$event_hierarchy
.
exposure
: is a standardized exposure data frame. (class()
should contain "mde_e"
)
date_level
and date_level_n
: are arguments used together to specify the time interval to analyze by. Default of "months"
and 1
analyzes by month. Other examples include "months"
and 12
for yearly, or "days"
and 7
for weekly.
covariates
: are variables to analyze within device. For example, c("region")
analyzes by each level of region
within device.
times_to_calc
: specifies how many time periods (counting back in time) to analyze for. Time period is defined using date_level
and date_level_n
.
It is always assumed that analyses at aggregated levels are desired. (such as analysis of all events for a given device, or analysis of all events across all devices)
Aggregated level analysis is easily recognized by the "All"
and "Data"
values in device_level
, event_level
, covariate
, and covariate_level
.
knitr::kable(define_analyses_dataframe(da)[c(11:12, 32), ])
There are several options:
da[[c(1:5, 24:27)]])
)define_analyses()
with different parameter settingsda[[1]]$date_range_exposure['start'] <- as.Date("2016-10-01")
)time_series()
to Generate Counts, Rates, and More {#ts}Once an analysis has been defined using define_analyses()
, the analyses instructions can be executed using time_series()
, returning by defined time periods:
key
parameter from deviceevent()
) for lookup of individual event records.key
parameter from exposure()
) for lookup of individual exposure records.Basic Usage
ts <- time_series(da, de)
Note that time_series()
returns, in a list, one time series data frame for every analysis. You can select a time series by submitting ts[[1]]
, ts[[2]]
, etc.
head(ts[[1]], 3)
knitr::kable(head(ts[[1]], 3))
Advanced Usage
ts <- time_series( da, deviceevents=de, exposure=ex)
head(ts[[1]], 3)
knitr::kable(head(ts[[1]], 3))
analysis
: is a single defined analysis (class()
should contain "mds_da"
) or a list of defined analysis.
deviceevents
: is a standardized device-event data frame (class()
contains "mds_de"
). It is typically the same data frame used to generate analysis
, but can be another "mds_de"
data frame, such as a cut of the data at a different time. Note if, say, an older dataset is being used, the analysis
date ranges must correspond.
exposure
: is a standardized exposure data frame (class()
contains "mds_e"
). It is typically the same data frame used to generate analysis
. Like deviceevents
, another data frame may be used, but the analysis
instructions must correspond.
use_hierarchy
: is a logical value for whether device and event hierarchies should be used for the calculation of disproportionality analysis (DPA) counts. Submit ?time_series.mds_da
for more details.
It is not uncommon to adjust event and exposure counts, such as with applications of rolling or moving averages. These adjustments should be applied after generating time series data frames from time_series()
.
plot()
ing a Time Series {#po}Plotting an individual time series generated by time_series()
is simple. Simply call plot()
on the time series object:
plot(ts[[1]])
There are a few custom parameters, including:
mode
: with common values of "nA"
(representing the device-event of interest), "exposure"
, and "rate"
(simply "nA"/"exposure"
). Less common are "nB"
, "nC"
, and "nD"
representing the cell counts of the disproportionality analysis (DPA) contingency table.
xlab
, ylab
, main
: representing default plot()
behavior. By default, axes and title labels are inferred directly from the time series.
All other parameters are from plot.default()
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.