WFDB: An Introduction to the Waveform Database Software Package
In EGM: Evaluating Cardiac Electrophysiology Signals

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(EGM)

The package relies and is augmented heavily by the Waveform Database (WFDB) software package, which is a well-documented and well-developed ecosystem of software applications, primarily written in C and C++ that manage the storage, reading, writing, and interaction with electrical signal data. This software is mostly external to the {EGM} package, and is used to supplement and expand the software.

DISCLAIMER: The software required to use annotations has not yet been re-written into a C backend for R, and thus requires using a local installation of WFDB for functionality. To build online vignettes and articles, where the software package is not available, I only demonstrate non-running examples.

Installation

To install the WFDB software in its traditional, C-based format, I have found it easiest with teh instructions from the Github source. The installation instructions are relatively clear across multiple operating systems.

WFDB is easiest to install on a Unix-based system, such as Linux or MacOS. For Windows, I have found that using WSL2 is the most consistent and supported way to utilize the software. When hidden on WSL, the path must be specified explicitly using the set_wfdb_path() function using the command from Powershell, for example, to switch to a WSL command environment.

set_wfdb_path("wsl /usr/local/bin")

Once this is in place, you should be able to work with WFDB files directly from R without significant overhead costs, while also gaining access to a variety of software applications.

Annotations

The WFDB software package utilizes a binary format to store annotations. Annotations are essentially markers or qualifiers of specific components of a signal, specifying both the specific time or position in the plot, and what channel the annotation refers to. Annotations are polymorphic, and multiple can be applied to a single signal dataset. The credit for this work goes directly to the original software creators, as this is just a wrapper to allow for flexible integration into R.

To begin, let's take an example of a simple ECG dataset. This data is included in the package, and can be accessed as below.

fp <- system.file('extdata', 'muse-sinus.xml', package = 'EGM')
ecg <- read_muse(fp)
fig <- ggm(ecg) + theme_egm_light()
fig

We may have customized software, manual approaches, or machine learning models that may label signal data. We can use the annotation_table() function to create WFDB-compatible annotations, updating both the egm object and the written file. To trial this, let's label the peaks of the QRS complex from this 12-lead ECG.

Create a quick, non-robust function for labeling QRS complex peaks. The function pracma::findpeaks() is quite good, but to avoid dependencies we are writing our own.
Evaluate the fit of the peaks to the dataset
Place the annotations into a table, updating the egm object
Plot the results

# Let x = 10-second signal dataset
# We will apply this across the dataset
# This is an oversimplified approach.
find_peaks <- function(x,
                       threshold = 
                         mean(x, na.rm = TRUE) + 2 * sd(x, na.rm = TRUE)
                       ) {

  # Ensure signal is "positive" for peak finding algorithm
  x <- abs(x)

  # Find the peaks
  peaks <- which(diff(sign(diff(x))) == -2) + 1

  # Filter the peaks
  peaks <- peaks[x[peaks] > threshold]

  # Return
  peaks
}

# Create a signal dataset
dat <- extract_signal(ecg)

# Find the peaks
sig <- dat[["I"]]
pk_loc <- find_peaks(sig)
pk_val <- sig[pk_loc]
pks <- data.frame(x = pk_loc, y = pk_val)

# Plot them
plot(sig, type = "l")
points(x = pks$x, y = pks$y, col = "orange")

The result is not bad for a simple peak finder, but it lets us generate a small dataset of annotations that can then be used. Please see the additional vignettes for advanced annotation options, such as multichannel plots, and multichannel annotations. We can take a look under the hood at the annotation positions we generated. The relevant arguments, which are displayed below, include:

annotator: name of annotation function or creator
time: constructed from sample number and frequency
sample: integer index of positions
type: a single character describing the type
subtype: a single character describing the type
channel: which channel the data is mapped to
number: an additional qualifier of the annotation type

# Find the peaks
raw_signal <- dat[["I"]]
peak_positions <- find_peaks(raw_signal)
peak_positions

# Annotations do not need to store the value at that time point however
# The annotation table function has the following arguments
args(annotation_table)

# We can fill this in as below using additional data from the original ECG
hea <- ecg$header
start <- attributes(hea)$record_line$start_time
hz <- attributes(hea)$record_line$frequency

ann <- annotation_table(
  annotator = "our_pks",
  sample = peak_positions,
  type = "R",
  frequency = hz,
  channel = "I"
)

# Here are our annotations
ann

# Then, add this back to the original signal
ecg$annotation <- ann
ecg