library('knitr')

opts_chunk$set(
  cache = FALSE, fig.align = 'center', dev = 'png',
  fig.width=9, fig.height=7, echo = TRUE, message = FALSE
)

Data sets available

library('kmdata')

data(package = 'kmdata')

This will show a list of the available data sets. Data objects can be referenced by name, e.g., ATTENTION_2A, or with backticks, e.g., `TRIBE(2)_2A`, when the name contains special characters.
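As a small illustration of why backticks are needed, the sketch below stores a toy data frame under a non-syntactic name and retrieves it two ways. The name `TOY(1)_2A` and its values are hypothetical stand-ins, not a data set from the package.

```r
# Toy stand-in for a packaged data set; name and values are invented.
`TOY(1)_2A` <- data.frame(
  time  = c(3.1, 7.4),
  event = c(1, 0),
  arm   = c('arm-1', 'arm-2')
)

# Backticks make the parser treat the whole string as one symbol;
# without them, TOY(1)_2A would not parse as an object name.
n_backtick <- nrow(`TOY(1)_2A`)

# get() looks the object up from a character string instead.
n_get <- nrow(get('TOY(1)_2A'))
```

The same two approaches should apply to any object whose name contains parentheses or other special characters.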

The name consists of the study short-name and figure identifier:

data <- ls('package:kmdata', pattern = '^[A-Z]')
cbind(
  name = data,
  study = gsub('_.*', '', data),
  figure = gsub('.*_', '', data)
)[1:5, ]

For example, study "ATTENTION" has two figures: "2A" and "2B." Study "ACTSCC" has only one figure: "2A." When data sets are sourced from a multi-panel figure, the names share the study short-name and differ only in the figure ID, as with "2A" and "2B" for study "ATTENTION."

All data sets are listed in kmdata_key along with some useful metadata for each including the journal and publication identifiers, outcomes and study arms, quality of the re-capitulated data, and other information.

Working with the data

Each data set has the same three-column format for consistency:

knitr::kable(
  data.frame(
    time = 'time-to-event (in units)', event = 'event indicator (0/1)',
    arm = 'treatment arm identifier (e.g., arm-1 vs arm-2)'
  )
)

The time unit, event type, and treatment arms can be found in the help page for each data set, e.g., ?ACT1_2A. Additionally, the data objects contain metadata stored as attributes:

head(ACT1_2A)

attr(ACT1_2A, 'event')

attributes(ACT1_2A)[-(1:3)]

Data may be examined and plotted using the built-in functions summary and kmplot.

summary(ACT1_2A)

kmplot(ACT1_2A)
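Because every data set shares the time/event/arm layout, it can presumably be passed to other survival tools as well. The sketch below is an assumption rather than part of the package: it uses the survival package (not mentioned above) on an invented toy data frame with the same three columns.

```r
library('survival')

# Toy data in the time/event/arm layout; values are invented for illustration.
toy <- data.frame(
  time  = c(2, 5, 7, 9, 12, 3, 4, 6, 8, 10),
  event = c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1),
  arm   = rep(c('arm-1', 'arm-2'), each = 5)
)

# Kaplan-Meier estimate for each arm
fit <- survfit(Surv(time, event) ~ arm, data = toy)

# Number of events observed in each arm
n_events <- tapply(toy$event, toy$arm, sum)

# Hazard ratio for arm-2 relative to arm-1 from a Cox model
cx <- coxph(Surv(time, event) ~ arm, data = toy)
hr <- exp(coef(cx))
```

Replacing `toy` with one of the packaged data sets should work the same way, provided the column names match those described above.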

Selecting data

The kmdata package contains a function, select_kmdata, to easily search and filter data sets which share common features. Any of the columns in kmdata_key may be used to filter.

For example, if we wanted a list of lung cancer data sets with overall survival (OS) in months with fewer than 500 patients reporting at least a 1.2 hazard ratio for treatment compared to a reference arm, we can use the following:

select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'name'
)

By default, select_kmdata returns only the names of the matching data sets (i.e., select_kmdata(..., return = 'name')), which can then be referenced individually. It can also return the matching rows of kmdata_key (return = 'key') or the matching data sets as a list (return = 'data').

key <- select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'key'
)

dat <- select_kmdata(
  Cancer %in% 'Lung' &
    Outcome %in% 'OS' &
    Units %in% 'months' &
    ReportedSampleSize < 500 &
    HazardRatio >= 1.2,
  return = 'data'
)

par(mfrow = n2mfrow(length(dat)))
for (dd in dat)
  kmplot(dd)

Data quality

Each figure and data set has a quality score which represents how well the re-capitulated data agree with the original publication. Scores range from 0% (worst) to 100% (best) and are an aggregation of four metrics: hazard ratio, total events, median time-to-event, and number at-risk.

Each metric is scored from 0 (worst) to 3 (best); the maximum score per figure varies with the number of metrics reported in the original publication. For example, if only one metric was reported, the maximum score is 3/3; if all four were reported, it is 12/12.

A score of 3 points is given per metric per figure if the re-capitulated metric differs from the published value by no more than 5%, 2 points if it differs by 5-10%, 1 point for 10-20%, and 0 points for more than 20%.

| % difference from publication | Quality points per metric |
|-------------------------------:|---------------------------:|
| 0-5 | 3 |
| 5-10 | 2 |
| 10-20 | 1 |
| > 20 | 0 |
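The scoring rule can be sketched in a few lines of R. This is an illustrative reading of the description above, not the package's own code: the function names are hypothetical, the handling of exact boundaries (5, 10, 20 fall in the better bin) is an assumption, and so is the aggregation as points earned divided by points available.

```r
# Points for one metric given its absolute percent difference from the
# published value. Intervals are closed on the right (assumption), so a
# difference of exactly 5 earns 3 points.
quality_points <- function(pct_diff) {
  bins <- cut(
    abs(pct_diff),
    breaks = c(0, 5, 10, 20, Inf),
    labels = c(3, 2, 1, 0),
    include.lowest = TRUE
  )
  as.integer(as.character(bins))
}

# Aggregate over the metrics that were reported (NA = not reported):
# points earned as a percentage of the maximum possible (assumption).
quality_score <- function(pct_diff) {
  pts <- quality_points(pct_diff[!is.na(pct_diff)])
  100 * sum(pts) / (3 * length(pts))
}

# Hypothetical figure: hazard ratio off by 2%, total events by 7%,
# median by 15%, number at-risk not reported
score <- quality_score(c(hr = 2, events = 7, median = 15, at_risk = NA))
```

Here three metrics are reported, so the maximum is 9 points; the example earns 3 + 2 + 1 = 6, or roughly 67%.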


References

The publications and figures available in this package are listed below by first author.


cit <- system.file('docs', 'Citations_final.xlsx', package = 'kmdata')
cit <- as.data.frame(readxl::read_excel(cit, skip = 1L))

cit <- within(cit, {
  # split each reference string into its parts; '[A-Za-z ]' replaces '[A-z ]',
  # which also matches the punctuation between 'Z' and 'a'
  Title    <- gsub('^.*?\\.\\s+|\\.\\s+[A-Za-z ]+\\d{4};.*$', '', Reference)
  Author   <- gsub('^([^.]+\\.)|.', '\\1', Reference)
  PubData  <- gsub('([A-Za-z ]+\\s+\\d{4};.*)\\.$|.', '\\1', Reference)
  Journal  <- gsub('(.*?)\\d{4};|.', '\\1', PubData)
  Year     <- gsub('(\\d{4});|.', '\\1', PubData)
  # no capture group in this pattern, so replace with an empty string
  Location <- gsub('^.*?\\d{4};\\s*', '', PubData)
})[, c('PMID', 'Author', 'Journal', 'Year', 'Title', 'Location')]
cit <- cit[order(cit$Author), ]
rownames(cit) <- NULL

knitr::kable(cit, format = 'markdown', caption = 'List of publications.')



raredd/kmdata documentation built on June 15, 2025, 9:33 a.m.