library(dplyr)
library(knitr)

Introduction

This document describes the data model of the output of the SelfControlledCaseSeries (SCCS) package, generated by the exportToCsv() function. This vignette assumes you are already familiar with the SelfControlledCaseSeries package, and have read all other vignettes.

Exposures, covariates of interest, and controls

As described in the 'Single studies using the SelfControlledCaseSeries package' vignette, eras are cohorts or drug eras extracted from the database. Covariates can either be splines, for example representing age or season, or era covariates, derived from eras. When defining covariates using the createEraCovariateSettings() function we can either use verbatim era IDs (e.g. cohort IDs), or we can reference a variable (typically called 'exposureId'). When defining exposures using the createExposure() function, we can define different era IDs to be used for this variable, thereby using the same analysis settings for different exposures and outcomes. For each exposure we can set the trueEffectSize if known. Any exposure with known true effect size is considered a control, and will be used for empirical calibration. Some of our covariates can be marked as covariates of interest by setting exposureOfInterest = TRUE when calling createEraCovariateSettings(). This is especially relevant for the results model, since these covariates will be reported in the sccs_result table.
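
The rule that any exposure with a known true effect size counts as a control can be sketched in base R. This is only an illustration: the exposure IDs are hypothetical, the data frame is not a structure used by the package, and the value 1 for negative controls is an assumption about how such controls are typically specified.

```r
# Hypothetical exposures; the IDs are made up for illustration
exposures <- data.frame(
  exposureId = c(1001, 2001, 2002),
  # NA means the true effect size is unknown; 1 is assumed for negative controls
  trueEffectSize = c(NA, 1, 1)
)

# An exposure with a known true effect size is treated as a control
exposures$isControl <- !is.na(exposures$trueEffectSize)

print(exposures)
```

In this sketch only the first exposure is not a control, so it alone would be excluded from the set used for empirical calibration.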

Exposures-outcome-sets, analysis IDs and models

Using the createExposuresOutcome() function we can define an outcome with one or more exposures, since an SCCS model can have multiple exposures (e.g. we could have separate exposures for the first and second dose of a vaccine). With the createSccsAnalysis() function we can create a set of analysis settings describing which data to extract from the database, how to transform that data including which covariates to construct, and how to fit the SCCS model. Each analysis setting has a unique analysis ID. Each combination of an exposures-outcome-set and an analysis setting will correspond to one SCCS model. A model can have multiple covariates, and each covariate can be based on multiple eras.
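
To make the one-model-per-combination rule concrete, the sketch below enumerates the models implied by a few exposures-outcome-sets and analyses. The IDs and the column names are hypothetical and chosen only for illustration. The package itself is not called.

```r
# Hypothetical IDs, for illustration only
exposuresOutcomeSetIds <- c(1, 2, 3)
analysisIds <- c(1, 2)

# Every pairing of an exposures-outcome-set with an analysis is one SCCS model
models <- expand.grid(
  exposuresOutcomeSetId = exposuresOutcomeSetIds,
  analysisId = analysisIds
)

print(models)
```

With three exposures-outcome-sets and two analyses, this yields six rows, one per SCCS model.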

Fields with minimum values

Some fields contain patient counts, or fractions that can easily be converted to patient counts. To reduce the risk of identifying individual patients, these fields are subject to a minimum value. When the value falls below this minimum, it is replaced with the negative of the minimum. For example, if the minimum subject count is 5 and the actual count is 2, the value stored in the data model will be -5, which could be shown to the user as '<5'. Note that the value 0 is permissible, as it identifies no persons. These fields are identified below as having Min. count = 'Yes'.
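
The minimum-value rule can be sketched in base R as follows. Both helper functions are hypothetical and written for this illustration. They are not functions exported by SelfControlledCaseSeries, and the package's own implementation may differ.

```r
# Hypothetical helper: replace small positive counts with the negative minimum
enforceMinCellValue <- function(values, minCellCount = 5) {
  # Zero is left alone because it identifies no persons
  toCensor <- !is.na(values) & values > 0 & values < minCellCount
  values[toCensor] <- -minCellCount
  values
}

# Hypothetical helper: show censored (negative) values as '<minimum'
formatCount <- function(values) {
  ifelse(
    !is.na(values) & values < 0,
    paste0("<", abs(values)),
    as.character(values)
  )
}

counts <- c(0, 2, 5, 120)
stored <- enforceMinCellValue(counts, minCellCount = 5)
shown <- formatCount(stored)

print(stored)
print(shown)
```

Here the count of 2 is stored as -5 and displayed as '<5', while 0, 5, and 120 pass through unchanged.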

Tables

In this section you will find the list of tables and their fields.

# Load the results data model specification shipped with the package
specifications <- readr::read_csv(system.file(
  "csv",
  "resultsDataModelSpecification.csv",
  package = "SelfControlledCaseSeries"
)) %>%
  SqlRender::snakeCaseToCamelCaseNames()

# One data frame of field specifications per results table
tables <- split(specifications, specifications$tableName)

# Emit a heading followed by the field listing for each table
for (table in tables) {
  header <- sprintf("## Table %s", table$tableName[1])

  # Keep only the columns shown to the reader and render them as a simple table
  fieldTable <- table %>%
    select(Field = "columnName", Type = "dataType", Key = "primaryKey", "Min. count" = "minCellCount", Description = "description") %>%
    kable(format = "simple", linesep = "", booktabs = TRUE, longtable = TRUE)

  writeLines("")
  writeLines(header)
  writeLines(fieldTable)
}