In philliplab/tsic: Time since infection calculator

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(ggplot2)
library(tsic)

This is a guide for simulating diagnostic test histories. It uses the window period distributions built into tsic to simulate histories that have the exact assumptions that tsic uses.

Producing an ordered list of diagnostic tests.
Simulating diagnostic test results if you know the time since infection.
Simulating the time until first positive
Simulating an ihist

Producing an ordered list of diagnostic tests.

It is extremely important that the simulation function be provided with a list of assays that is ordered by the duration of the window period of the assays from shortest to longest. If this list is not ordered correctly, the simulation WILL produce incorrect results.

The sim_dx_results function includes the option to check the list order. It is controlled via the skip_order_check argument. By default it is skipped since it is very slow. It is HIGHLY recommended that you set this argument to FALSE and check your assay order once before you start your simulations.

list_of_assays <- c("iscav2_weib3_delaney_and_tosiano", "taqman_weib3_delaney_and_manufacturer", 
                    "architect_weib3_delaney", "geenius_indet_weib3_delaney", 
                    "geenius_fr_weib3_delaney")

result <- try(
  sim_dx_results(10, c(list_of_assays[5], list_of_assays[1:4]), skip_order_check = FALSE)
)

sim_dx_results(10, list_of_assays, skip_order_check = FALSE)

Getting diagnostic test results if you know the time since infection.

The sim_dx_results will produce a set of diagnostic test results given a time since infection. Note that the function must be provided with a character vector specifying the assays to consider and that this vector must be ordered from the fastest assay to the slowest assay.

tsi <- 25 # time since infection

print(sim_dx_results(tsi, list_of_assays)) # call 1
print(sim_dx_results(tsi, list_of_assays)) # call 2

Note that for this function, successive calls are independent of each other. So it is a poor tool if you want to simulate a diagnostic test trajectory for any single participant. For example the RNA assay can be positive 4 days after infection on the first run of the function and then be negative 5 days after infection on the second run of the function. If the visits are spaced far enough apart, then such concerns are much lower, but if you want to build up a consistent history for a participant, then this is not the right tool for the job. This is demonstrated in the figure below.

all_results <- data.frame(tsi = 0,
                          test = list_of_assays,
                          result = '-',
                          stringsAsFactors = FALSE)

for (tsi in 1:60){
  cresults <- data.frame(tsi = tsi,
                         test = list_of_assays,
                         result = '-',
                         stringsAsFactors = FALSE)

  dx_results <- sim_dx_results(tsi = tsi, list_of_assays = list_of_assays)
  for (ctest in names(dx_results)){
    cresults[cresults$test == ctest, 'result'] <- dx_results[[ctest]]
  }
  all_results <- rbind(all_results, cresults)
}

all_results$test <- ordered(all_results$test, levels = list_of_assays)

print(
ggplot(all_results, aes(x = tsi, y = test)) +
  geom_point(aes(col = result), size = 2)
)

Figure: Repeated calls to sim_dx_results on each of the first 60 days since infection. Note the incongruence between some subsequent days driven by the independence of the subsequent calls.

Simulating the time until first positive

To get around the issues associated with using the sim_dx_results function for constructing an ihist, the sim_sc_times function is recommended. It assumes that the true infection time is at time zero and then simulates the time until seroconversion for a list of assays. The seroconversion times are simulated subject to the constraint that the time until seroconversion of a person on an assay with a longer window period must be longer than the time until seroconversion on an assay with a shorter window period.

sc_times <- sim_sc_times(list_of_assays)
print(sc_times)

Simulating an `ihist`

To simulate an ihist the seroconversion times of sim_sc_times need to be combined with a set of visit times. The function combine_sc_and_visit_times performs this function. It has the following parameters:

sc_times: A list with names giving the assay names and elements the seroconversion time of that assay relative to the true infection time as produced by sim_sc_times.
visit_times: A vector of visit times relative to the time of true infection, with true infection at time = 0. Thus -5 indicates a visit 5 days before true infection.
true_infection_date: The true of the true infection expressed as the number of days since 1970-01-01. This will be used to shift all the dates in the resulting ihist.
ptid: The value to assign to the ptid column of the resulting ihist.

An example where there is a visit every day for comparison to the figure for sim_dx_results. Note how once a test returns a positive result, it continues to return positive results and contrast that with sim_dx_results.

ihist <- combine_sc_and_visit_times(sc_times = sc_times,
                                    visit_times = (0:60),
                                    true_infection_date = as.numeric(as.Date('2018-05-05')),
                                    ptid = 'p001')
head(ihist)
ihist$test <- ordered(ihist$test, levels = list_of_assays)
print(
ggplot(ihist, aes(x = sample_date, y = test)) +
  geom_point(aes(col = result), size = 2)
)

A realistic example with a visit every 28 days with the last negative 10 days before the true infection time (by passing in visit_times = (0:4)*28 - 10).

ihist <- combine_sc_and_visit_times(sc_times = sc_times,
                                    visit_times = (0:4)*28 - 10,
                                    true_infection_date = as.numeric(as.Date('2018-05-05')),
                                    ptid = 'p001')
ihist$test <- ordered(ihist$test, levels = list_of_assays)
print(
ggplot(ihist, aes(x = sample_date, y = test)) +
  geom_point(aes(col = result), size = 2)
)