knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

In this document, we show how, following Caduff et al. 2021 (https://doi.org/10.1101/2021.08.22.21262024), this package can be used to estimate a logistic growth rate from binomial time-series data, including in the presence of a background prevalence.

library(WWdPCR)

Example 1: simulated data

Generate simulated data

Simulate data from a three-parameter logistic (3PL) model, with the lower asymptote (background prevalence) at 0.1 and binomially distributed observations. There are 31 timepoints with 20 samples each. The 3PL has the form:

$$ r_{\mathrm{mut}}(t) = c + (1-c)\,\frac{e^{a\left(t-t_{0}\right)}}{1+e^{a\left(t-t_{0}\right)}} $$
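As a reference point, here is a minimal plain-R sketch of this curve (assuming, as the call below suggests, that the package's pl3model() takes its arguments in the order t, a, t0, c; the sketch is illustrative, not the package implementation):

# hypothetical re-implementation of the 3PL curve above
# a = growth rate, t0 = inflection point, c = lower asymptote (background prevalence)
pl3_sketch <- function(t, a, t0, c) {
  c + (1 - c) * plogis(a * (t - t0))  # plogis(x) = exp(x) / (1 + exp(x))
}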

set.seed(43)

N <- 30
xx1 <- 0:N                           # 31 timepoints
yy1 <- pl3model(xx1, 0.5, 12, 0.1)   # true curve: a = 0.5, t0 = 12, c = 0.1
yy2 <- rbinom(N+1, 20, yy1) / 20     # observed fractions, 20 samples per timepoint

Fit model and inspect results

Fit a 2PL model (i.e. the "usual" logistic growth model) with binomial noise and inspect the results. We see that the growth rate is vastly underestimated.

fit_2PL <- fit_pl(yy2,  # response vector of fractions
                  xx1,  # vector of timepoints
                  20,   # total counts; constant here, but may be a vector if counts differ between timepoints
                  fitfun = pl2model,            # fit using the 2PL function
                  likelihood = loglik_binom_n,  # we are fitting binomial data
                  quasi = TRUE,                 # use a quasi-likelihood model to account for possible overdispersion
                  starting_values = c(a = 0.4, b = 10),        # starting values for the parameter optimization
                  lower = c(-Inf, -Inf), upper = c(Inf, Inf))  # parameter bounds; they just need to be sensible
coef(fit_2PL)
confint(fit_2PL)

Fit a 3PL model with binomial noise and inspect the results. This time the growth rate estimate is accurate.

fit_3PL <- fit_pl(yy2,  # response vector of fractions
                  xx1,  # vector of timepoints
                  20,   # total counts; constant here, but may be a vector if counts differ between timepoints
                  fitfun = pl3model,            # fit using the 3PL function
                  likelihood = loglik_binom_n,  # we are fitting binomial data
                  quasi = TRUE,                 # use a quasi-likelihood model to account for possible overdispersion
                  starting_values = c(a = 0.4, b = 10, c = 0.1),     # a good guess for the starting values helps
                  lower = c(-Inf, -Inf, 0), upper = c(Inf, Inf, 1))  # parameter bounds; they just need to be sensible
coef(fit_3PL)
confint(fit_3PL)

Plot the results:

We can use the fitted() method for easy plotting. We see that, as expected, the fit using the "usual" logistic model is clearly off, while the 3PL fit looks good.

plot(xx1, yy2)
lines(xx1, yy1)
lines(xx1, fitted(fit_3PL), col="red")
lines(xx1, fitted(fit_2PL), col="blue")
legend("bottomright", legend=c("true", "3PL fit", "2PL fit"),
       col=c("black", "red", "blue"),
       lty=c(1,1,1), cex=1)

Plot confidence bands for fit

We can compute predictions and standard errors either on the original (linear) scale or on the logit scale. Computing standard errors on the logit scale allows confidence intervals that, when back-transformed to the linear scale, are guaranteed to stay within the [0, 1] range (and generally have better coverage). Here the difference is not striking.
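As a toy illustration of this point (with made-up numbers, not taken from the fit): near the boundary, a symmetric interval on the linear scale can dip below zero, while the same ±1.96·SE construction on the logit scale, back-transformed with the inverse logit (base R's plogis; the package's logit_inv presumably does the same), always lands inside (0, 1).

p_hat <- 0.02    # hypothetical fitted fraction close to the lower boundary
se_lin <- 0.015  # hypothetical standard error on the linear scale
c(p_hat - 1.96*se_lin, p_hat + 1.96*se_lin)  # the lower bound is negative

eta_hat <- qlogis(p_hat)  # the same estimate on the logit scale
se_logit <- 0.8           # hypothetical standard error on the logit scale
plogis(c(eta_hat - 1.96*se_logit, eta_hat + 1.96*se_logit))  # both bounds stay in (0, 1)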

predicted <- predict(fit_3PL, scale="linear", se=TRUE) #returns a list with "pred" and "se" entries
plot(xx1, yy2)
lines(xx1, predicted$pred, col="black")

# confidence intervals from linear-scale standard errors
lines(xx1, predicted$pred - 1.96*predicted$se, lty=2, col="red")
lines(xx1, predicted$pred + 1.96*predicted$se, lty=2, col="red")

# confidence intervals from logit-scale standard errors, back-transformed with logit_inv
predicted_logit <- predict(fit_3PL, scale="logit", se=TRUE)
lines(xx1, logit_inv(predicted_logit$pred - 1.96*predicted_logit$se), lty=2, col="blue")
lines(xx1, logit_inv(predicted_logit$pred + 1.96*predicted_logit$se), lty=2, col="blue")
legend("bottomright", legend=c("3PL fit", "linear confint", "logit confint"),
       col=c("black", "red", "blue"),
       lty=c(1,2,2), cex=1)

Test 3PL vs 2PL

We can compute a quasi log-likelihood ratio test to check whether the lower asymptote (background prevalence) is significantly different from zero.
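In terms of the maximised (quasi-)log-likelihoods $\hat{\ell}_{2PL}$ and $\hat{\ell}_{3PL}$ and the dispersion (inflation) factor $\hat{\phi}_{2PL}$ of the 2PL fit (the more conservative choice), the statistic computed below is, under the null hypothesis of no background prevalence, approximately chi-squared distributed with one degree of freedom:

$$ D = \frac{2\left(\hat{\ell}_{3PL}-\hat{\ell}_{2PL}\right)}{\hat{\phi}_{2PL}} \sim \chi^{2}_{1} \quad \text{under } H_{0}: c=0 $$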

# compute quasi log-likelihood ratio, with inflation factor from 2PL (more conservative)
llr <- -2*((-1*fit_3PL$likelihood/fit_2PL$infl_factor) - (-1*fit_2PL$likelihood/fit_2PL$infl_factor))
# chisquare test of the log-likelihood ratio
print(c("2logLr:"=llr, "p-val:"=formatC(1-pchisq(llr, 1), format = "e", digits = 2)) )

Example 2: fitting to Swiss HV69-70del and B.1.1.7 clinical data

Load and prepare data

We have, from Swiss clinical sequencing, the daily prevalence of the HV69-70 deletion, of the ORF1a deletion associated with B.1.1.7, and of B.1.1.7 itself. The data were downloaded from Cov-Spectrum:

str(clinical_data)
# let's aggregate them by date for all "divisions" (Cantons)
clinical_data_tot <- aggregate(. ~ date, data=clinical_data, sum)
clinical_data_tot <- clinical_data_tot[(as.Date(clinical_data_tot$date) >= as.Date("2020-12-07")) & 
                                           (as.Date(clinical_data_tot$date) <= as.Date("2021-03-26")), ]
clinical_data_tot$days_numeric <- as.numeric(as.Date(clinical_data_tot$date)-as.Date("2020-12-07"))

Fit on the HV69-70 deletion prevalence and inspect

Fit a 2PL model to the HV69-70 deletion prevalence.

clinical_del6970_2PL <- fit_pl(clinical_data_tot$HV6970del / clinical_data_tot$Tot, 
                               clinical_data_tot$days_numeric, 
                               clinical_data_tot$Tot,
                               fitfun = pl2model,
                               quasi=TRUE, 
                               starting_values = c("a"=0.1, "b"=50), 
                               lower=c(0, -Inf), upper=c(1, Inf))  # two bounds for the two 2PL parameters (a, b)
confint(clinical_del6970_2PL)

Fit a 3PL model:

clinical_del6970_3PL <- fit_pl(clinical_data_tot$HV6970del / clinical_data_tot$Tot, 
                               clinical_data_tot$days_numeric, 
                               clinical_data_tot$Tot,
                               likelihood = loglik_binom_n,
                               fitfun = pl3model,
                               quasi=TRUE, 
                               starting_values = c("a"=0.05, "b"=50, "c"=0.01),
                               lower=c(0, -Inf, 0), upper=c(1, Inf, 1))


confint(clinical_del6970_3PL)

Fit on the B.1.1.7 prevalence and inspect

Fit a 2PL model to the B.1.1.7 prevalence.

clinical_B117_2PL <- fit_pl(clinical_data_tot$B117 / clinical_data_tot$Tot, 
                               clinical_data_tot$days_numeric, 
                               clinical_data_tot$Tot,
                               fitfun = pl2model,
                               quasi=TRUE, 
                               starting_values = c("a"=0.1, "b"=50), 
                               lower=c(0, -Inf), upper=c(1, Inf))  # two bounds for the two 2PL parameters (a, b)
confint(clinical_B117_2PL)

Fit a 3PL model:

clinical_B117_3PL <- fit_pl(clinical_data_tot$B117 / clinical_data_tot$Tot, 
                               clinical_data_tot$days_numeric, 
                               clinical_data_tot$Tot,
                               likelihood = loglik_binom_n,
                               fitfun = pl3model,
                               quasi=TRUE, 
                               starting_values = c("a"=0.05, "b"=50, "c"=0.01),
                               lower=c(0, -Inf, 0), upper=c(1, Inf, 1))


confint(clinical_B117_3PL)

Plot

We see on the plots that the 3PL and 2PL fits are quite different for the HV69-70 data, but that they are practically identical when fitting to the B.1.1.7 fractions.

par(mfrow=c(1,2))

# plot raw data and fits for HV69-70 deletion
plot(clinical_data_tot$days_numeric,
     clinical_data_tot$HV6970del /  clinical_data_tot$Tot,
     xlab="days since 2020-12-07",
     ylab="frac. HV69-70del in clinical samples", 
     main="fit on Swiss HV69-70 data")
lines(clinical_data_tot$days_numeric,
      clinical_del6970_3PL$fitted,
      col="red")
lines(clinical_data_tot$days_numeric,
      clinical_del6970_2PL$fitted,
      col="blue")

# make confints through logit reparametrization and plot them
predicted_3PL_logit <- predict(clinical_del6970_3PL, scale="logit", se=TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_3PL_logit$pred - 1.96*predicted_3PL_logit$se),
          rev(logit_inv(predicted_3PL_logit$pred + 1.96*predicted_3PL_logit$se))),
       col=rgb(red=1, green=0, blue=0, alpha=0.1), border=NA)
predicted_2PL_logit <- predict(clinical_del6970_2PL, scale="logit", se=TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_2PL_logit$pred - 1.96*predicted_2PL_logit$se),
          rev(logit_inv(predicted_2PL_logit$pred + 1.96*predicted_2PL_logit$se))),
       col=rgb(red=0, green=0, blue=1, alpha=0.1), border=NA)

# make legend
legend("bottomright", legend=c("3PL fit", "2PL fit"),
       col=c("red", "blue"), lty=c(1,1), cex=1)

# plot raw data and fits for B.1.1.7 data
plot(clinical_data_tot$days_numeric,
     clinical_data_tot$B117 /  clinical_data_tot$Tot,
     xlab="days since 2020-12-07",
     ylab="frac. B.1.1.7 clinical samples", 
     main="fit on Swiss B.1.1.7 data")
lines(clinical_data_tot$days_numeric,
      clinical_B117_3PL$fitted,
      col="red")
lines(clinical_data_tot$days_numeric,
      clinical_B117_2PL$fitted,
      col="blue")

# make confints through logit reparametrization and plot them
predicted_3PL_logit <- predict(clinical_B117_3PL, scale="logit", se=TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_3PL_logit$pred - 1.96*predicted_3PL_logit$se),
          rev(logit_inv(predicted_3PL_logit$pred + 1.96*predicted_3PL_logit$se))),
       col=rgb(red=1, green=0, blue=0, alpha=0.1), border=NA)
predicted_2PL_logit <- predict(clinical_B117_2PL, scale="logit", se=TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_2PL_logit$pred - 1.96*predicted_2PL_logit$se),
          rev(logit_inv(predicted_2PL_logit$pred + 1.96*predicted_2PL_logit$se))),
       col=rgb(red=0, green=0, blue=1, alpha=0.1), border=NA)

# make legend
legend("bottomright", legend=c("3PL fit", "2PL fit"),
       col=c("red", "blue"), lty=c(1,1), cex=1)

Test 3PL vs 2PL

We can test whether the lower asymptote parameter is statistically different from zero, i.e. whether it should be included in the model.

Compute a quasi log-likelihood ratio test for the fits on HV69-70 deletions:

# compute quasi log-likelihood ratio, with inflation factor from 2PL (more conservative)
llr <- -2*((-1*clinical_del6970_3PL$likelihood/clinical_del6970_2PL$infl_factor) - (-1*clinical_del6970_2PL$likelihood/clinical_del6970_2PL$infl_factor))
# chisquare test of the log-likelihood ratio
print(c("2logLr:"=llr, "p-val:"=formatC(1-pchisq(llr, 1), format = "e", digits = 2)) )

Compute a quasi log-likelihood ratio test for the fits on B.1.1.7 fractions:

# compute quasi log-likelihood ratio, with inflation factor from 2PL (more conservative)
llr <- -2*((-1*clinical_B117_3PL$likelihood/clinical_B117_2PL$infl_factor) - (-1*clinical_B117_2PL$likelihood/clinical_B117_2PL$infl_factor))
# chisquare test of the log-likelihood ratio
print(c("2logLr:"=llr, "p-val:"=formatC(1-pchisq(llr, 1), format = "e", digits = 2)) )

As expected, including a lower asymptote (background prevalence parameter) is of no use in the model fitted to the B.1.1.7 fractions. In contrast, the test shows that this third parameter must be included when estimating the growth rate from the HV69-70 deletion data.

Indeed, to estimate the growth rate from the HV69-70 deletion data, we need to incorporate the background prevalence in the model in order to obtain an estimate in line with the one obtained from the B.1.1.7 fractions:

par(mfrow=c(1,1))
rate_estimates <- do.call(rbind, 
                          lapply(list(clinical_B117_2PL, clinical_del6970_2PL, clinical_del6970_3PL),
                                 function(x) confint(x)["a", ]))
rate_estimates <- as.data.frame(rate_estimates)

plot(c(1,2,3), 
     rate_estimates$estimate,
     ylim = c(min(rate_estimates$lower), max(rate_estimates$upper)),
     ylab = "growth rate estimate",
     xlab = "model",
     xaxt = "n",
     main = "comparison of growth rate estimates\nbetween models")
axis(1, at=1:3, labels=c("B.1.1.7 2PL", "HV69-70del 2PL", "HV69-70del 3PL"))
arrows(1:3, 
       rate_estimates$lower, 
       1:3, 
       rate_estimates$upper,
       length=0.05, angle=90, code=3)

Inspect overdispersion:

It looks like the assumption of a constant overdispersion factor is not violated, so we do not need a beta-binomial distribution, which would imply that overdispersion increases roughly linearly with sample size.
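For context: a beta-binomial with intra-class correlation $\rho$ inflates the binomial variance of a count on $n$ trials by a factor that grows with $n$,

$$ \operatorname{Var}(X) = n p (1-p)\left[1+(n-1)\rho\right] $$

so a variance ratio that stays roughly flat across sample sizes, as in the plot below, is what we expect under a constant quasi-binomial dispersion rather than under a beta-binomial.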

plot(clinical_del6970_3PL$weights, log10(clinical_del6970_3PL$overdispersion),
    xlab="sample size", 
    ylab="log10 variance ratio", 
    main="overdispersion vs sample size")
# weighted bootstrap of the loess smooth, to visualize uncertainty in the trend
for (i in 1:100){
    indxs <- sample(1:length(clinical_del6970_3PL$weights), replace=TRUE, prob=clinical_del6970_3PL$weights)
    lines(loess.smooth(clinical_del6970_3PL$weights[indxs], log10(clinical_del6970_3PL$overdispersion)[indxs]),
          col=rgb(red=1, green=0, blue=0, alpha=0.1))
}
lines(loess.smooth(clinical_del6970_3PL$weights, log10(clinical_del6970_3PL$overdispersion)), col="red")
points(clinical_del6970_3PL$weights, log10(clinical_del6970_3PL$overdispersion))

