```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
In this document we show how, following Caduff et al. 2021 (https://doi.org/10.1101/2021.08.22.21262024), this package can be used to compute logistic growth rates on binomial time series data, including in the presence of a background prevalence.
```r
library(WWdPCR)
```
Simulate data from a three-parameter logistic (3PL) model, with the lower asymptote (background prevalence) at 0.1 and binomial noise on the observations. There are 31 timepoints, each with 20 samples. The 3PL has the form:
$$ r_{\mathrm{mut}}(t) = c + (1-c)\,\frac{e^{a\left(t-t_{0}\right)}}{1+e^{a\left(t-t_{0}\right)}} $$
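The curve itself is provided by the package as `pl3model`; as a sanity check of the formula above, here is a minimal reimplementation (a sketch only, assuming the argument order `(t, a, t0, c)` used in the calls below, not the package's actual code):

```r
# 3PL curve: background prevalence c, growth rate a, midpoint t0
# (a sketch of the formula above, not the package implementation)
pl3_sketch <- function(t, a, t0, c) {
  c + (1 - c) * plogis(a * (t - t0))  # plogis(x) = e^x / (1 + e^x)
}

# At t = t0 the logistic term equals 1/2, so the curve sits halfway
# between the background c and the upper asymptote 1:
pl3_sketch(12, a = 0.5, t0 = 12, c = 0.1)  # 0.1 + 0.9 * 0.5 = 0.55
```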
```r
set.seed(43)
N <- 30
xx1 <- 0:N                          # 31 timepoints
yy1 <- pl3model(xx1, 0.5, 12, 0.1)  # true curve: a = 0.5, t0 = 12, c = 0.1
yy2 <- rbinom(N + 1, 20, yy1) / 20  # binomial sampling, 20 samples per timepoint
```
Fit a 2PL model (i.e., "usual" logistic growth) with binomial noise and inspect the results. We see that the growth rate is vastly underestimated.
```r
fit_2PL <- fit_pl(yy2,  # response vector of fractions
                  xx1,  # timepoints vector
                  20,   # total counts; constant here, but can be a vector
                        # if the counts differ between timepoints
                  fitfun = pl2model,            # fit using the 2PL function
                  likelihood = loglik_binom_n,  # we are fitting binomial data
                  quasi = TRUE,  # quasi-likelihood to account for possible overdispersion
                  starting_values = c(a = 0.4, b = 10),  # starting values for the optimization
                  lower = c(-Inf, -Inf), upper = c(Inf, Inf))  # parameter bounds, just need to make sense
coef(fit_2PL)
confint(fit_2PL)
```
Fit a 3PL model with binomial noise and inspect the results. This time the growth rate estimate is accurate.
```r
fit_3PL <- fit_pl(yy2,  # response vector of fractions
                  xx1,  # timepoints vector
                  20,   # total counts; constant here, but can be a vector
                        # if the counts differ between timepoints
                  fitfun = pl3model,            # fit using the 3PL function
                  likelihood = loglik_binom_n,  # we are fitting binomial data
                  quasi = TRUE,  # quasi-likelihood to account for possible overdispersion
                  starting_values = c(a = 0.4, b = 10, c = 0.1),  # we need a good guess for the starting values
                  lower = c(-Inf, -Inf, 0), upper = c(Inf, Inf, 1))  # parameter bounds, just need to make sense
coef(fit_3PL)
confint(fit_3PL)
```
We can use the `fitted` method for easy plotting. As expected, the fit using the "usual" logistic model looks very wrong, while the 3PL fit looks reasonable.
```r
plot(xx1, yy2)
lines(xx1, yy1)
lines(xx1, fitted(fit_3PL), col = "red")
lines(xx1, fitted(fit_2PL), col = "blue")
legend("bottomright", legend = c("true", "3PL fit", "2PL fit"),
       col = c("black", "red", "blue"), lty = c(1, 1, 1), cex = 1)
```
We can predict and compute standard errors either on the original linear scale or on the logit scale. Computing standard errors on the logit scale yields confidence intervals that, when back-transformed to the linear scale, are guaranteed to stay within the [0,1] range (and generally have better coverage). Here the difference is not flagrant.
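The back-transformation relies on the inverse logit. Assuming `logit_inv` exported by the package is the standard inverse-logit function, the mechanics look like this:

```r
# standard logit and its inverse
logit <- function(p) log(p / (1 - p))
logit_inv <- function(x) 1 / (1 + exp(-x))

# a symmetric 95% interval built on the logit scale always maps back into (0, 1),
# e.g. for a fitted proportion of 0.9 with a hypothetical logit-scale SE of 1:
logit_inv(logit(0.9) + c(-1.96, 1.96) * 1)
```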
```r
predicted <- predict(fit_3PL, scale = "linear", se = TRUE)  # list with "pred" and "se" entries
plot(xx1, yy2)
lines(xx1, predicted$pred, col = "black")
# confidence intervals computed from linear-scale standard errors
lines(xx1, predicted$pred - 1.96 * predicted$se, lty = 2, col = "red")
lines(xx1, predicted$pred + 1.96 * predicted$se, lty = 2, col = "red")
# confidence intervals computed from logit-scale standard errors
predicted_logit <- predict(fit_3PL, scale = "logit", se = TRUE)
lines(xx1, logit_inv(predicted_logit$pred - 1.96 * predicted_logit$se), lty = 2, col = "blue")
lines(xx1, logit_inv(predicted_logit$pred + 1.96 * predicted_logit$se), lty = 2, col = "blue")
legend("bottomright", legend = c("3PL fit", "linear confint", "logit confint"),
       col = c("black", "red", "blue"), lty = c(1, 2, 2), cex = 1)
```
We can perform a quasi log-likelihood ratio test of whether the lower asymptote (background prevalence) is significantly different from zero:
```r
# quasi log-likelihood ratio, with the inflation factor from the 2PL (more conservative)
llr <- -2 * ((-1 * fit_3PL$likelihood / fit_2PL$infl_factor) -
             (-1 * fit_2PL$likelihood / fit_2PL$infl_factor))
# chi-square test of the log-likelihood ratio (1 df)
print(c("2logLr:" = llr,
        "p-val:" = formatC(1 - pchisq(llr, 1), format = "e", digits = 2)))
```
We have daily prevalences, from Swiss clinical sequencing, of the HV69-70 deletion and of the ORF1a deletion associated with B.1.1.7, as well as the fraction of B.1.1.7 samples. They were downloaded from Cov-Spectrum:
```r
str(clinical_data)
# aggregate by date over all "divisions" (cantons)
clinical_data_tot <- aggregate(. ~ date, data = clinical_data, sum)
clinical_data_tot <- clinical_data_tot[
  (as.Date(clinical_data_tot$date) >= as.Date("2020-12-07")) &
  (as.Date(clinical_data_tot$date) <= as.Date("2021-03-26")), ]
clinical_data_tot$days_numeric <-
  as.numeric(as.Date(clinical_data_tot$date) - as.Date("2020-12-07"))
```
Fit a 2PL model to the HV69-70 deletion prevalence.
```r
clinical_del6970_2PL <- fit_pl(clinical_data_tot$HV6970del / clinical_data_tot$Tot,
                               clinical_data_tot$days_numeric,
                               clinical_data_tot$Tot,
                               fitfun = pl2model, quasi = TRUE,
                               starting_values = c(a = 0.1, b = 50),
                               lower = c(0, -Inf), upper = c(1, Inf))
confint(clinical_del6970_2PL)
```
Fit a 3PL model:
```r
clinical_del6970_3PL <- fit_pl(clinical_data_tot$HV6970del / clinical_data_tot$Tot,
                               clinical_data_tot$days_numeric,
                               clinical_data_tot$Tot,
                               likelihood = loglik_binom_n,
                               fitfun = pl3model, quasi = TRUE,
                               starting_values = c(a = 0.05, b = 50, c = 0.01),
                               lower = c(0, -Inf, 0), upper = c(1, Inf, 1))
confint(clinical_del6970_3PL)
```
Fit a 2PL model to the B.1.1.7 prevalence.
```r
clinical_B117_2PL <- fit_pl(clinical_data_tot$B117 / clinical_data_tot$Tot,
                            clinical_data_tot$days_numeric,
                            clinical_data_tot$Tot,
                            fitfun = pl2model, quasi = TRUE,
                            starting_values = c(a = 0.1, b = 50),
                            lower = c(0, -Inf), upper = c(1, Inf))
confint(clinical_B117_2PL)
```
Fit a 3PL model:
```r
clinical_B117_3PL <- fit_pl(clinical_data_tot$B117 / clinical_data_tot$Tot,
                            clinical_data_tot$days_numeric,
                            clinical_data_tot$Tot,
                            likelihood = loglik_binom_n,
                            fitfun = pl3model, quasi = TRUE,
                            starting_values = c(a = 0.05, b = 50, c = 0.01),
                            lower = c(0, -Inf, 0), upper = c(1, Inf, 1))
confint(clinical_B117_3PL)
```
The plots show that the 2PL and 3PL fits are quite different on the HV69-70 data, but essentially indistinguishable when fitting the B.1.1.7 fractions.
```r
par(mfrow = c(1, 2))
# plot raw data and fits for the HV69-70 deletion
plot(clinical_data_tot$days_numeric, clinical_data_tot$HV6970del / clinical_data_tot$Tot,
     xlab = "days since 2020-12-07", ylab = "frac. HV69-70del in clinical samples",
     main = "fit on Swiss HV69-70 data")
lines(clinical_data_tot$days_numeric, clinical_del6970_3PL$fitted, col = "red")
lines(clinical_data_tot$days_numeric, clinical_del6970_2PL$fitted, col = "blue")
# confidence bands through the logit reparametrization
predicted_3PL_logit <- predict(clinical_del6970_3PL, scale = "logit", se = TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_3PL_logit$pred - 1.96 * predicted_3PL_logit$se),
          rev(logit_inv(predicted_3PL_logit$pred + 1.96 * predicted_3PL_logit$se))),
        col = rgb(red = 1, green = 0, blue = 0, alpha = 0.1), border = NA)
predicted_2PL_logit <- predict(clinical_del6970_2PL, scale = "logit", se = TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_2PL_logit$pred - 1.96 * predicted_2PL_logit$se),
          rev(logit_inv(predicted_2PL_logit$pred + 1.96 * predicted_2PL_logit$se))),
        col = rgb(red = 0, green = 0, blue = 1, alpha = 0.1), border = NA)
legend("bottomright", legend = c("3PL fit", "2PL fit"),
       col = c("red", "blue"), lty = c(1, 1), cex = 1)

# plot raw data and fits for the B.1.1.7 data
plot(clinical_data_tot$days_numeric, clinical_data_tot$B117 / clinical_data_tot$Tot,
     xlab = "days since 2020-12-07", ylab = "frac. B.1.1.7 clinical samples",
     main = "fit on Swiss B.1.1.7 data")
lines(clinical_data_tot$days_numeric, clinical_B117_3PL$fitted, col = "red")
lines(clinical_data_tot$days_numeric, clinical_B117_2PL$fitted, col = "blue")
# confidence bands through the logit reparametrization
predicted_3PL_logit <- predict(clinical_B117_3PL, scale = "logit", se = TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_3PL_logit$pred - 1.96 * predicted_3PL_logit$se),
          rev(logit_inv(predicted_3PL_logit$pred + 1.96 * predicted_3PL_logit$se))),
        col = rgb(red = 1, green = 0, blue = 0, alpha = 0.1), border = NA)
predicted_2PL_logit <- predict(clinical_B117_2PL, scale = "logit", se = TRUE)
polygon(c(clinical_data_tot$days_numeric, rev(clinical_data_tot$days_numeric)),
        c(logit_inv(predicted_2PL_logit$pred - 1.96 * predicted_2PL_logit$se),
          rev(logit_inv(predicted_2PL_logit$pred + 1.96 * predicted_2PL_logit$se))),
        col = rgb(red = 0, green = 0, blue = 1, alpha = 0.1), border = NA)
legend("bottomright", legend = c("3PL fit", "2PL fit"),
       col = c("red", "blue"), lty = c(1, 1), cex = 1)
```
We can test whether the lower asymptote parameter is significantly different from zero, i.e. whether it should be included in the model.
Compute a quasi log-likelihood ratio test for the fits on HV69-70 deletions:
```r
# quasi log-likelihood ratio, with the inflation factor from the 2PL (more conservative)
llr <- -2 * ((-1 * clinical_del6970_3PL$likelihood / clinical_del6970_2PL$infl_factor) -
             (-1 * clinical_del6970_2PL$likelihood / clinical_del6970_2PL$infl_factor))
# chi-square test of the log-likelihood ratio (1 df)
print(c("2logLr:" = llr,
        "p-val:" = formatC(1 - pchisq(llr, 1), format = "e", digits = 2)))
```
Compute a quasi log-likelihood ratio test for the fits on B.1.1.7 fractions:
```r
# quasi log-likelihood ratio, with the inflation factor from the 2PL (more conservative)
llr <- -2 * ((-1 * clinical_B117_3PL$likelihood / clinical_B117_2PL$infl_factor) -
             (-1 * clinical_B117_2PL$likelihood / clinical_B117_2PL$infl_factor))
# chi-square test of the log-likelihood ratio (1 df)
print(c("2logLr:" = llr,
        "p-val:" = formatC(1 - pchisq(llr, 1), format = "e", digits = 2)))
```
As expected, including a lower asymptote (background prevalence parameter) is of no use in the model fitted on the B.1.1.7 fractions. In contrast, the test shows that this third parameter must be included when estimating the growth rate from the HV69-70 deletion data.
We see that, to estimate the growth rate from the HV69-70 deletion data, we really need to incorporate the background prevalence in the model to obtain an estimate in line with the one obtained from the B.1.1.7 fractions:
```r
par(mfrow = c(1, 1))
rate_estimates <- Reduce(function(...) rbind(...),
                         lapply(list(clinical_B117_2PL, clinical_del6970_2PL, clinical_del6970_3PL),
                                function(x) confint(x)["a", ]))
rate_estimates <- as.data.frame(rate_estimates)
plot(c(1, 2, 3), rate_estimates$estimate,
     ylim = c(min(rate_estimates$lower), max(rate_estimates$upper)),
     ylab = "growth rate estimate", xlab = "model", xaxt = "n",
     main = "comparison of growth rate estimates\nbetween models")
axis(1, at = 1:3, labels = c("B.1.1.7 2PL", "HV69-70del 2PL", "HV69-70del 3PL"))
arrows(1:3, rate_estimates$lower, 1:3, rate_estimates$upper,
       length = 0.05, angle = 90, code = 3)
```
The assumption of a constant overdispersion factor does not appear to be violated, so we do not need a beta-binomial distribution, which would imply a linear dependency between overdispersion and sample size.
```r
plot(clinical_del6970_3PL$weights, log10(clinical_del6970_3PL$overdispersion),
     xlab = "sample size", ylab = "log10 variance ratio",
     main = "overdispersion vs sample size")
# bootstrap loess smooths, resampling weighted by sample size
for (i in 1:100) {
  indxs <- sample(1:length(clinical_del6970_3PL$weights), replace = TRUE,
                  prob = clinical_del6970_3PL$weights)
  lines(loess.smooth(clinical_del6970_3PL$weights[indxs],
                     log10(clinical_del6970_3PL$overdispersion)[indxs]),
        col = rgb(red = 1, green = 0, blue = 0, alpha = 0.1))
}
lines(loess.smooth(clinical_del6970_3PL$weights,
                   log10(clinical_del6970_3PL$overdispersion)), col = "red")
points(clinical_del6970_3PL$weights, log10(clinical_del6970_3PL$overdispersion))
```
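To see where that linear dependency comes from: for a beta-binomial with intra-class correlation $\rho$, the variance of the count is $n\mu(1-\mu)(1+(n-1)\rho)$, so the variance ratio relative to the plain binomial grows linearly with $n$. A quick illustration (the value of `rho` below is an arbitrary assumption):

```r
# overdispersion factor of a beta-binomial relative to a binomial:
# Var_betabin / Var_binom = 1 + (n - 1) * rho, linear in the sample size n
rho <- 0.05  # illustrative intra-class correlation (assumption)
n <- c(10, 50, 100)
1 + (n - 1) * rho  # 1.45 3.45 5.95
```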