knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(knitr)
library(pander)
panderOptions('table.alignment.default', function(df) rep("left",ncol(df)))

Introduction

HIV incidence is challenging to estimate, especially when only cross-sectional survey data is available, or the population of interest is hard to reach. Method utilizing recency assays require sample sizes in the tens of thousands and are sensitive to difficult to measure parameters such as the false recency rate and mean duration of recent infection.

The methods implemented in this package allow for the estimation of incidence from a single cross-sectional survey utilizing self-reported testing history. The basic idea is fairly intuitive. Individuals begin uninfected, and become infected at a certain rate. Those new infections start out undiagnosed and become diagnosed cases at a certain rate.

HIV Cascade{ width=75% }

All things being equal:

In fact, if we know the size of the undiagnosed population are the diagnosis rate (i.e. the expected time from infection to diagnosis ($E(TID)$)), then we also know the rate of infection. This is governed by the fundamental epidemic equation $$P(U) = rE(TID),$$ where $P(U)$ is the proportion undiagnosed and $r$ is the rate of infection.

In this package we assume that individuals become diagnosed either due to regular testing, or due to symptomatic progression to AIDS, whichever comes first. The rate at which individuals engage in testing and the distribution of time between tests is estimated from individuals' self-reported last HIV test.

The proportion undiagnosed $P(U)$ is estimated using the fraction of individuals reporting never having been diagnosed before. Due to the potential for miss-reporting, viral load and art biomarkers may be used to correct for miss-reporting using the rate of miss-reporting among those who are treated cases.

Installing the Package

The development version of the package may be installed through github

# If devtools is not installed:
# install.packages("devtools")

devtools::install_github("fellstat/ShinyAsyncTools")
devtools::install_github("fellstat/TestingHistoryIncidence", dependencies="Suggests")

Basic Usage

We begin by loading a synthetic dataset included with the package.

library(TestingHistoryIncidence)
data(tstdat)

# Add age
tstdat$age <- rep(15:64, 200)

head(tstdat)
summary(tstdat)

This dataset has variables defines as:

We can now estimate incidence using the testing_incidence function

attach(tstdat)
inc <- testing_incidence(report_pos=report_pos, biomarker_art=biomarker_art, 
                         low_viral=low_viral, hiv=hiv,ever_test=ever_test,
                         last_test=last_test)
#print(inc)
#kable(inc)
pander(inc)

Here we see an estimated incidence of r round(inc[1,1]*100,2)%. Transmission rate is defined as the number of infections per infected individual and is estimated to be r round(inc[1,2]*100,2)%. The other reported quantities are:

Next, we refine the estimate of $E(TID)$ using age. Very young children often do not engage in regular HIV testing, so the observed last test is censored based on the initiation of regular testing. If individuals debut into testing at 13 years old in this population, we adjust for this censoring using the testing_debut_age parameter.

inc <- testing_incidence(report_pos=report_pos, biomarker_art=biomarker_art, 
                         low_viral=low_viral, hiv=hiv, ever_test=ever_test, 
                         last_test=last_test, age=age, testing_debut_age=13)
#print(inc)
#kable(inc)
pander(inc)

We see that in this dataset, censoring does not have a large impact. It is most important to include in cases where the population is concentrated near the debut age.

Subsetting

The testing_incidence function takes a logical subset parameter, which may be used to define sub-populations of interest (e.g. race or geographic location). If uniform_missreport is set to TRUE then the rate of miss reporting of undiagnosed status is calculated over the whole population, otherwise the calculation is restricted to the subset.

subset <- c(rep(TRUE,1000),rep(FALSE,9000))
inc1 <- testing_incidence(report_pos, hiv,ever_test, last_test,  
                          biomarker_art=biomarker_art, low_viral=low_viral,age=age, 
                          testing_debut_age=13, subset=subset)
inc2 <- testing_incidence(report_pos, hiv,
                                        ever_test, last_test, 
                          biomarker_art=biomarker_art, low_viral=low_viral, age=age, 
                                        testing_debut_age=13, subset=subset, uniform_missreport=TRUE)
inc <- rbind(inc1,inc2)
rownames(inc) <- c("restricted","uniform")
#print(inc[,c(1,2,3,10,11)])
#kable(inc)
pander(inc[,c(1,2,3,10,11)],caption=NULL, style = 'rmarkdown')

Stratifying by Age

Using the subset parameter to restrict to an age group will lead to inaccurate estimates due to the fact that individuals infected in one age group become undiagnosed cases in another group as they age. Stratifying by age is achieved using the age_breaks parameter, which is used to define the divisions between the age groups of interest.

inc <- testing_incidence(report_pos, hiv,
                                        ever_test, last_test, biomarker_art=biomarker_art, low_viral=low_viral, age=age, 
                                        testing_debut_age=13, age_breaks=c(25,35,45,55))

#print(inc[,c(1,2,3,6,9)])
#kable(inc)
pander(inc[,c(1,2,3,6,9)],caption=NULL, style = 'rmarkdown')

Here we see age broken down into 10 year bins, with incidence and transmission rate estimates in each group. $E(TID)$ is calculated separately within each age group, allowing for differential testing behavior by age. The pooled estimate is the overall incidence rate in the population aggregating the by age incidence values.

Uncertainty Estimation

Confidence intervals and standard errors are constructed using bootstrap resampling with the bootstrap_incidence function. The nrep parameter controls the number of bootstrap replicates to use.

inc <- testing_incidence(report_pos, hiv,
                                        ever_test, last_test, biomarker_art=biomarker_art, low_viral=low_viral, age=age, testing_debut_age=13)

boots <- bootstrap_incidence(inc, nrep=10)
#print(boots)
#kable(inc)
pander(summary(boots))

The reported quantities are:

Working with Complex Survey Designs

Survey weights from any sampling design may be included in the analysis through the weights parameter.

weights <- runif(10000)
inc <- testing_incidence(report_pos, hiv,
                                        ever_test, last_test, biomarker_art=biomarker_art, low_viral=low_viral, age=age, testing_debut_age=13, weights=weights)

The bootrap_incidence function supports survey bootstrap and jackknife uncertainty estimates via replication weights (rep_weights). The type of replication weights is specified by the type parameter. See help("svrrepdesign") for more information on the types of replication weights supported. Let's begin by generating 25 replication weights from a simple stratified survey design.

library(survey)
stype <- sample(1:10, 10000, replace = TRUE)
# stratified sample
dstrat<-svydesign(id=~1,strata=~stype, weights=~weights, data=tstdat)

# generate replicate design
des1 <- as.svrepdesign(dstrat, type="bootstrap", compress=FALSE, replicates=25)
rep_weights <- des1$repweights * weights # create combined replication weights
head(rep_weights)

Now we can feed these replication weights into bootrap_incidence to get our uncertainty estimates.

boots <- bootstrap_incidence(inc, rep_weights = rep_weights, type="bootstrap")
#print(boots)
#kable(inc)
pander(summary(boots))
detach(tstdat)


fellstat/TestingHistoryIncidence documentation built on May 7, 2019, 7:41 a.m.