In joebrew/maragra: An analysis of malaria at the Maragra sugar company, Mozambique

# csl: journal-of-health-economics.csl

# Packages
library(tidyverse)
library(knitr)
library(Hmisc)
library(brew)
library(maragra)
library(knitr)
library(RColorBrewer)
library(extrafont)
library(kableExtra)
library(raster)
loadfonts()
## Global options
options(max.print="75")
opts_chunk$set(echo=FALSE,
                 cache=FALSE,
               prompt=FALSE,
               fig.height = 4,
               fig.width = 5,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE,
               dev = "cairo_pdf",
               fig.pos="!h",
               fig.align = 'center')
opts_knit$set(width=75)
options(xtable.comment = FALSE)

source('paper_prepare.R')

\begin{center} \begin{large}

Brew

\end{large} \end{center}

\vspace{5mm}

\begin{center} \textbf{Overview}
\end{center}

\vspace{5mm} \begin{center} \begin{changemargin}{3cm}{3cm}

The administration of insecticide at a worker's residence (indoor residual spraying or IRS) likely protects that worker from malaria infection by killing the vectors (mosquitoes) that land on the building's walls. However, it is also highly likely that the protective effective of IRS "spills over" to others who live nearby. This positive spillover effect would theoretically go through two channels: (i) via a reduction of mosquitoes in the vicinity and (ii) via a reduction of the malaria parasite in the blood of humans in the vicinity (ie, the parasite "reservoir"). This document describes our method for assessing the existence and magnitude of positive spillover of IRS in the Maragra workers' data. We devise a time-specific household "protection" score based on the theoretical effectiveness of IRS, and then use that protection score to develop a time-place specific "herd protection" score based on a weighted average of nearby household protection scores. We incorporate the latter into our fixed effects model and find that...

\end{changemargin} \end{center}

\vspace{20mm}

\noindent\fbox{% \parbox{\textwidth}{% \subsection*{Justification} \begin{itemize} \item Positive spillover may occur. \item Spillover has both a space (ie, distance to sprayed house) and time (ie, time since spraying) dimension. \item If spillover occurs and is not accounted for, our models likely underestimate the true effect of IRS since our "control" group (ie, those not receiving IRS) do actually receive IRS (indirectly via spillover). \item If spillover does not occur, then this method should demonstrate its non-importance. \end{itemize} \vspace{2mm} }% }

\vfill \null

\subsection*{Desinataires} \textbf{Elisa Sicuri; Menno Pradhan}

\vspace{3mm}

\newpage

Protection score

To test whether IRS from one household has an effect on workers living in another household, we first must make some assumptions about the level and waning effect of IRS generally. It is known from the literature that IRS has an immediate effect on mosquitoes and should affect malaria risk following the incubation period, that the waning effect begins in month 4 [@Tukei2017], that IRS effects those not living in the household by reducing reproduction of mosquitoes [@White2011], and that IRS' effect after 1 year is essentially null [@Bukirwa2009].

x <- c(1:6)
y <- (c(-3.45, -6.04, - 6.53, -0.17, -2.74, 5.27) + 27.61) / 27.61
lo <- loess(y~x, span = 3)$fitted
lo[lo > 1] <- 1
lo <- 1-lo
y <- 1-y
df <- data.frame(x,y,lo)
df$months_since <- add_zero(df$x, n = 2)
df$months_since[df$months_since == '06'] <- '06+'
df <- df %>%
  bind_rows(df[nrow(df),] %>%
              mutate(months_since = 'No IRS'))
df <- df %>%
  gather(key, value, y:lo) %>%
  mutate(key = ifelse(key == 'lo', 
                      'Brew', 'Tukei'))
ggplot(data = df,
       aes(x = x,
           y = value,
           group = key,
           color = key)) +
  geom_point() +
  geom_line() +
  # geom_smooth(se = FALSE,span = 3) +
  theme_maragra() +
  labs(x = 'Months after IRS',
       y = 'Protection') +
  geom_hline(lty = 2, 
             alpha = 0.6,
             yintercept = 0) +
  scale_color_manual(name = '',
                     values = c('blue', 'orange'))


# The above chart shows two lines: the orange line is based on Tukei's emperical results regarding the protection afforded by IRS against malaria infection (among children) as a function of time since IRS; the blue line is our simplification of Tukei's work.

For our purposes, we make simplify even further. We make no assumptions about the level of positive externalities afforded by IRS (instead preferring to estimate this within our model), and we do not account for waning, instead modeling the entire "post" period of 6 months as one.\footnote{Though not as granular as estimating time-specific protection within the 6 months following IRS administration, our approach is not less accurate; it simply aggregates the entirety of the 6 months after IRS administration into one value, and assumes (as is suggested by the literature) that the period after 6 months is effectively equivalent in chemical terms as the period prior to IRS.}

Distance-based weighting

At any given time, each household is either (a) protected (ie, < 6 months following IRS administration) or (b) unprotected. If the former, they confer protection to their neighbors; if the latter, they confer no protection to their neighbors. Having assigned each household's protection status at all times, we need to define a method for how much of any given household's protection level is passed on. A household which is 1 meter from a neighbor should, theoretically, afford more protection to that household than another which is 1000 meters. To account for the effect of distance, we define a simple function for weighting a nearby residence's contribution to a household's "herd protection" score as 1 divided by the distance, as follows:

weighter <- function(x){
   1/x
}

Where x is the distance (in kilometers) to the household whose "herd protection" score is being estimated. In other words, a household's "herd protection" score is the mean of all households' protection scores, weighted by the inverse of the distance away.

Functionally, the weights look like this:

x <- seq(0, 1, length = 100)
y <- weighter(x)
x <- data_frame(x,y)
ggplot(data = x,
       aes(x = x,
           y = y)) +
  geom_line() +
  labs(x = 'Km from house',
       y = 'Weight for herd protection score') +
  theme_maragra() +
  geom_area(fill = 'lightblue',
            alpha = 0.6)

We heavily weight towards closer houses to account for the relatively small travel distances that most mosquitoes fly in normal conditions [@Verdonschot2014].

Density

We consider a house's "herd" protection level (ie, the protection conferred to the house through externality) to be the average of the other houses' non-herd protection levels (ie, 0 if not protected and 1 if protected), weighted by the distance to the house in question. The below is an illustration of a house (the red x) and the neighboring houses (circles). The "weight" of each house is indicated by the circle size, and the protection level of the house is indicated by the shading of the circle.

make_data <- function(n = 5){
  df <- data.frame(id = letters[1:n],
                 x = rnorm(n = n),
                 y = rnorm(n = n),
                 protection = round(sample(0:1, n, replace = TRUE), digits = 1),
                 color = c('red', rep('grey', n-1))) %>%
  mutate(lng = x,
         lat = y)
  df$x[1] <- df$lng[1] <- df$lat[1] <- df$y[1] <- 0
library(sp)
coordinates(df) <- ~x+y
df$distance <- round(spDists(x = df)[1,], digits = 3) / 1
df$weight <- weighter(df$distance)
df$herd_protection <- df$protection * df$weight
return(df)
}

df <- make_data(n = 10)
make_plot <- function(df){
  g <- 
  ggplot(data = df@data %>% mutate(protection = factor(protection)) %>%
           mutate(Weight = weight),
       aes(x = lng,
           y = lat)) +
  geom_point(aes(size = Weight,
                 color = protection),
             alpha = 0.6) +
  geom_point(data = df@data[1,],
             color = 'blue',
             pch = 4,
             cex = 3) +
  theme_maragra() +
  labs(x = '',
       y = '') +
  # theme(legend.position = 'right') +
    scale_color_manual(name = 'Protection',
                       values = c('red', 'darkgreen')) +
  # guides(color = guide_colourbar(title = 'Protection', title.position = 'top'),
  #        size = guide_legend(nrow = 5, title.position = 'top', nbin = 2)) +
  # scale_color_continuous(low = 'orange', high = 'blue') +
    xlim(-2,2) +
    ylim(-2,2)
  for(i in seq(10, 150, length = 10)){
    g <- g + 
      geom_point(data = df@data[1,],
             color = 'black',
             pch = 1,
             cex = i,
             alpha = 0.4) 
  }
  for(i in 2:nrow(df@data)){
    x <- bind_rows(df@data[i,],
                   df@data[1,])
    g <- g + geom_line(data = x,
                        aes(x = lng,
                            y = lat),
                        alpha = 0.6,
                        color = 'blue',
                       lty = 2,
                       size = 0.3)
  }
  return(g)
}
make_plot(df = df)

However, this consideration misses one important dimension: density. The below house's weighted average protection score is identical to that of the above house. In other words, at the time in question, at both locations, the percentage of houses with IRS coverage is the same, and the (weighted) average time since IRS is the same. However, the below house is likely much more protected than the above, given the absolute number of nearby IRS-protected houses.

df <- make_data(n = 100)
make_plot(df = df)

In the latter representation, the likelihood of a mosquito landing on a DDT-affected surface is much higher than the former, even though the relative IRS coverage for the neighborhood is identical. In order to account for this, we must take density itself into account, ie the number of houses (and their protection level and proximity) within 1 kilometer.

Our approach is this: rather than using a weighted average of each household's individual protection score, we instead use a weighted sum. A particular household's herd protection score at any given time is the sum of all households' individual protection scores at that time multiplied by those households' distance weights.

Combining them all together

The summation of (a) the IRS level of other houses at a certain time, (b) weighted by the distance of those houses to the house in question, (c) limited to only the 1 km radius multiplied yields a "herd protection score".

This score is conceptually similar to Cohen and Dupas' quantification of the positive externalities of ITN (bednet) use in Kenya [@Cohen2010] in that it attempts to estimate the protection conferred to "non-users" by users and uses a "weighted sum" (as opposed to average). Our approach differs in that the time dimension for IRS is much more important than ITN (ie, IRS is not a binary but rather a waning function, as described previously), requiring us to create herd scores for every day. This is because the protection conferred to a house by its neighbors may change from one day to the next, when a neighbor either (a) gets sprayed or (b) leaves "protected" status (ie, due to the end of the 6 months window).

Our approach can be justified in that previous studies have found strong positive health externalities in malaria interventions related to ITN coverage [@Alaii2003FactorsAU]. No studies exist on positive externalities for IRS coverage, but to the extent that the mechanisms for the reduction in infection are similar (reduction in the natural reservoir of the disease, reduction in the number of vectors, etc.), it is reasonable to assume similar effects. Additionally, we using weighted distance for our score calculation (rather than simply defining a radius threshold) because of previous studies' findings that there is a linear decline in protection conferred to others with greater distance from an intervention [@Binka2007]

Incorporating into data and model

We calculate a "herd protection" score for each location-date combination (r nrow(model_data) values). Whereas our original model specification looked like:

$$ \hat{Y_{it}} = \hat{\beta}{0} + \hat{\beta}{1}\text{Season}{t} * (\hat{\beta}_2{IRS{it}}*\hat{\beta}3{IRStime{it}}) + \alpha_i + \delta_t + \upsilon_{it} $$

Our new specification incorporates the herd protection value:

$$ \hat{Y_{it}} = \hat{\beta}{0} + \hat{\beta}{1}\text{Season}{t} * (\hat{\beta}_2{IRS{it}}*\hat{\beta}3{IRStime{it}}) + \hat{\beta}4{Herd{it}} + \alpha_i + \delta_t + \upsilon_{it} $$

$\hat{Y}{it}$ is the rate of absence. $\beta{1}$ is the binary "season" variable, imputed from overall district clinical incidence. Our intervention is whether the residence of the worker in question was treated in the last year, and, if so, the time since treatment, represented, respectively, by $\beta_{2}$ and $\beta_{3}$. $\beta_{4}$ represents the distance-weighted herd protection score. $\alpha_i$ represents the time invariant worker fixed effects, and $\delta_i$ represents the fixed effect of the particular malaria season. $\upsilon$ is the error term.

\newpage

Results

Table

make_models_table(model_list = protection_models, the_caption = "All absenteeism with herd immunity: model results")

Spatial protection surfaces

The below shows worker locations and the average herd protection score for the entire study period.

# Fortify bairros
library(maptools)
library(rgeos)
border <- gConvexHull(bairros_maragra_bairro)
border <- SpatialPolygonsDataFrame(Sr = border,
                                   data = data.frame(a = 1))
# border <- unionSpatialPolygons(bairros_maragra_bairro, 
#                           IDs = rep(1, nrow(bairros_maragra_bairro)),
#                           threshold = 9^9)
border_fortified <- 
  broom::tidy(border,
              regions = 'Id')

# Create average herd risk score
x <- model_data %>%
  group_by(lng = longitude_aura,
           lat = latitude_aura) %>%
  summarise(herd = mean(herd, na.rm = TRUE)) %>%
  ungroup %>%
  filter(!is.na(lng))
x_sp <- x
coordinates(x_sp) <- ~lng+lat   
proj4string(x_sp) <- proj4string(border)
overs <- sp::over(x_sp, border)
x <- x[!is.na(overs),]
ggplot() +
  geom_point(data = x,
       aes(x = lng,
           y = lat,
           color = herd)) +
  geom_polygon(data = border_fortified,
               aes(x = long,
                   y = lat,
                   group = group),
               fill = NA,
               color = 'black') +
  ggthemes::theme_map() +
  scale_color_continuous(name = 'Herd\nprotection\nscore',
                         low = 'orange',
                         high = 'blue')

The below uses kernel density interpolation to estimate a generalized protection surface over the entire facility.

# Create a gridded dataframe with values 
# for the entire range (bbox) of border
df_grid <- expand.grid(lng = seq(bbox(border)[1,1],
                                 bbox(border)[1,2],
                                 by = 0.0008),
                       lat = seq(bbox(border)[2,1],
                                 bbox(border)[2,2],
                                 by = 0.0008),
                       herd = NA)
df_grid$latitude <- df_grid$lat
df_grid$longitude <- df_grid$lng
coordinates(df_grid) <- ~longitude+latitude

# Go through each row of df_grid, getting the 
# weighted mean irs score for that point
# and putting a color into df_grid
if('df_grid.RData' %in% dir()){
  load('df_grid.RData')
} else {
  for (i in 1:nrow(df_grid)){
  # Get distance from every point in x_sp
  distances <- spDistsN1(pts = x_sp,
                        pt = df_grid[i,],
                        longlat = TRUE)
  # Define which are acceptably close
  close_enough <- which(distances <= 50)
  # Get an herd score
  herd <- stats::weighted.mean(x = x_sp$herd[close_enough],
                       w = (1 / distances[close_enough]) ^2,
                       na.rm = TRUE)
  # Assign irs to the dataframe
  df_grid$herd[i] <- herd

}
save('df_grid', file = 'df_grid.RData')
}
# Convert df_grid to raster
temp <- df_grid@data %>% arrange(lng, lat)
r <- rasterFromXYZ(temp[, c('lng', 'lat', 'herd')])
plot(r)

proj4string(df_grid) <- proj4string(border)
x <- over(df_grid, polygons(border))
df_grid_small <- df_grid[!is.na(x),]
temp <- df_grid_small@data %>% arrange(lng, lat)
r <- rasterFromXYZ(temp[, c('lng', 'lat', 'herd')])

plot(border)
plot(r, add = TRUE)
plot(border, add = TRUE)

Conclusion

Using the approach laid out here, there appears to be a significant protective effect afforded by the IRS of neighbors. If this approach is deemed correct, then next steps are determining:

How to translate the effect into costs and savings (simulation based approach at various IRS coverage levels?).
How / where to report on this in the paper (is this a "method" or "result"?).

\newpage

Appendix

Theoretical protection surface

The below shows a theoretical protection surface assuming (a) uniform distribution of residences, (b) random distribution of IRS over time, but never going beyond 6 months between sprays.

theoretical <- function(n){
  x <- lng <- runif(n = n, min = -1, max = 1)
y <- lat <- runif(n = n, min = -1, max = 1)
data <- data.frame(id = 1:n,
                   x,y, lng, lat,protection = sample(0:1, n,
                                                     replace = TRUE))
coordinates(data) <- ~x+y

# Loop through and get scores
out_list <- list()
counter <- 0
distances <- spDists(data)
for(j in 1:nrow(data)){
      sub_distances <- distances[j,]
      x <- data$protection
      w <- weighter(sub_distances)
      x <- x[is.finite(w)]

      # The weight
      w <- w[is.finite(w)]
      # Number of houses within 1k
      n <- length(which(sub_distances <= 1))
      # The weighted protection score (average)
      positivity <- stats::weighted.mean(x = x,
                                         w = w,
                                         na.rm = TRUE)
      # The weighted additive protection score
      s <- sum(x[sub_distances <= 1] * w[sub_distances <= 1], na.rm = TRUE)
      counter <- counter + 1

      # Update model_data
      out_data <- data_frame(id = data$id[j],
                             herd = positivity,
                             n = n,
                             s = s)
      out_list[[counter]] <- out_data
    }

x <- bind_rows(out_list)
x <- x %>%
    group_by(id) %>%
    summarise(herd = mean(herd, na.rm = TRUE),
              n = mean(n, na.rm = TRUE),
              s = mean(s, na.rm = TRUE))

data@data <- left_join(data@data, x)

border <- gConvexHull(data)
border <- SpatialPolygonsDataFrame(Sr = border,
                                   data = data.frame(a = 1))
border_fortified <- broom::tidy(border, regions = 1)

a1 <- ggplot() +
  geom_polygon(data = border_fortified,
               aes(x = long,
                   y = lat,
                   group = group),
               fill = NA,
               alpha = 0.2,
               color = 'black') +
  geom_point(data = data@data,
             aes(x = lng,
                 y = lat,
                 color = protection)) +
  theme_maragra() +
  scale_color_continuous(name = 'Individual\nprotection',
                         low = 'orange',
                         high = 'green',
                         guide = 'legend') +
  labs(x = '',
       y = '')
a2 <- ggplot() +
  geom_polygon(data = border_fortified,
               aes(x = long,
                   y = lat,
                   group = group),
               fill = NA,
               alpha = 0.2,
               color = 'black') +
  geom_point(data = data@data,
             aes(x = lng,
                 y = lat,
                 color = s)) +
  theme_maragra() +
  scale_color_continuous(name = 'Herd\nprotection',
                         low = 'orange',
                         high = 'green',
                         guide = 'legend') +
  labs(x = '',
       y = '')
Rmisc::multiplot(a1, a2, cols = 2)
}

100 workers

theoretical(n = 100)

1000 workers

theoretical(n = 1000)

3000 workers

theoretical(n = 3000)

References

joebrew/maragra documentation built on Aug. 11, 2020, 8:39 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

joebrew/maragra
An analysis of malaria at the Maragra sugar company, Mozambique

In joebrew/maragra: An analysis of malaria at the Maragra sugar company, Mozambique

Protection score

Distance-based weighting

Density

Combining them all together

Incorporating into data and model

Results

Table

Spatial protection surfaces

Conclusion

Appendix

Theoretical protection surface

100 workers

1000 workers

3000 workers

References

R Package Documentation

Browse R Packages

We want your feedback!

joebrew/maragra An analysis of malaria at the Maragra sugar company, Mozambique

In joebrew/maragra: An analysis of malaria at the Maragra sugar company, Mozambique

Protection score

Distance-based weighting

Density

Combining them all together

Incorporating into data and model

Results

Table

Spatial protection surfaces

Conclusion

Appendix

Theoretical protection surface

100 workers

1000 workers

3000 workers

References

R Package Documentation

Browse R Packages

We want your feedback!

joebrew/maragra
An analysis of malaria at the Maragra sugar company, Mozambique