pspline.inference: Inference using penalized basis splines (P-splines) in a...

Description Details Author(s) Examples

Description

This package lets you make point and interval estimates of outcomes modeled with a non-linear P-spline GAM.

Details

Applications in infectious disease outbreak modeling include estimating of outbreak onset, peak, or offset, as well as outbreak cumulative incidence over time.

The package can model two types of outcomes: scalar outcomes, which are single-value outcome measures (for example, timing of outbreak peak) and time series characteristics, which are functions of time (for example, infection incidence over time)

For each outcome measure, the package produces median and confidence interval estimates.

Typical use of this package begins by using the package mgcv to obtain a GAM/GAMM model of the process under investigation (such as an infectious disease outbreak), followed by calling either pspline.estimate.scalars or pspline.estimate.timeseries to obtain confidence intervals on the desired outcome measure

Both pspline.estimate.scalars and pspline.estimate.timeseries allow computation of arbitrary outcome measures, by passing a function that calculates the desired outcome measure into pspline.estimate.scalars or pspline.estimate.timeseries.

For convenience, this package also includes several utilities specifically aimed at modeling of infectious disease outbreaks, such as pspline.outbreak.cases and pspline.outbreak.cumcases (for estimation of incidence and cumulative incidence), and pspline.outbreak.thresholds, for estimation of outbreak onset and offset.

Author(s)

Ben Artin ben@artins.org

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
# Simulate an outbreak for analysis
cases = data.frame(
  time=seq(0, 51),
  cases=rpois(52, c(rep(1, 13), seq(1, 50, length.out=13), seq(50, 1, length.out=13), rep(1, 13)))
)

# Generate GAM model for outbreak; see mgcv for details
library(mgcv)
model = gam(cases ~ s(time, k=10, bs="cp", m=3), family=poisson, data=cases)

# Generate time series at which model will be evaluated for estimates
# Usually you want this to be the same as the time interval that your observations are in, except
# divided into small increments (here, eps). Using a smaller eps gives more accurate estimates, 
# but takes longer to run. A value smaller than 0.5 would be better for final analysis
eps = 0.5
estTimes = data.frame(time=seq(min(cases$time) - 0.5, max(cases$time) + 0.5 - eps, by=eps))

# Estimate incidence
estCases = pspline.estimate.timeseries(
  model, estTimes,
  pspline.outbreak.cases,
  # Using a large number of samples makes the analysis more robust; 
  # using only 15 samples makes this example run fast (default is 2000)
  samples=15, 
  level=.95
)

# Estimate time when outbreak crosses 5\% and 95\% of cumulative case count
onsetThreshold = 0.025
offsetThreshold = 1 - onsetThreshold
thresholds = pspline.estimate.scalars(
  model, estTimes,
  pspline.outbreak.thresholds(onset=onsetThreshold, offset=offsetThreshold), 
  # Using a large number of samples makes the analysis more robust; 
  # using only 15 samples makes this example run fast (default is 2000)
  samples=15, 
  level=.95
)

# Plot cumulative incidence estimates and threshold estimates
library(ggplot2)
ggplot() +
  geom_ribbon(data=estCases, aes(x=time, ymin=cases.lower, ymax=cases.upper), fill=grey(.75)) +
  geom_line(data=estCases, aes(x=time, y=cases.median)) +
  geom_point(data=cases, aes(x=time, y=cases)) +
  annotate("rect",
    xmin=thresholds$onset.lower,
    xmax=thresholds$onset.upper,
    ymin=-Inf, ymax=Inf, alpha=.25) +
  annotate("rect",
    xmin=thresholds$offset.lower,
    xmax=thresholds$offset.upper,
    ymin=-Inf, ymax=Inf, alpha=.25) +
 labs(x="Time", y="Incidence")

pspline.inference documentation built on Jan. 19, 2021, 5:07 p.m.