hhh4: Fitting HHH Models with Random Effects and Neighbourhood...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Fits a Poisson or negative binomial model with conditional mean

μ_it = λ_it y_i,t-1 + φ_it sum_(j != i) w_ji y_j,t-1 + e_it ν_it

containing epidemic and endemic components to a multivariate time series of counts Y_it (unit i, period t). Univariate count time series are supported as well. In the case of a negative binomial model, the conditional variance is μ_it(1+ψ*μ_it) with overdispersion parameter ψ. The three unknown quantities of the mean μ_it,

are log-linear predictors incorporating time-/unit-specific covariates. They may contain unit-specific random intercepts (ri) as proposed by Paul and Held (2011). The e_it is a (multiplicative) endemic offset; it is also possible to include such offsets in the epidemic components. The w_ji are neighbourhood weights, which can either be prespecified or estimated parametrically as proposed by Meyer and Held (2014).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
hhh4(stsObj,
     control = list(
         ar = list(f = ~ -1, offset = 1, lag = 1),
         ne = list(f = ~ -1, offset = 1, lag = 1,
                   weights = neighbourhood(stsObj) == 1,
                   scale = NULL, normalize = FALSE),
         end = list(f = ~ 1, offset = 1),
         family = c("Poisson", "NegBin1", "NegBinM"),
         subset = 2:nrow(stsObj),
         optimizer = list(stop = list(tol=1e-5, niter=100),
                          regression = list(method="nlminb"),
                          variance = list(method="nlminb")),
         verbose = FALSE,
         start = list(fixed=NULL, random=NULL, sd.corr=NULL),
         data = list(t = stsObj@epoch - min(stsObj@epoch)),
         keep.terms = FALSE
     ),
     check.analyticals = FALSE)

Arguments

stsObj

object of class "sts" containing the (multivariate) count data time series.

control

a list containing the model specification and control arguments:

ar

Model for the autoregressive component given as list with the following components:

f = ~ -1

a formula specifying log(λ_it)

offset = 1

optional multiplicative offset, either 1 or a matrix of the same dimension as observed(stsObj)

lag = 1

a positive integer meaning autoregression on y_{i,t-lag}

ne

Model for the neighbor-driven component given as list with the following components:

f = ~ -1

a formula specifying log(φ_it)

offset = 1

optional multiplicative offset, either 1 or a matrix of the same dimension as observed(stsObj)

lag = 1

a non-negative integer meaning dependency on y_{j,t-lag}

weights = neighbourhood(stsObj) == 1

neighbourhood weights w_ji. The default corresponds to the original formulation by Held et al (2005), i.e., the spatio-temporal component incorporates an unweighted sum over the lagged cases of the first-order neighbours. See Paul et al (2008) and Meyer and Held (2014) for alternative specifications, e.g., W_powerlaw. Time-varying weights are possible by specifying an array of dim() c(nUnits, nUnits, nTime), where nUnits=ncol(stsObj) and nTime=nrow(stsObj).

scale = NULL

optional matrix of the same dimensions as weights (or a vector of length ncol(stsObj)) to scale the weights to scale * weights.

normalize = FALSE

logical indicating if the (scaled) weights should be normalized such that each row sums to 1.

end

Model for the endemic component given as list with the following components

f = ~ 1

a formula specifying log(ν_it)

offset = 1

optional multiplicative offset e_it, either 1 or a matrix of the same dimension as observed(stsObj)

family

Distributional family – either "Poisson", or the Negative Binomial distribution. For the latter, the overdispersion parameter can be assumed to be the same for all units ("NegBin1"), to vary freely over all units ("NegBinM"), or to be shared by some units (specified by a factor of length ncol(stsObj) such that its number of levels determines the number of overdispersion parameters). Note that "NegBinM" is equivalent to factor(colnames(stsObj), levels = colnames(stsObj)).

subset

Typically 2:nrow(obs) if model contains autoregression

optimizer

a list of three lists of control arguments.

The "stop" list specifies two criteria for the outer optimization of regression and variance parameters: the relative tolerance for parameter change using the criterion max(abs(x[i+1]-x[i])) / max(abs(x[i])), and the maximum number niter of outer iterations.

Control arguments for the single optimizers are specified in the lists named "regression" and "variance". method="nlminb" is the default optimizer for both (taking advantage of the analytical Fisher information matrices), however, the methods from optim may also be specified (as well as "nlm" but that one is not recommended here). Especially for the variance updates, Nelder-Mead optimization (method="Nelder-Mead") is an attractive alternative. All other elements of these two lists are passed as control arguments to the chosen method, e.g., if method="nlminb" adding iter.max=50 increases the maximum number of inner iterations from 20 (default) to 50.

verbose

non-negative integer (usually in the range 0:3) specifying the amount of tracing information to be output during optimization.

start

a list of initial parameter values replacing initial values set via fe and ri. Since surveillance 1.8-2, named vectors are matched against the coefficient names in the model (where unmatched start values are silently ignored), and need not be complete, e.g., start = list(fixed = c("-log(overdisp)" = 0.5)) (default: 2) for a family = "NegBin1" model. In contrast, an unnamed start vector must specify the full set of parameters as used by the model.

data

a named list of covariates that are to be included as fixed effects (see fe) in any of the 3 component formulae. By default, the time variable t is available and used for seasonal effects created by addSeason2formula. In general, covariates in this list can be either vectors of length nrow(stsObj) interpreted as time-varying but common across all units, or matrices of the same dimension as the disease counts observed(stsObj).

keep.terms

logical indicating if the terms object used in the fit is to be kept as part of the returned object. This is usually not necessary, since the terms object is reconstructed by the terms-method for class "hhh4" if necessary (based on stsObj and control, which are both part of the returned "hhh4" object).

check.analyticals

logical (or a subset of c("numDeriv", "maxLik")), indicating if (how) the implemented analytical score vector and Fisher information matrix should be checked against numerical derivatives at the parameter starting values, using the packages numDeriv and/or maxLik. If activated, hhh4 will return a list containing the analytical and numerical derivatives for comparison (no ML estimation will be performed). This is mainly intended for internal use by the package developers.

Details

For further details see vignette("hhh4") and the references.

Value

hhh4 returns an object of class "hhh4", which is a list containing the following components:

coefficients

named vector with estimated (regression) parameters of the model

se

estimated standard errors (for regression parameters)

cov

covariance matrix (for regression parameters)

Sigma

estimated variance-covariance matrix of random effects

Sigma.orig

estimated variance parameters on internal scale used for optimization

Sigma.cov

inverse of marginal Fisher information (on internal scale), i.e., the asymptotic covariance matrix of Sigma.orig

call

the matched call

dim

vector with number of fixed and random effects in the model

loglikelihood

(penalized) loglikelihood evaluated at the MLE

margll

(approximate) log marginal likelihood should the model contain random effects

convergence

logical. Did optimizer converge?

fitted.values

fitted mean values μ_it

control

control object of the fit

terms

the terms object used in the fit if keep.terms = TRUE and NULL otherwise

stsObj

the supplied stsObj

lags

named integer vector of length two containing the lags used for the epidemic components "ar" and "ne", respectively. The corresponding lag is NA if the component was not included in the model.

nObs

number of observations used for fitting the model

nTime

number of time points used for fitting the model

nUnit

number of units (e.g. areas) used for fitting the model

runtime

the proc.time-queried time taken to fit the model, i.e., a named numeric vector of length 5 of class "proc_time"

Author(s)

M. Paul, S. Meyer, and L. Held

References

Held, L., Höhle, M., Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5, 187–199.

Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27, 6250–6267.

Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30, 1118–1136

Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54, 824–843

Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639.
DOI-Link: http://dx.doi.org/10.1214/14-AOAS743

See Also

algo.hhh, fe, ri

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
#####################################################################
# Fit some models from ?algo.hhh
#####################################################################

## univariate salmonella agona data
data(salmonella.agona)
# convert to sts class
salmonella <- disProg2sts(salmonella.agona)

# generate formula for temporal and seasonal trends
f.end <- addSeason2formula(f = ~ 1 + t, S=1, period=52)
model1 <- list(ar = list(f = ~ 1), end = list(f =f.end),
               family = "NegBin1")
# run model
res <- hhh4(salmonella, model1)
summary(res, idx2Exp=1, amplitudeShift=TRUE)


## multivariate time series: 
# measles cases in Lower Saxony, Germany
data(measles.weser)
measles <- disProg2sts(measles.weser)

# same model as above
summary(hhh4(measles, control=model1))

# now use region-specific intercepts in endemic component
f.end2 <- addSeason2formula(f = ~ -1 + fe(1, unitSpecific = TRUE) + t,
                            S = 1, period = 52)
model2 <- list(ar = list(f = ~ 1), 
               end = list(f = f.end2, offset = population(measles)),
               family = "NegBin1")
# run model
summary(hhh4(measles, control=model2), idx2Exp=1, amplitudeShift=TRUE)

# include autoregressive parameter phi for adjacent "Kreise"
# no linear trend in endemic component
f.end3 <- addSeason2formula(f = ~ -1 + fe(1, unitSpecific = TRUE), 
                            S = 1, period = 52)
model3 <- list(ar = list(f = ~ 1),
               ne = list(f = ~ 1),
               end = list(f = f.end3, offset = population(measles)),
               family = "NegBin1")
# run model
res3 <- hhh4(measles, control=model3)
summary(res3, idx2Exp=1:2, amplitudeShift=TRUE)



## Not run: 
######################################################################
# Fit the models from the Paul & Held (2011) paper for the influenza data
# from Bavaria and Baden-Wuerttemberg (this takes some time!)
# For further documentation see also the vignette.
######################################################################

data("fluBYBW") 

###############################################################
## generate formula for temporal and seasonal trends
f.end <- addSeason2formula(f = ~ -1 + ri(type="iid", corr="all") + 
                               I((t-208)/100), S=3, period=52)

## details for optimizer
opt <- list(stop = list(tol=1e-5, niter=200),
            regression = list(method="nlminb"),
            variance = list(method="nlminb"))

##########################
## models 
# A0
cntrl_A0 <- list(ar = list(f = ~ -1),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose = 1)
summary(res_A0 <- hhh4(fluBYBW,cntrl_A0))

# B0
cntrl_B0 <- list(ar = list(f = ~ 1),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_B0 <- hhh4(fluBYBW,cntrl_B0)               
 

# C0
cntrl_C0 <- list(ar = list(f = ~ -1 + ri(type="iid", corr="all")),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_C0 <- hhh4(fluBYBW,cntrl_C0)               


#A1

# weight matrix w_ji = 1/(No. neighbors of j) if j ~ i, and 0 otherwise
wji <- neighbourhood(fluBYBW)/rowSums(neighbourhood(fluBYBW))

cntrl_A1 <- list(ar = list(f = ~ -1),
                 ne = list(f = ~ 1, weights = wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_A1 <- hhh4(fluBYBW,cntrl_A1)               


# B1
cntrl_B1 <- list(ar = list(f = ~ 1),
                 ne = list(f = ~ 1, weights = wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_B1 <- hhh4(fluBYBW,cntrl_B1)               


# C1
cntrl_C1 <- list(ar = list(f = ~ -1 + ri(type="iid", corr="all")),
                 ne = list(f = ~ 1, weights = wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_C1 <- hhh4(fluBYBW,cntrl_C1)               


#A2
cntrl_A2 <- list(ar = list(f = ~ -1),
                 ne = list(f = ~ -1 + ri(type="iid",corr="all"), weights=wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_A2 <- hhh4(fluBYBW,cntrl_A2)               


# B2
cntrl_B2 <- list(ar = list(f = ~ 1),
                 ne = list(f = ~ -1 + ri(type="iid",corr="all"), weights =wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1)
res_B2 <- hhh4(fluBYBW,cntrl_B2)               

# C2
cntrl_C2 <- list(ar = list(f = ~ -1 + ri(type="iid", corr="all")),
                 ne = list(f = ~ -1 + ri(type="iid",corr="all"), weights =wji),
                 end = list(f =f.end, offset = population(fluBYBW)),
                 family = "NegBin1", optimizer = opt, verbose=1,
                 start=list(fixed=fixef(res_B0),random=c(rep(0,140),
                         ranef(res_B0)), sd.corr=c(-.5,res_B0$Sigma.orig,0)))
res_C2 <- hhh4(fluBYBW,cntrl_C2)               


# D
cntrl_D <- list(ar = list(f = ~ 1),
                ne = list(f = ~ -1 + ri(type="iid"), weights = wji),
                end = list(f =addSeason2formula(f = ~ -1 + ri(type="car") + 
                                             I((t-208)/100), S=3, period=52), 
                          offset = population(fluBYBW)),
                family = "NegBin1", optimizer = opt, verbose=1)
res_D <- hhh4(fluBYBW,cntrl_D)               


## End(Not run)

jimhester/surveillance documentation built on May 19, 2019, 10:33 a.m.