# tsglm: Count Time Series Following Generalised Linear Models In tscount: Analysis of Count Time Series

## Description

The function `tsglm` fits a generalised linear model (GLM) for time series of counts. The specification of the linear predictor allows for regressing on past observations, past values of the linear predictor and covariates as defined in the Details section. There is the so-called INGARCH model with the identity link (see for example Ferland et al., 2006, Fokianos et al., 2009) and another model with the logarithmic link (see for example Fokianos and Tjostheim, 2011), which also differ in the specification of the linear predictor. The conditional distribution can be chosen to be either Poisson or negative binomial.

Estimation is done by conditional maximum likelihood for the Poisson distribution or by a conditional quasi-likelihood approach based on the Poisson likelihood function for the negative binomial distribution.

There is a vignette available which introduces the functionality of `tsglm` and related functions of this package and its underlying statistical methods (`vignette("tsglm", package="tscount")`).

The function `tsglm.meanfit` is a lower level function to fit the mean specification of such a model assuming a Poisson distribution. It is called by `tsglm`. It has additional arguments allowing for a finer control of the fitting procedure, which can be handed over from the function `tsglm` by its `...` argument. Note that it is usually not necessary for a user to call this lower level functions nor to worry about the additional arguments provided by this function. The defaults of these arguments have been chosen wisely by the authors of this package and should perform well in most applications.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10``` ```tsglm(ts, model = list(past_obs = NULL, past_mean = NULL, external = NULL), xreg = NULL, link = c("identity", "log"), distr = c("poisson", "nbinom"), ...) tsglm.meanfit(ts, model, xreg, link, score = TRUE, info = c("score", "none", "hessian", "sandwich"), init.method=c("marginal", "iid", "firstobs", "zero"), init.drop = FALSE, epsilon = 1e-06, slackvar = 1e-06, start.control = list(), final.control = list(), inter.control = NULL) ```

## Arguments

 `ts` a univariate time series. `model` a named list specifying the model for the linear predictor, which can be of the following elements: `past_obs`integer vector giving the previous observations to be regressed on (autoregression). This is a vector with the elements i[1],…,i[p] (see Details). If omitted, or of length zero, there will be no regression on previous observations. `past_mean`integer vector giving the previous conditional means to be regressed on. This is a vector with the elements j[1],…,j[q] (see Details). If omitted, or of length zero, there will be no regression on previous conditional means. `external`logical vector of length `ncol(xreg)` specifying for each covariate wether its effect should be external or not (see Details). If this is a scalar this choice will be used for all covariates. If omitted, all covariates will have an internal effect (i.e. `external=FALSE`). `xreg` matrix with covariates in the columns, i.e. its number of rows must be `length(ts)`. This is the matrix X (see Details). If omitted no covariates will be included. For the identity link the covariates have to be non-negative. `link` character giving the link function. Default is `"identity"`, fitting an INGARCH model. Another possible choice is `"log"`, fitting a log-linear model. `distr` character giving the conditional distribution. Default is `"poisson"`, i.e. a Poisson distribution. `...` additional arguments to be passed to the lower level fitting function `tsglm.meanfit`. See below. `score` logical value indicating whether the score vector should be computed. `info` character that determines if and how to compute the information matrix. Can be set to `"score"` (the default) for calculation via the outer product of the score vector, or to `"hessian"` for calculation via the Hessian matrix of second derivatives. For `info="sandwich"` the information matrix is estimated by a sandwich formula using both the outer score product and the Hessian matrix. If set to `"none"`, no information matrix is computed. For `distr="nbinom"` one can only use `info="score"`. `init.method` character that determines how the recursion of the conditional mean (and possibly of its derivatives) is initialised. If set to `"marginal"` (the default), the marginal mean of a model without covariates and its derivatives are used. If set to `"iid"`, all values are initialised by the marginal mean under the assumption of i.i.d. data, which depends on the intercept only. If set to `"firstobs"` the first obersvation is used. If set to `"zero"`, the recursions are initialised by the value zero. `init.drop` logical value that determines which observations are considered for computation of the log-likelihood, the score vector and, if applicable, the information matrix. If `TRUE`, the first `max(model\$past_obs)` observations, which are needed for the autoregression, are not considered. If `FALSE` (the default), all observations are considered and pre-sample values determined by the method specified by the argument `itit.method` are used for the autoregression. Note that in the first case the effective number of observations used for maximum likelihood estimation is lower than the total number of observations of the original time series. Consequently only this lower number of observations is considered in the output. Note that for `init.drop=TRUE` the log-likelihood function for models of different orders might not be comparable if the effective number of observations is different. `epsilon` numeric positive but small value determining how close the parameters may come to the limits of the parameter space. `slackvar` numeric positive but small value determining how true inequalities among the parameter restrictions are treated; a true inequality `x < y` will be transformed to `x + slackvar <= y`. `start.control` named list with optional elements that determine how to make the start estimation. Possible list elements are: `use`integer vector of length one or two giving the number of observations from the beginning (if of length one) or the range of observations (if of length two) used for start estimation. For `use = Inf` all observations are used, which is the default. `method`character specifying how start estimators should be estimated. Possible values are `"iid"`, `"CSS"`, `"CSS-ML"`, `"ML"`, `"MM"`, `"GLM"` and `"fixed"`. If `method` is `"iid"` (the default), a moment estimator assuming an iid model without covariates is used. If `method="MM"`, the start estimate is an ARMA(1,1) fit by moment estimators and parameters of higher order than one are set to zero. For this method the starting parameter values for the covariates are zero by default and can be set by the list element `xreg`. If `method` is `"CSS"`, `"CSS-ML"` or `"ML"`, the start estimate is based on an ARMA fit using the function `arima`, and list element `method` is passed to its argument of the same name. If `method="GLM"`, the estimated parameters of a generalised linear model with regression on the specified past observations and covariates, but not on past conditional means, are used as start estimates. Initial estimates for the coefficients of past conditional means are set to zero. If `method="fixed"`, parameters given in further named list elements of `start.control` are used when available, else the predefined values given in the following are used. `intercept`numeric value with the start value for the intercept parameter. Default value is 1. `past_obs`numeric vector with the start values for parameters for regression on previous observations. Default values are zero. `past_mean`numeric vector with the start values for parameters for regression on previous conditional means. Default values are zero. `xreg`numeric vector with the start values for the regression parameters. These values will also be used if `method="MM"`. Default values are zero. `final.control` named list with optional elements that determine how to make the final maximum likelihood estimation. If `final.control=NULL`, only start estimates are computed and a list with fewer elements which has not the class `"tsglm"` is returned. Possible list elements of this argument are: `constrained`named list whose elements are passed to function `constrOptim` with possible elements `mu`, `outer.iterations` and `outer.eps` (see `constrOptim` for details). If `constrained=NULL`, an unconstrained optimisation is made with function `optim`. Note that this is likely to result in a fitted model which is non-stationary, which might cause further problems. `optim.method`character which is passed to functions `constrOptim` or `optim` as argument `method`. The default is `"BFGS"`. `optim.control`named list which is passed to function `constrOptim` or `optim` as the argument `control`. Must not contain the list element `fnscale`. The default is `list(maxit=20, reltol=1e-8)`. `inter.control` named list determining how to maximise the log-likelihood function in a first step. This intermediate optimisation will start from the start estimation and be followed by the final optimisation, which will in turn start from the intermediate optimisation result. This intermediate optimisation is intended to use a very quick but imprecise optimisation algorithm. Possible elements are the same as for `final.control`. The default is `inter.control=NULL`, which skips this intermediate optimisation step.

## Details

The INGARCH model (argument `link="identity"`) used here follows the definition

Z[t]|F[t-1] ~ Poi(ν[t]) or Z[t]|F[t-1] ~ NegBin(ν[t], φ),

where F[t-1] denotes the history of the process up to time t-1, Poi and NegBin is the Poisson respectively the negative binomial distribution with the parametrisation as specified below. For the model with covariates having an internal effect (the default) the linear predictor of the INGARCH model (which is in that case identical to the conditional mean) is given by

ν[t] = β[0] + β[1] Z[t-i[1]] + … + β[p] Z[t-i[p]] + α[1] ν[t-j[1]] + … + α[q] ν[t-j[q]] + η[1] X[t,1] + … + η[r] X[t,r].

The log-linear model (argument `link="log"`) used here follows the definition

Z[t]|F[t-1] ~ Poi(λ[t]) or Z[t]|F[t-1] ~ NegBin(λ[t], φ),

with λ[t] = \exp(ν[t]) and F[t-1] as above. For the model with covariates having an internal effect (the default) the linear predictor ν[t] = \log(λ[t]) of the log-linear model is given by

ν[t] = β[0] + β[1] \log(Z[t-i[1]]+1) + … + β[p] \log(Z[t-i[p]]+1) + α[1] ν[t-j[1]] + … + α[q] ν[t-j[q]] + η[1] X[t,1] + … + η[r] X[t,r].

Note that because of the logarithmic link function the effect of single summands in the linear predictor on the conditional mean is multiplicative and hence the parameters play a different role than in the INGARCH model, although they are denoted by the same letters.

The Poisson distribution is parametrised by the mean `lambda` according to the definition in `Poisson`. The negative binomial distribution is parametrised by the mean `mu` with an additional dispersion parameter `size` according to the definition in `NegBinomial`. In the notation above its mean parameter `mu` is ν[t] and its dispersion parameter `size` is φ.

This function allows to include covariates in two different ways. A covariate can have a so-called internal effect as defined above, where its effect propagates via the regression on past values of the linear predictor and on past observations. Alternatively, it can have a so-called external effect, where its effect does not directly propagates via the feedback on past values of the linear predictor, but only via past observations. For external effects of the covariates, the linear predictor for the model with identity link is given by

ν[t] = μ[t] + η[1] X[t,1] + … + η[r] X[t,r],

μ[t] = β[0] + β[1] Z[t-i[1]] + … + β[p] Z[t-i[p] + α[1] μ[t-j[1]] + … + α[q] μ[t-j[q]],

and analoguesly for the model with logarithmic link by

ν[t] = μ[t] + η[1] X[t,1] + … + η[r] X[t,r],

μ[t] = β[0] + β[1] \log(Z[t-i[1]]+1) + … + β[p] \log(Z[t-i[p]+1) + α[1] μ[t-j[1]] + … + α[q] μ[t-j[q]].

This is described in more detail by Liboschik et al. (2014) for the case of deterministic covariates for modelling interventions. It is also possible to model a combination of external and internal covariates, which can be defined straightforwardly by adding each covariate either to the linear predictor ν[t] itself (for an internal effect) or to μ[t] defined above (for an external effect).

## Value

An object of class `"tsglm"`, which is a list with at least the following elements:

 `coefficients` a named vector of the maximum likelihood estimated coefficients, which can be extracted by the `coef` method. `start` a named vector of the start estimation for the coefficients. `residuals` a vector of residuals, which can be extracted by the `residuals` method. `fitted.values` the fitted values, which can be extracted by the `fitted` method. `linear.predictors` the linear fit on link scale. `response` a vector of the response values (this is usually the original time series but possibly without the first few observations used for initialization if argument `init.drop=TRUE`). `logLik` the log-likelihood of the fitted model, which can be extracted by the `logLik` method. This is the complete log-likelihood including all constant terms. It is based on `n_eff` observations (see below). `score` the score vector at the maximum likelihood estimation. `info.matrix` the information matrix at the maximum likelihood estimation assuming a Poisson distribution. `info.matrix_corrected` the information matrix at the maximum likelihood estimation assuming the distribution specified in `distr`. `call` the matched call. `n_obs` the number of observations. `n_eff` the effective number of observations used for maximum likelihood estimation (might be lower than `n_obs` if argument `init.drop=TRUE`). `ts` the original time series. `model` the model specification. `xreg` the given covariates. `distr` a character giving the fitted conditional distribution. `distrcoefs` a named vector of the estimated additional coefficients specifying the conditional distribution. Is `NULL` in case of a Poisson distribution. `sigmasq` the estimated overdispersion coefficient. Is zero in case of a Poisson distribution.

The function `tsglm.meanfit` has the same output except the elements `distr`, `distrcoefs` and `sigmasq`. In addition, they return the following list elements:

 `inter` some details on the intermediate estimation of the coefficients as returned by `constrOptim` or `optim`. `final` some details on the final estimation of the coefficients as returned by `constrOptim` or `optim`. `durations` named vector of the durations of the model fit (in seconds). `outerscoreprod` array of outer products of score vectors at each time point.

## Author(s)

Tobias Liboschik, Philipp Probst, Konstantinos Fokianos and Roland Fried

## References

Christou, V. and Fokianos, K. (2014) Quasi-likelihood inference for negative binomial time series models. Journal of Time Series Analysis 35(1), 55–78, http://dx.doi.org/10.1002/jtsa.12050.

Christou, V. and Fokianos, K. (2015) Estimation and testing linearity for non-linear mixed poisson autoregressions. Electronic Journal of Statistics 9, 1357–1377, http://dx.doi.org/10.1214/15-EJS1044.

Ferland, R., Latour, A. and Oraichi, D. (2006) Integer-valued GARCH process. Journal of Time Series Analysis 27(6), 923–942, http://dx.doi.org/10.1111/j.1467-9892.2006.00496.x.

Fokianos, K. and Fried, R. (2010) Interventions in INGARCH processes. Journal of Time Series Analysis 31(3), 210–225, http://dx.doi.org/10.1111/j.1467-9892.2010.00657.x.

Fokianos, K., and Fried, R. (2012) Interventions in log-linear Poisson autoregression. Statistical Modelling 12(4), 299–322. http://dx.doi.org/10.1177/1471082X1201200401.

Fokianos, K., Rahbek, A. and Tjostheim, D. (2009) Poisson autoregression. Journal of the American Statistical Association 104(488), 1430–1439, http://dx.doi.org/10.1198/jasa.2009.tm08270.

Fokianos, K. and Tjostheim, D. (2011) Log-linear Poisson autoregression. Journal of Multivariate Analysis 102(3), 563–578, http://dx.doi.org/10.1016/j.jmva.2010.11.002.

Liboschik, T., Fokianos, K. and Fried, R. (2017) tscount: An R package for analysis of count time series following generalized linear models. Journal of Statistical Software 82(5), 1–51, http://dx.doi.org/10.18637/jss.v082.i05.

S3 methods `print`, `summary`, `residuals`, `plot`, `fitted`, `coef`, `predict`, `logLik`, `vcov`, `AIC`, `BIC` and `QIC` for the class `"tsglm"`. The S3 method `se` computes the standard errors of the parameter estimates. Additionally, there are the S3 methods `pit`, `marcal` and `scoring` for predictive model assessment.

S3 methods `interv_test`, `interv_detect` and `interv_multiple` for tests and detection procedures for intervention effects. `tsglm.sim` for simulation from GLM-type model for time series of counts. `ingarch.mean`, `ingarch.var` and `ingarch.acf` for calculation of analytical mean, variance and autocorrelation function of an INGARCH model (i.e. with identity link) without covariates.

Example time series of counts are `campy`, `ecoli`, `ehec`, `influenza`, `measles` in this package, `polio` in package `gamlss.data`.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17``` ```###Campylobacter infections in Canada (see help("campy")) interventions <- interv_covariate(n=length(campy), tau=c(84, 100), delta=c(1, 0)) #detected by Fokianos and Fried (2010, 2012) #Linear link function with Negative Binomial distribution: campyfit <- tsglm(campy, model=list(past_obs=1, past_mean=13), xreg=interventions, distr="nbinom") campyfit plot(campyfit) ###Road casualties in Great Britain (see help("Seatbelts")) timeseries <- Seatbelts[, "VanKilled"] regressors <- cbind(PetrolPrice=Seatbelts[, c("PetrolPrice")], linearTrend=seq(along=timeseries)/12) #Logarithmic link function with Poisson distribution: seatbeltsfit <- tsglm(ts=timeseries, link="log", model=list(past_obs=c(1, 12)), xreg=regressors, distr="poisson") summary(seatbeltsfit) ```

### Example output

```Call:
tsglm(ts = campy, model = list(past_obs = 1, past_mean = 13),
xreg = interventions, distr = "nbinom")

Coefficients:
(Intercept)       beta_1     alpha_13     interv_1     interv_2
3.3184       0.3690       0.2198       3.0810      41.9541

Overdispersion coefficient 'sigmasq' was estimated to be 0.02974996.

Call:
tsglm(ts = timeseries, model = list(past_obs = c(1, 12)), xreg = regressors,
link = "log", distr = "poisson")

Coefficients:
Estimate  Std.Error  CI(lower)  CI(upper)
(Intercept)    1.7951     0.3479     1.1133     2.4770
beta_1         0.0974     0.0743    -0.0482     0.2430
beta_12        0.1685     0.0772     0.0172     0.3198
PetrolPrice    0.9030     2.2874    -3.5801     5.3861
linearTrend   -0.0394     0.0072    -0.0535    -0.0253
Standard errors and confidence intervals (level =  95 %) obtained
by normal approximation.