cpglm | R Documentation |

This function fits compound Poisson generalized linear models.

```
cpglm(formula, link = "log", data, weights, offset,
      subset, na.action = NULL, contrasts = NULL,
      control = list(), chunksize = 0,
      optimizer = "nlminb", ...)
```

`formula` |
an object of class |

`link` |
a specification for the model link function. This can be either a literal character string or a numeric number. If it is a character string, it must be one of "log", "identity", "sqrt" or "inverse". If it is numeric, it is the same as the |

`data` |
an optional data frame, list or environment (or object coercible by |

`weights` |
an optional vector of weights. Should be either |

`subset` |
an optional vector specifying a subset of observations to be used in the fitting process. |

`na.action` |
a function which indicates what should happen when the data contain |

`offset` |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be either |

`contrasts` |
an optional list. See |

`control` |
a list of parameters for controlling the fitting process. See 'Details' below. |

`chunksize` |
an integer that indicates the size of chunks for processing the data frame as used in |

`optimizer` |
a character string that determines which optimization routine is to be used in estimating the index and the dispersion parameters. Possible choices are |

`...` |
additional arguments to be passed to |

This function implements the profile likelihood approach in Tweedie compound Poisson generalized linear models. First, the index and dispersion parameters are estimated by numerically maximizing the profile likelihood, in which the mean parameters are profiled out because they are determined for any given value of the index parameter. Then the mean parameters are estimated using a GLM with the estimated index parameter. Computing the profile likelihood relies on the numerical methods provided in the `tweedie` package for approximating the density of the compound Poisson distribution. Indeed, the function `tweedie.profile` in that package also makes the profile likelihood approach available. The `cpglm` function differs from `tweedie.profile` in two respects. First, the user does not need to specify a grid of candidate values for the index parameter; the optimization of the profile likelihood is automated. Second, large data sets can be handled, because the `bigglm` function from the `biglm` package is used to fit the GLMs when the argument `chunksize` is greater than 0. Note also that only the maximum likelihood estimate of the dispersion parameter is provided here, whereas `tweedie.profile` offers several other options.
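The two-step idea described above can be sketched outside of `cpglm`. The following is a minimal illustration on simulated data, not the package's internal code: it assumes the `statmod` package for the `tweedie()` GLM family and the `tweedie` package for `dtweedie()` and `rtweedie()`. The helper `neg_profile_loglik` and the simulated variables are hypothetical names introduced only for this example.

```r
library(statmod)  # tweedie() GLM family (assumed to be installed)
library(tweedie)  # dtweedie() and rtweedie() (assumed to be installed)

# Simulated compound Poisson data, purely illustrative
set.seed(1)
n <- 200
x <- runif(n)
y <- rtweedie(n, power = 1.5, mu = exp(0.5 + x), phi = 1)
dat <- data.frame(y = y, x = x)

# Negative profile log-likelihood in (p, log(phi)).
# For a given index p, the mean parameters are obtained from a GLM fit,
# so they are "profiled out" of the objective.
neg_profile_loglik <- function(par, data) {
  p <- par[1]
  phi <- exp(par[2])
  fit <- glm(y ~ x, data = data,
             family = statmod::tweedie(var.power = p, link.power = 0))
  -sum(log(dtweedie(data$y, power = p, mu = fitted(fit), phi = phi)))
}

# Step 1: estimate the index and dispersion parameters numerically,
# keeping p inside the bounds used by default in cpglm
opt <- nlminb(c(1.5, 0), neg_profile_loglik, data = dat,
              lower = c(1.01, -Inf), upper = c(1.99, Inf))
p_hat <- opt$par[1]
phi_hat <- exp(opt$par[2])

# Step 2: estimate the mean parameters with the index fixed at p_hat
final <- glm(y ~ x, data = dat,
             family = statmod::tweedie(var.power = p_hat, link.power = 0))
coef(final)
```

Optimizing over `log(phi)` rather than `phi` is a choice made for this sketch to keep the dispersion positive without an explicit bound.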

The package previously implemented a second approach based on the Monte Carlo EM algorithm, but it has been removed because it offered no obvious advantage over the profile likelihood approach for this model.

The `control` argument is a list that can supply various controlling elements used in the optimization process, and it has the following components:

`bound.p`

a vector of lower and upper bounds for the index parameter `p` used in the optimization. The default is `c(1.01, 1.99)`.

`trace`

if greater than 0, tracing information on the progress of the fitting is produced. For `optimizer = "nlminb"` or `optimizer = "L-BFGS-B"`, this is the same as the `trace` control parameter, and for `optimizer = "bobyqa"`, this is the same as the `iprint` control parameter. See the corresponding documentation for details.

`max.iter`

maximum number of iterations allowed in the optimization. The default is `300`.

`max.fun`

maximum number of function evaluations allowed in the optimizer. The default is `2000`.
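As an illustration of how these components might be supplied, the following sketch passes a `control` list together with a non-default optimizer. The particular values are arbitrary choices for this example rather than recommendations, and the `FineRoot` data set is the one used in the Examples section.

```r
library(cplm)

# Narrow the search interval for the index parameter, raise the
# iteration and function-evaluation limits, and switch the optimizer.
# All values below are illustrative only.
fit_ctrl <- cpglm(RLD ~ factor(Zone),
                  data = FineRoot,
                  optimizer = "bobyqa",
                  control = list(bound.p = c(1.1, 1.9),
                                 trace = 0,
                                 max.iter = 500,
                                 max.fun = 3000))

# The estimated index parameter should lie within bound.p
fit_ctrl$p
```

Setting `trace` to a positive value in the list above would print progress information from the chosen optimizer.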

`cpglm` returns an object of class `"cpglm"`. See `cpglm-class` for details of the return values as well as various methods available for this class.
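A brief sketch of inspecting the returned object follows. The `$` extraction, `resid` and `fitted` appear in the Examples section of this page; the use of `coef` and `summary` here assumes that such methods are among those documented in `cpglm-class`.

```r
library(cplm)

fit <- cpglm(RLD ~ factor(Zone), data = FineRoot)

# Estimated index and dispersion parameters
fit$p
fit$phi

# Estimated mean parameters (assumes a coef method for this class)
coef(fit)

# Printed summary of the fit (assumes a summary method for this class)
summary(fit)
```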

Yanwei (Wayne) Zhang actuary_zhang@hotmail.com

Dunn, P.K. and Smyth, G.K. (2005). Series evaluation of Tweedie exponential dispersion models densities. *Statistics and Computing*, 15, 267-280.

Users are encouraged to see the documentation for `cpglm-class`, `glm`, `tweedie`, and `tweedie.profile` for related information.

```
library(cplm)  # provides cpglm() and the FineRoot data set

fit1 <- cpglm(RLD ~ factor(Zone) * factor(Stock),
              data = FineRoot)

# residual and qq plots
parold <- par(mfrow = c(2, 2), mar = c(5, 5, 2, 1))

# 1. regular plot: residuals scaled by the estimated dispersion
r1 <- resid(fit1) / sqrt(fit1$phi)
plot(r1 ~ fitted(fit1), cex = 0.5)
qqnorm(r1, cex = 0.5)

# 2. quantile residual plot to avoid overlapping
u <- tweedie::ptweedie(fit1$y, fit1$p, fitted(fit1), fit1$phi)
# randomize the zeros uniformly on (0, P(Y = 0))
u[fit1$y == 0] <- runif(sum(fit1$y == 0), 0, u[fit1$y == 0])
r2 <- qnorm(u)
plot(r2 ~ fitted(fit1), cex = 0.5)
qqnorm(r2, cex = 0.5)

par(parold)  # restore the graphical parameters

# use bigglm by setting chunksize > 0
fit2 <- cpglm(RLD ~ factor(Zone),
              data = FineRoot, chunksize = 250)
```
