pclm: Univariate Penalized Composite Link Model (PCLM)
In mpascariu/pclm: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data

pclm	R Documentation

Univariate Penalized Composite Link Model (PCLM)

Description

Fit a univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.

Usage

pclm(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = FALSE,
  control = list()
)

Arguments

`x`	Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins `[0,5), [5,10) and [10, 15)`, `x` will be defined by the vector: `c(0, 5, 10)`.
`y`	Vector with counts to be ungrouped. It must have the same dimension as `x`.
`nlast`	Length of the last interval. In the example above `nlast` would be 5.
`offset`	Optional offset term to calculate smooth mortality rates. A vector of the same length as `x` and `y`. See Rizzi et al. (2015) for further details.
`out.step`	Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.
`ci.level`	Confidence level, in percent, used for computing confidence intervals. Default: `95`.
`verbose`	Logical value. Indicates whether a progress bar should be shown or not. Default: `FALSE`.
`control`	List with additional parameters:
  `lambda` – Smoothing parameter to be used in PCLM estimation. If `lambda = NA`, an algorithm will find the optimal value.
  `kr` – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See `MortSmooth_bbase`.
  `deg` – Degree of the splines needed to create an equally-spaced B-spline basis over an abscissa of data.
  `int.lambda` – If `lambda` is optimized, the interval to be searched needs to be specified. Format: vector containing the end-points.
  `diff` – An integer indicating the order of differences of the components of PCLM coefficients.
  `opt.method` – Selection criterion of the model. Possible values are `"AIC"` and `"BIC"`.
  `max.iter` – Maximal number of iterations used in the fitting procedure.
  `tol` – Relative tolerance in the PCLM fitting procedure.
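To make the roles of `x`, `nlast` and `out.step` concrete, the following base R sketch rebuilds the bin boundaries from the 3-bin example above and maps each fine output interval to its input bin. It is only an illustration and not the package's internal code; the names `breaks`, `fine_left`, `bin_of_fine` and `C` are assumptions introduced here.

```r
# Illustrative sketch only; these objects are not created by pclm() itself.
x        <- c(0, 5, 10)  # left end-points of the bins [0,5), [5,10), [10,15)
nlast    <- 5            # length of the last bin
out.step <- 1            # length of the output intervals

# Bin boundaries: the last bin is closed off using nlast
breaks <- c(x, x[length(x)] + nlast)

# Left end-points of the fine output intervals
fine_left <- seq(breaks[1], breaks[length(breaks)] - out.step, by = out.step)

# Index of the input bin that contains each fine interval
bin_of_fine <- findInterval(fine_left, breaks)

# Composition matrix: row i sums the fine intervals that fall in bin i
C <- outer(seq_along(x), bin_of_fine, "==") * 1

dim(C)      # number of bins by number of fine intervals
rowSums(C)  # number of fine intervals inside each bin
```

With `out.step = 1` each input bin of length 5 is split into 5 output intervals, so `y` must supply one count per row of `C`.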

Details

The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike's Information Criterion.
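The estimation idea described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the implementation used by `pclm()`: it uses an identity basis instead of B-splines, and the function name `pclm_sketch`, its arguments and the toy data are hypothetical.

```r
# Hypothetical helper, not part of the pclm package.
# y: grouped counts; C: composition matrix (bins x fine intervals);
# lambda: smoothing parameter; diff_order: order of the difference penalty.
pclm_sketch <- function(y, C, lambda = 1, diff_order = 2,
                        max_iter = 100, tol = 1e-8) {
  m <- ncol(C)
  D <- diff(diag(m), differences = diff_order)  # difference matrix
  P <- lambda * crossprod(D)                    # roughness penalty
  b <- rep(log(sum(y) / m), m)                  # flat starting values
  for (iter in seq_len(max_iter)) {
    gam  <- exp(b)                        # latent means on the fine grid
    mu   <- as.vector(C %*% gam)          # expected grouped counts
    Q    <- C %*% diag(gam)               # derivative of mu w.r.t. b
    QtWQ <- crossprod(Q, Q / mu)          # t(Q) %*% diag(1 / mu) %*% Q
    z    <- y - mu + as.vector(Q %*% b)   # working response
    b_new <- as.vector(solve(QtWQ + P, crossprod(Q, z / mu)))
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  gam  <- exp(b)
  mu   <- as.vector(C %*% gam)
  Q    <- C %*% diag(gam)
  QtWQ <- crossprod(Q, Q / mu)
  ed   <- sum(diag(solve(QtWQ + P, QtWQ)))  # effective dimension
  dev  <- 2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
  list(fitted = gam, mu = mu, ed = ed,
       aic = dev + 2 * ed, bic = dev + log(length(y)) * ed,
       iterations = iter)
}

# Toy data (made up for illustration): 5 bins of width 5, ungrouped to width 1
y <- c(10, 30, 40, 15, 5)
C <- kronecker(diag(5), matrix(1, 1, 5))

# Choose lambda on a small grid by minimizing AIC
lambdas <- 10^(0:3)
fits <- lapply(lambdas, function(l) pclm_sketch(y, C, lambda = l))
aic  <- vapply(fits, function(f) f$aic, numeric(1))
best <- fits[[which.min(aic)]]
```

Because the difference penalty leaves constant shifts of the coefficients unpenalized, the fitted latent sequence should reproduce the total count at convergence, which gives a quick sanity check on `sum(best$fitted)`. Replacing `f$aic` with `f$bic` switches the selection criterion.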

Value

The output is a list with the following components:

`input`	A list with arguments provided in input. Saved for convenience.
`fitted`	The fitted values of the PCLM model.
`ci`	Confidence intervals around fitted values.
`goodness.of.fit`	A list containing goodness of fit measures: standard errors, AIC and BIC.
`smoothPar`	Estimated smoothing parameters: `lambda`, `kr` and `deg`.
`bins.definition`	Additional values to identify the bin limits and locations in input and output objects.
`deep`	A list of objects created in the fitting process. Useful in diagnosis of possible issues.
`call`	An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.

References

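Rizzi S, Gampe J, Eilers PHC (2015). Efficient estimation of smooth distributions from coarsely grouped data. American Journal of Epidemiology, 182(2), 138-147.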

Examples

# Data  
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 
       1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)
offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657, 
            631, 584, 573, 619, 530, 384, 303, 245, 249) * 1000
nlast <- 26 # the size of the last interval

# Example 1 ----------------------
M1 <- pclm(x, y, nlast)
ls(M1)
summary(M1)
fitted(M1)
plot(M1)

# Example 2 ----------------------
# ungroup into even smaller intervals
M2 <- pclm(x, y, nlast, out.step = 0.5)
head(fitted(M2))
plot(M2, type = "s")
# Note, in example 1 we are estimating intervals of length 1. In example 2 
# we are estimating intervals of length 0.5 using the same aggregate data.

# Example 3 ----------------------
# Do not optimise smoothing parameters; choose your own. Faster.
M3 <- pclm(x, y, nlast, out.step = 0.5, 
           control = list(lambda = 100, kr = 10, deg = 10))
plot(M3)

summary(M2)
summary(M3) # not the smallest BIC here, but sometimes that is not important.

# Example 4 -----------------------
# Grouped x & grouped offset (estimate death rates)
M4 <- pclm(x, y, nlast, offset)
plot(M4, type = "s")

# Example 5 -----------------------
# Grouped x & ungrouped offset (estimate death rates)

ungroupped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungrouped offset data

M5 <- pclm(x, y, nlast, offset = ungroupped_Ex)

mpascariu/pclm documentation built on Feb. 4, 2024, 9:34 p.m.