# pclm.fit: Penalized Composite Linear Model (PCLM) for PASH object. In MaciejDanko/pclmpash:

## Description

The PCLM method is based on the composite link model, with a penalty added to ensure the smoothness of the target distribution. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike<e2><80><99> s Information Criterion [From Rizzi et al. 2015 abstract].

## Usage

 ```1 2 3``` ```pclm.fit(object, population.size, out.step = "auto", to.pash = c("aggregated", "nonaggregated"), last_open = FALSE, nax.method = c("udd", "cfm"), control = list()) ```

## Arguments

 `object` A `pash` object. `population.size` Population size. If it is not given then it will be retrieved from the pash object (if possible). You may want to set it to a high value (e.g. 10000) if the data has no information about population number. `out.step` Age interval length in output aggregated life-table. If set to `"auto"` then the parameter is automatically set to the length of the shortest age/time interval of pash life-table. `to.pash` A way how the `pash` life-table is constructed: `"aggregated"` - The `pash` life-table is constructed from aggregated PCLM fit with interval length of `out.step`. `"nonaggregated"` - The `pash` life-table is constructed from nonaggregated (raw) PCLM fit with interval of length equal to the shortest original interval length divided by `x.div`. See `pclm.control`. `last_open` Logical determining if to construct `pash` life-table with last open interval. See `Inputlx`. `nax.method` A way of calculating nax in `Inputlx` if `to.pash == "nonaggregated"`. Possible values: `"udd"` (see `naxUDD`) or `"cfm"` (see `naxCFMfromnmx`). `control` List with additional parameters. See `pclm.control`.

## Details

The function read `pash` object and run `pclm.general` function. The new `pash` object is constructed from `pclm.general` output.

Use `pclm.general` for more flexible and direct PCLM fitting.

## Value

An object of classes `"pclm"` and `"pash"` with PCLM-based life-table and `\$pclm` component. The function updates a `pash` object by fitting PCLM.
The new object inherits `source` and `time_unit` attributes from the original `pash` object as well as class `"pash"`. The pash life-table (component `\$lt`) contains the life-table based on the fitted PCLM (aggregated or nonaggregated depending on `to.pash` parameter). The newly constructed `pash` object contains extra `\$pclm` component needed to run `summary` and `plot` functions.

List of `\$pclm` sub-components:

 `grouped` Life-table based on aggregated PCLM fit defined by `out.step`. `raw` Life-table based on original (raw) PCLMfit. `fit` PCLM fit used to construct life-tables. `warn.list` List with warnings. `m` Interval multiple, see `pclm.interval.multiple`, `pclm.compmat`. `x.div` Value of `x.div`, see `pclm.control`. `out.step` Interval length of aggregated life-table, see `pclm.control`.

## Author(s)

Maciej J. Danko <[email protected]> <[email protected]>

## References

1. Rizzi S, Gampe J, Eilers PHC. Efficient estimation of smooth distributions from coarsely grouped data. Am J Epidemiol. 2015;182:138?47.

2. Rizzi S, Thinggaard M, Engholm G, et al. Comparison of non-parametric methods for ungrouping coarsely aggregated data. BMC Medical Research Methodology. 2016;16:59. doi:10.1186/s12874-016-0157-8.

`pclm.control`, `plot.pclm`, `summary.pclm`, `pash`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215``` ```## Not run: # ******************************************************************* # Usage of PCLM methods for simple cases # ******************************************************************* # *** Create pash objects with different interval lengths AU1 <- Inputlx(x = australia_1y\$x, lx = australia_1y\$lx, nax = australia_1y\$nax, nx = australia_1y\$nx, last_open = TRUE) AU10 <- Inputlx(x = australia_10y\$x, lx = australia_10y\$lx, nax = australia_10y\$nax, nx = australia_10y\$nx, last_open = TRUE) # *** Use PCLM # Ungroup AU10 with out.step equal minimal interval length min(AU10\$lt\$nx[-13]) AU10p.1a <- pclm.fit(AU10) print(AU10p.1a) plot(AU10p.1a) # Ungroup AU10 with out.step equal minimal interval length # and get good estimates of nax AU10p.1b <- pclm.fit(AU10, control = list(x.div = 10)) print(AU10p.1b) plot(AU10p.1b) # This time number of internal (raw) PCLM classes was high # and automatically P-splines were used to prevent long computations # This number can be estimated before performing # PCLM calclualtions: pclm.nclasses(AU10\$lt\$x, control = list(x.div = 10)) # which is the same as in the fitted model length(AU10p.1b\$pclm\$raw\$x) # whereas number of classes in the pash life-table # depends on out.step length(AU10p.1b\$lt\$x) length(AU10p.1b\$pclm\$grouped\$x) # equivalently # To speed-up computations we can decrease the number of P-sline knots AU10p.1c <- pclm.fit(AU10, control = list(x.div = 10, bs.use = TRUE, bs.df.max = 100)) # We can also use raw (nonaggregated) PCLM life-table # as the default pash life-table: AU10p.1d <- pclm.fit(AU10, control = list(x.div = 10), to.pash = "nonaggregated") print(AU10p.1d) length(AU10p.1d\$lt\$x) # the number of raw PCLM classes = # = number of pash life-table classes # *** Pace measures sorted (ascending order) by # potential precision of computation GetPace(AU10) GetPace(AU10p.1a) GetPace(AU10p.1b) GetPace(AU10p.1c) GetPace(AU10p.1d) GetPace(AU1) # The same for shappe GetShape(AU10) GetShape(AU10p.1a) GetShape(AU10p.1b) GetShape(AU10p.1c) GetShape(AU10p.1d) GetShape(AU1) # *** Diagnostic plots for fitted PCLM model # Aggregated PCLM fit: plot(AU10p.1b, type = 'aggregated') # Raw PCLM fit before aggregation: plot(AU10p.1b, type = 'nonaggregated') # In this PCLM fit aggregated life-table is identical # with nonaggregated plot(AU10p.1a, type = 'aggregated') plot(AU10p.1a, type = 'nonaggregated') # *** Combined summary of pash and pclm objects summary(AU10p.1a) summary(AU10p.1b) summary(AU10p.1c) summary(AU10p.1d) # Summary if pclm object is not present summary(AU10) # *** Smooth and aggregate data into 12-year interval AU10p.2 <- pclm.fit(AU10, out.step = 12) print(AU10p.2) print(AU10p.2, type = 'aggregated') # grouped PCLM life-table print(AU10p.2, type = 'nonaggregated') # raw PCLM life-table plot(AU10p.2) # *** Effect of the smaller sample size on the estimate. # Forced change of population size. AU10p.3 <- pclm.fit(AU10, population.size = 20, out.step = 1, control = list(x.div = 1)) plot(AU10p.3) # *** Plotting mortality AU10p.4a <- pclm.fit(AU10, population.size=1e6, control = list(x.div = 5)) plot(AU10p.4a\$lt\$x, log10(AU10p.4a\$lt\$nmx), type='l', lwd = 2, xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2) lines(AU1\$lt\$x, log10(AU1\$lt\$nmx), type = 'p') tail(AU10p.4a, n = 10) #note that lx has standardized values # Improving the plot to cover more age classes AU10p.4b <- pclm.fit(AU10, control = list(zero.class.end = 150, x.div = 4)) plot(AU10p.4b\$lt\$x, log10(AU10p.4b\$lt\$nmx), type='l', lwd = 2, xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2) lines(AU1\$lt\$x, log10(AU1\$lt\$nmx), type = 'p') print(AU10p.4b\$lt[111:120,]) # The change of the order of the difference in pclm algorithm may # affect hte interpretation of the tail. # But try to check also pclm.deg = 4 and 5. AU10p.4c <- pclm.fit(AU10, control = list(zero.class.end = 150, x.div = 4, pclm.deg = 4)) plot(AU10p.4c\$lt\$x, log10(AU10p.4c\$lt\$nmx), type='l', lwd = 2, xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2) lines(AU1\$lt\$x, log10(AU1\$lt\$nmx), type = 'p') # ******************************************************************* # Usage of PCLM methods for more complicated dataset # - understanding the out.step, x.div, and interval multiple # ******************************************************************* # *** Generate a dataset with varying and fractional interval lengths x <- c(0, 0.6, 1, 1.4, 3, 5.2, 6.4, 8.6, 11, 15, 17.2, 19, 20.8, 23, 25, 30) ndx <- ceiling(10000*diff(pgamma(x, shape = 3.8, rate = .4))) barplot(ndx/diff(x), width = c(diff(x), 2)) # preview # *** Create pash object (B <- Inputlx(x = x, lx = 10000-c(0, cumsum(ndx)), last_open = TRUE)) # *** Fit PCLM with automatic out.step Bp1 <- pclm.fit(B) # Output interval length (out.step) is automatically set to 0.4 # which is the minimal interval length in original data. min(B\$lt\$nx, na.rm = T) summary(Bp1) #new out.step can be also read from summary plot(Bp1) # *** Setting manually out.step Bp2 <- pclm.fit(B, out.step = 1) plot(Bp2, type = 'aggregated') # The fit with out.step = 1 plot(Bp2, type = 'nonaggregated') # It is clear that # PCLM extended internal interval length even without changing x.div # It was done because of the fractional parts in x vector. # This is also a case for Bp1 summary(Bp2) #PCLM interval length = 0.2 Bp2\$pclm\$raw\$n[1:10] # *** Setting manually out.step to a smaller value than # the smallest original interval length Bp3 <- pclm.fit(B, out.step = 0.1) summary(Bp3) # We got a warning as out.step cannot be smaller than # smallest age class if x.div = 1 # We can change x.div to make it possible Bp3 <- pclm.fit(B, out.step = 0.1, control = list(x.div = 2)) #0.1 is two times smaller than minimal interval length summary(Bp3) # We were able to change the interval plot(Bp3) # NOTE: In this case x.div has not sufficient value to # get good axn estimates Bp3\$pclm\$grouped\$ax[1:10] Bp3\$lt\$nax[1:10] #equivalently # This can be changed by the further increase of x.div Bp4 <- pclm.fit(B, out.step = 0.1, control = list(x.div = 20)) Bp4\$pclm\$grouped\$ax[1:10] # NOTE: This time P-spline approximation was used because # the composition matrix was huge # Finally, we were able to get our assumed out.step Bp4\$pclm\$grouped\$n[1:10] Bp4\$lt\$nx[1:10] #equivalently In the fitted model the interval multiple (m) is 5. (m <- pclm.interval.multiple(B\$lt\$x, control = list(x.div = 20))) summary(Bp4) # Interval multiple determines # the maximal interval length in raw PCLM life-table, (K <- 1 / m) # which is further divided by x.div. K / 20 # Simply: 1 / (m * x.div) = 1 / (5 * 20) = 0.01 # The interval in the raw PCLM life-table is 10 times shorter than # in the grouped life-table #interval length in aggregated PCLM life-table: Bp4\$pclm\$grouped\$n[1:10]/ # divided by # interval length in nonaggregated PCLM life-table: Bp4\$pclm\$raw\$n[1:10] # REMEBER: The interval for the raw PCLM life-table depends # on original interval, m, and x.div, # whereas the grouped PCLM interval length is set by out.step, # which could be eventually increased if out.step < raw PCLM # interval length. # *** Setting nonaggregated PCLM life-table as pash life-table #2 Bp5 <- pclm.fit(B, out.step = 0.1, control = list(x.div = 20), to.pash = "nonaggregated", nax.method = "cfm") # NOTE: For the very small interval length the "cfm" method # may not give realistic nax values Bp5 Bp4 GetShape(Bp4) GetShape(Bp5) # **** See more examples in the help for pclm.nclasses() function. ## End(Not run) ```