pclm.default: Fitting Penalized Composite Link Model (PCLM)

Description

The PCLM method is based on the composite link model, with a penalty added to ensure the smoothness of the target distribution. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing the Bayesian or Akaike's Information Criterion [From Rizzi et al. 2015 abstract].
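
The iteration described above can be illustrated with a minimal base-R sketch that follows the general algorithm of Rizzi et al. (2015). It is not the code behind pclm.default or pclm.opt: the function name pclm_sketch, the fixed smoothing parameter lambda, and the assumed closing age of 110 are illustrative assumptions, and no AIC/BIC search is performed.

```r
# Illustrative sketch only; NOT the package implementation.
pclm_sketch <- function(y, C, lambda = 100, deg = 2, max.it = 100, tol = 1e-6) {
  nx <- ncol(C)                               # number of fine (ungrouped) classes
  D  <- diff(diag(nx), differences = deg)     # difference penalty matrix
  b  <- rep(log(sum(y) / nx), nx)             # flat start on the log scale
  for (it in seq_len(max.it)) {
    b.old <- b
    gam <- exp(b)                             # latent fine-scale expected counts
    mu  <- as.vector(C %*% gam)               # expected grouped counts
    Q   <- C %*% diag(gam)                    # working design matrix
    z   <- c(y - mu + as.vector(Q %*% b), rep(0, nrow(D)))  # working response
    w   <- c(1 / mu, rep(lambda, nrow(D)))    # IRLS weights, then penalty weights
    b   <- unname(lsfit(rbind(Q, D), z, wt = w, intercept = FALSE)$coefficients)
    if (max(abs(b - b.old)) < tol) break      # stop when coefficients settle
  }
  list(gamma = exp(b), iterations = it)
}

# Grouped death counts; the last class is assumed to close at age 110.
x    <- c(0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
dx   <- c(38, 37, 17, 104, 181, 209, 452, 1190, 2436, 3164, 1852, 307, 13)
ages <- 0:109
# Composition matrix: C[j, i] = 1 when single-year age i falls in class j.
C    <- outer(seq_along(x), findInterval(ages, x), "==") * 1
fit  <- pclm_sketch(dx, C)
plot(ages, fit$gamma, type = "l", xlab = "Age", ylab = "Ungrouped deaths")
```

In this sketch lambda is fixed by hand; the package instead selects the smoothing parameter by an information criterion, as stated above.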

Usage

pclm.default(x, y, count.type = c("DX", "LX"), out.step = "auto",
  exposures = NULL, control = list())

Arguments

x

Vector with start of the interval for age/time classes.

y

Vector with counts, e.g. ndx. It must have the same length as x.

count.type

Type of the data: deaths ("DX", the default) or exposures ("LX").

out.step

Age interval length in output aggregated life-table. If set to "auto" then the parameter is automatically set to the length of the shortest age/time interval of x.

exposures

Optional exposures to calculate smooth mortality rates. A vector of the same length as x and y. See reference [1] for further details.

control

List with additional parameters. See pclm.control.
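
As a small illustration of the "auto" setting of out.step described above, the shortest interval of x can be computed by hand in base R. The variable name auto.step is only illustrative and is not part of the package.

```r
# Start of each age class, as in the Examples section.
x <- c(0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# Length of the shortest age/time interval, i.e. what "auto" is documented to use.
auto.step <- min(diff(x))
auto.step
```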

Details

The function has four major steps:

  1. Calculate the interval multiple (pclm.interval.multiple) to remove fractional parts from the x vector. Removing the fractional parts is necessary to build the composition matrix.

  2. Calculate composition matrix using pclm.compmat.

  3. Fit PCLM model using pclm.opt.

  4. Calculate aggregated (grouped) life-table using pclm.aggregate.

More details on the PCLM algorithm can be found in reference [1]; see also pclm.compmat.
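
Steps 1 and 2 can be sketched in base R as follows. The helpers interval_multiple and composition_matrix are hypothetical stand-ins written for this illustration; they are not pclm.interval.multiple or pclm.compmat, and they ignore x.div and the other control parameters. The closing age x.end = 32 for the open last class is also an assumption.

```r
# Hypothetical helper: smallest integer m such that all x * m are whole numbers.
interval_multiple <- function(x, max.m = 1000, tol = 1e-8) {
  for (m in seq_len(max.m)) {
    if (all(abs(x * m - round(x * m)) < tol)) return(m)
  }
  stop("no integer multiple found up to max.m")
}

# Hypothetical helper: 0/1 matrix mapping fine classes (columns) to
# the original grouped classes (rows).
composition_matrix <- function(x, x.end, m = interval_multiple(c(x, x.end))) {
  xi   <- round(x * m)                         # class starts on the integer grid
  fine <- seq(xi[1], round(x.end * m) - 1)     # starts of the fine classes
  grp  <- findInterval(fine, xi)               # grouped class of each fine class
  outer(seq_along(xi), grp, "==") * 1
}

x <- c(0, 0.6, 1, 1.4, 3, 5.2, 6.4, 8.6, 11, 15,
       17.2, 19, 20.8, 23, 25, 30)
m <- interval_multiple(x)
C <- composition_matrix(x, x.end = 32)
dim(C)
rowSums(C)   # number of fine classes inside each original class
```

Multiplying x by m puts every class boundary on an integer grid, which is what makes a 0/1 composition matrix possible.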

Value

The output is of "pclm" class with the components:

grouped

Life-table based on aggregated PCLM fit and defined by out.step.

raw

Life-table based on original (raw) PCLM fit.

fit

PCLM fit used to construct life-tables.

m

Interval multiple, see pclm.interval.multiple, pclm.compmat.

x.div

Value of x.div, see pclm.control.

out.step

Interval length of aggregated life-table, see pclm.control.

control

Used control parameters, see pclm.control.

warn.list

List with warnings.

Author(s)

Maciej J. Danko <danko@demogr.mpg.de> <maciej.danko@gmail.com>

References

  1. Rizzi S, Gampe J, Eilers PHC. Efficient estimation of smooth distributions from coarsely grouped data. Am J Epidemiol. 2015;182:138–47.

  2. Rizzi S, Thinggaard M, Engholm G, et al. Comparison of non-parametric methods for ungrouping coarsely aggregated data. BMC Medical Research Methodology. 2016;16:59. doi:10.1186/s12874-016-0157-8.

See Also

pclm.compmat, pclm.interval.multiple, and pclm.nclasses.

Examples

## Not run: 
# Use a simple data set. Naive life-table.
# Age: 
x <- c(0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
# Death counts:
dx <- c(38, 37, 17, 104, 181, 209, 452, 1190, 2436, 3164, 1852, 307, 13)
# Survivors at the beginning of age class
lx <- sum(dx)-c(0, cumsum(dx[-length(dx)]))
# Interval length
n <- diff(c(x,110))
# Approximation of mortality per age class
mx <- - log(1 - dx / lx) / n
# Mid-interval vector
xh <- x + n / 2
# Approximated exposures
Lx <- n * (lx - dx) + 0.5 * dx *n

# *** Use PCLM
# Ungroup the dataset with out.step equal to the minimal interval length
min(diff(x))
AU10p.1a <- pclm.default(x, dx)
print(AU10p.1a)
plot(AU10p.1a)

# Ungroup AU10 with out.step equal to the minimal interval length
# and get good estimates of nax
AU10p.1b <- pclm.default(x, dx, control = list(x.div = 10))
print(AU10p.1b)
plot(AU10p.1b)
# This time the number of internal (raw) PCLM classes was high
# and P-splines were used automatically to prevent long computations

# This number can be estimated before performing
# PCLM calculations:
pclm.nclasses(x, control = list(x.div = 10))
# which is the same as in the fitted model
length(AU10p.1b$raw$x)
# whereas number of classes in the aggregated life-table
# depends on out.step
length(AU10p.1b$grouped$x) 

# To speed-up computations we can decrease the number of P-spline knots
AU10p.1c <- pclm.default(x, dx, control = list(x.div = 10,
                     bs.use = TRUE, bs.df.max = 100))

# *** Diagnostic plots for fitted PCLM model
# Aggregated PCLM fit:
plot(AU10p.1b, type = 'aggregated')
# Raw PCLM fit before aggregation:
plot(AU10p.1b, type = 'nonaggregated')

# In this PCLM fit aggregated life-table is identical
# with nonaggregated
plot(AU10p.1a, type = 'aggregated')
plot(AU10p.1a, type = 'nonaggregated')

# *** Combined summary of pash and pclm objects
summary(AU10p.1a)
summary(AU10p.1b)
summary(AU10p.1c)

# *** Smooth and aggregate data into 12-year interval
AU10p.2 <- pclm.default(x, dx, out.step = 12)
print(AU10p.2)
print(AU10p.2, type = 'aggregated') # grouped PCLM life-table
print(AU10p.2, type = 'nonaggregated') # raw PCLM life-table
plot(AU10p.2)

# *******************************************************************
# Usage of PCLM methods to fit and plot mortality data
# *******************************************************************

AU10p.4a <- pclm.default(x, dx, control = list(x.div = 5))
X <- AU10p.4a$grouped$x
M <- -log(1 - AU10p.4a$grouped$dx/AU10p.4a$grouped$lx)
plot(X, log10(M), type='l', lwd = 2,
     xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2)
lines(xh, log10(mx), type = 'p')
tail(AU10p.4a, n = 10)
#note that lx has standardized values

# Improving the plot to cover more age classes
AU10p.4b <- pclm.default(x, dx, control = list(zero.class.end = 150,
                     x.div = 4))
X <- AU10p.4b$grouped$x
M <- -log(1 - AU10p.4b$grouped$dx / AU10p.4b$grouped$lx)
 
plot(X, log10(M), type='l', lwd = 2,
     xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2)
lines(xh, log10(mx), type = 'p')
tail(AU10p.4b, n = 10)

# Changing the order of the difference in the pclm algorithm may
# affect the interpretation of the tail.
# Try to check pclm.deg = 4 and 5.
AU10p.4c <- pclm.default(x, dx, control = list(zero.class.end = 150,
                     x.div = 1, pclm.deg = 4))
X <- AU10p.4c$grouped$x
M <- -log(1 - AU10p.4c$grouped$dx / AU10p.4c$grouped$lx)
plot(X, log10(M), type='l', lwd = 2,
     xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2)
lines(xh, log10(mx), type = 'p')

# Using exposures to fit mortality.
# Notice that a different approximation of the mortality rate is used
# than in the previous cases.
AU10p.4c <- pclm.default(x, dx, exposures = Lx, control = list(zero.class.end = 150,
                     x.div = 1, pclm.deg = 2, bs.use = FALSE))
X <- AU10p.4c$grouped$x
M <- AU10p.4c$grouped$mx
plot(X, log10(M), type='l', lwd = 2,
     xlim=c(0,130), xlab='Age', ylab='log_10 mortality', col = 2)
lines(xh, log10((dx / Lx) / n), type = 'p')

# *******************************************************************
# Usage of PCLM methods for more complicated dataset
# - understanding the out.step, x.div, and interval multiple
# *******************************************************************

# *** Generate a dataset with varying and fractional interval lengths
x <- c(0, 0.6, 1, 1.4, 3, 5.2, 6.4, 8.6, 11, 15,
       17.2, 19, 20.8, 23, 25, 30)
dx <- ceiling(10000*diff(pgamma(x, shape = 3.8, rate = .4)))
barplot(dx/diff(x), width = c(diff(x), 2)) # preview
lx <- 10000-c(0, cumsum(dx))
dx <- c(dx, lx[length(lx)])  

# *** Fit PCLM with automatic out.step
Bp1 <- pclm.default(x, dx)
# Output interval length (out.step) is automatically set to 0.4
# which is the minimal interval length in original data.
min(diff(x))
summary(Bp1) #new out.step can be also read from summary
plot(Bp1)

# *** Setting manually out.step
Bp2 <- pclm.default(x, dx, out.step = 1)
plot(Bp2, type = 'aggregated') # The fit with out.step = 1
plot(Bp2, type = 'nonaggregated') # It is clear that
# PCLM changed the internal interval length even without changing x.div
# It was done because of the fractional parts in the x vector.
# This is also the case for Bp1
summary(Bp2) #PCLM interval length = 0.2
Bp2$raw$n[1:10]

# *** Setting manually out.step to a smaller value than
#     the smallest original interval length
Bp3 <- pclm.default(x, dx, out.step = 0.1)
summary(Bp3)
# We got a warning because out.step cannot be smaller than
# the smallest age class if x.div = 1

# We can change x.div to make it possible
Bp3 <- pclm.default(x, dx, out.step = 0.1, control = list(x.div = 2))
#0.1 is two times smaller than minimal interval length
summary(Bp3) # We were able to change the interval
plot(Bp3)
# NOTE: In this case x.div is not large enough to
#       give good nax estimates
Bp3$grouped$ax[1:10]

# This can be improved by increasing x.div further
Bp4 <- pclm.default(x, dx, out.step = 0.1, control = list(x.div = 20))
Bp4$grouped$ax[1:10]
# NOTE: This time P-spline approximation was used because
# the composition matrix was huge

# Finally, we were able to get our assumed out.step
Bp4$grouped$n[1:10]

# In the fitted model the interval multiple (m) is 5.
(m <- pclm.interval.multiple(x, control = list(x.div = 20)))
summary(Bp4)
# Interval multiple determines
# the maximal interval length in raw PCLM life-table,
(K <- 1 / m)
# which is further divided by x.div.
K / 20
# Simply: 1 / (m * x.div) = 1 / (5 * 20) = 0.01
# The interval in the raw PCLM life-table is 10 times shorter than
# in the grouped life-table
# interval length in aggregated PCLM life-table:
Bp4$grouped$n[1:10]/ # divided by
# interval length in nonaggregated PCLM life-table:
Bp4$raw$n[1:10]
# NOTE: The interval for the raw PCLM life-table depends
# on the original interval, m, and x.div,
# whereas the grouped PCLM interval length is set by out.step,
# which may be increased if out.step < raw PCLM
# interval length.

# **** See more examples in the help for pclm.nclasses() function.

## End(Not run)
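
The interval arithmetic discussed in the last example can be checked without fitting any model. The values m = 5, x.div = 20 and out.step = 0.1 are taken from the comments in that example; the variable names raw.step and ratio are illustrative only.

```r
m        <- 5      # interval multiple reported for the fractional x vector
x.div    <- 20     # additional division of the raw interval
out.step <- 0.1    # requested interval length of the grouped life-table

raw.step <- 1 / (m * x.div)     # interval length of the raw PCLM life-table
ratio    <- out.step / raw.step # raw classes per grouped class
raw.step
ratio
```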

MaciejDanko/pclm documentation built on May 3, 2019, 3:36 p.m.