Description Usage Arguments Value Note Author(s) References See Also Examples
apc.fit.model
fits the age period cohort model as a Generalized Linear Model using glm.fit
.
The model is parametrised in terms of the canonical parameter introduced by Kuang, Nielsen and Nielsen (2008),
see also the implementation in Martinez Miranda, Nielsen and Nielsen (2015).
This parametrisation has a number of advantages: it is freely varying, it is the canonical parameter of a
regular exponential family, and it is invariant to extentions of the data matrix.
Other parametrizations can be computed using apc.identify
.
apc.fit.model
can be be used for all three age period cohort factors, or for submodels with fewer of these factors.
apc.fit.model
can be used either for mortality rates through a dose-response model or for mortality counts through a pure response model without doses/exposures.
The GLM families include Poisson regressions (with log link) and Normal/Gaussian least squares regressions.
apc.fit.table produces a deviance table for 15 combinations of the three factors and linear trends: "APC", "AP", "AC", "PC", "Ad", "Pd", "Cd", "A", "P", "C", "t", "tA", "tP", "tC", "1".
1 2 3 4 | apc.fit.model(apc.data.list,model.family,model.design,apc.index=NULL,
replicate.version.1.3.1=FALSE)
apc.fit.table(apc.data.list,model.family,model.design.reference="APC",
apc.index=NULL)
|
apc.data.list |
List. See |
model.family |
Character. The following options are implemented. These are used internally when
calling
|
model.design |
Character. This indicates the design choice. The following options are possible.
|
model.design.reference |
Character. This indicates the reference design choice for the deviance table. Choices are "APC","AP","AC","PC","Ad","Pd","Cd","A","P","C","t". Default is "APC". |
apc.index |
Optional. List. See |
replicate.version.1.3.1 |
Optional. Logical. Replicate error in covariance calculation for "poisson.response","od.poisson.response" in versions 1.2.3-1.3.1. Default=FALSE |
apc.fit.table produces a deviance table. There are 15 rows corresponding to all possible design choices. The columns are as follows.
"-2logL" |
-2 log Likelihood up to some constant.
If the model family is Poisson or binomial (logistic)
this is the same as the |
"df.residual" |
Degrees of freedom of residual: nrow x ncol - dim(parameter). If the model.family="poisson.response" the degrees of freedom is one lower. |
"prob(>chi_sq)" |
p-value of the deviance, -2logL. Left out in Gaussian case which has no saturated model |
"LR vs APC" |
the likelihood ratio statistic against the "APC" model. |
"df" |
Degrees of freedom against the "APC" model. |
"prob(>chi_sq)" |
p-value of log likelihood ratio statistic. |
"aic" |
Akaike's "An Information Criterion", minus twice the maximized log-likelihood plus twice the
number of parameters upto a constant. It is take directly from the
|
"F" |
Only for "od.poisson.response". F statistic: Ratio of deviance for submodel divided by degrees of freedom to deviance of apc model divided by degrees of freedom. |
"prof(>F)" |
Only for "od.poisson.response". F statistic: with degrees of freedom given by differences between sub-model and apc model and between apc model and saturated model. |
apc.fit.model returns a list. The entries are as follows.
fit |
List. Values from |
apc.index |
List. Values from |
coefficients.canonical |
Matrix. For each coordinate of the canonical parameters is reported coefficient, standard deviation, z-value, which is the ratio of those, and asymptotically normal p-values.
Note, for "od.poisson.response" the reported standard errors corrected by the deviance and p-values are asymptotically t distributed, see Harnau and Nielsen (2016).
Other parametrizations can be computed using |
covariance.canonical |
Matrix. Estimated covariance matrix for canonical parameters. |
slopes |
Vector. Length three. The design matrix found by |
difdif |
Vector. Length three. The design matrix found by |
index.age |
Vector. Indices for age double difference parameters within |
index.per |
Vector. Indices for period double difference parameters within |
index.coh |
Vector. Indices for cohort double difference parameters within |
dates |
Vector. Indicates the dates for the double difference parameters within |
model.family |
Character. Argument. |
model.design |
Character. Argument. |
RSS |
Numeric. Residual sum of squares. NULL for non-gaussian families. |
sigma2 |
Numeric. Maximum likelihood estimator for variance: RSS/n. NULL for non-gaussian families. |
s2 |
Numeric. Least squares estimator for variance: RSS/df. NULL for non-gaussian families. |
n.decimal |
Numeric. From |
predictors |
Vector. Design*Estimates.
Same as the |
For gaussian families deviance is defined differently in apc
and glm
.
Here it is -2 log likelihood. In glm
it is RSS.
The values for apc.fit.model
include the apc.data.list
and the apc.index
returned by
apc.get.index
.
For the poisson.response
the inference is conditional on the level, see Martinez Miranda, Nielsen and Nielsen (2015).
The coefficients.canonical
computed by apc
are therefore different from the default coefficients
computed by glm
.
For the od.poisson.response
an asymptotic theory is used that mimics the conditioning for poisson.response
.
The asymptotic distribution are, however, asymptotically t or F distributed, see Harnau and Nielsen (2017).
For the log.normal.response
standard normal theory applies for quantities on the log scale including estimators.
An asymptotic theory for quantities on the original scale is provided in Kuang and Nielsen (2018).
For coefficients
the 3rd and 4th columns have headings t value
and Pr(>|t|)
for od.poisson.response
to indicate an asymptotic t theory
and otherwise
z value
and Pr(>|z|)
to indicate an asymptotic normal theory. The labels are inherited from glm.fit
.
Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 15 Aug 2018 (27 Aug 2014)
Harnau, J. and Nielsen (2016) Over-dispersed age-period-cohort models. To appear in Journal of the American Statistical Association. Download: Nuffield DP
Kuang, D, Nielsen B (2018) Generalized log-normal chain-ladder. mimeo Nuffield Collge.
Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. Biometrika 95, 979-986. Download: Article; Earlier version Nuffield DP.
Martinez Miranda, M.D., Nielsen, B. and Nielsen, J.P. (2015) Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society A 178, 29-55. Download: Article, Nuffield DP.
The fit is done using glm.fit
.
The examples below use Italian bladder cancer data, see data.Italian.bladder.cancer
and
Belgian lung cancer data, see data.Belgian.lung.cancer
.
In example 3 the design matrix is called is called using apc.get.design
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | #####################
# EXAMPLE 1 with Italian bladder cancer data
data.list <- data.Italian.bladder.cancer() # function gives data list
apc.fit.table(data.list,"poisson.dose.response")
# -2logL df.residual prob(>chi_sq) LR.vs.APC df.vs.APC prob(>chi_sq) aic
# APC 33.179 27 0.191 NA NA NA 487.624
# AP 512.514 40 0.000 479.335 13 0.000 940.958
# AC 39.390 30 0.117 6.211 3 0.102 487.835
# PC 1146.649 36 0.000 1113.470 9 0.000 1583.094
# Ad 518.543 43 0.000 485.364 16 0.000 940.988
# Pd 4041.373 49 0.000 4008.194 22 0.000 4451.818
# Cd 1155.629 39 0.000 1122.450 12 0.000 1586.074
# A 2223.800 44 0.000 2190.621 17 0.000 2644.245
# P 84323.944 50 0.000 84290.765 23 0.000 84732.389
# C 23794.205 40 0.000 23761.026 13 0.000 24222.650
# t 4052.906 52 0.000 4019.727 25 0.000 4457.351
# tA 5825.158 53 0.000 5791.979 26 0.000 6227.602
# tP 84325.758 53 0.000 84292.579 26 0.000 84728.203
# tC 33446.796 53 0.000 33413.617 26 0.000 33849.241
# 1 87313.678 54 0.000 87280.499 27 0.000 87714.123
#
# Table suggests that "APC" and "AC" fit equally well. Try both
fit.apc <- apc.fit.model(data.list,"poisson.dose.response","APC")
fit.ac <- apc.fit.model(data.list,"poisson.dose.response","AC")
# Compare the estimates: They are very similar
fit.apc$coefficients.canonical
fit.ac$coefficients.canonical
#####################
# EXAMPLE 2 with Belgian lung cancer data
# This example illustrates how to find the linear predictors
data.list <- data.Belgian.lung.cancer()
# Get an APC fit
fit.apc <- apc.fit.model(data.list,"poisson.dose.response","APC")
# The linear predictor of the fit is a vector.
# But, we would like it in the same format as the data.
# Thus create matrix of same dimension as response data
# This can be done in two ways
m.lp <- data.list$response # using original information
m.lp <- fit.apc$response # using information copied when fitting
# the fit object index.data is used to fill linear predictor in
# vector format into matrix format
m.lp[fit.apc$index.data] <-fit.apc$linear.predictors
exp(m.lp)
#####################
# EXAMPLE 3 with Belgian lung cancer data
# This example illustrates how apc.fit.model works.
data.list <- data.Belgian.lung.cancer()
# Vectorise data
index <- apc.get.index(data.list)
v.response <- data.list$response[index$index.data]
v.dose <- data.list$dose[index$index.data]
# Get design
m.design <- apc.get.design(index,"APC")$design
# Fit using glm.fit from stats package
fit.apc.glm <- glm.fit(m.design,v.response,family=poisson(link="log"),offset=log(v.dose))
# Get canonical coefficients
v.cc <- fit.apc.glm$coefficients
# Find linear predictors and express in matrix form
m.fit <- data.list$response # create matrix
m.fit[index$index.data] <- m.design
m.fit.offset <- m.fit + log(data.list$dose) # add offset
exp(m.fit.offset)
# Compare with linear.predictors from glm.fit
# difference should be zero
sum(abs(m.fit.offset[index$index.data]-fit.apc.glm$linear.predictors))
#####################
# EXAMPLE 4 with Taylor-Ashe loss data
# This example illustrates the over-dispersed poisson response model.
data <- data.loss.TA()
fit.apc.od <- apc.fit.model(data,"od.poisson.response","APC")
fit.apc.od$coefficients.canonical[1:5,]
fit.apc.no.od <- apc.fit.model(data,"poisson.response","APC")
fit.apc.no.od$coefficients.canonical[1:5,]
|
-2logL df.residual prob(>chi_sq) LR.vs.APC df.vs.APC prob(>chi_sq)
APC 33.179 27 0.191 NA NA NA
AP 512.514 40 0.000 479.335 13 0.000
AC 39.390 30 0.117 6.211 3 0.102
PC 1146.649 36 0.000 1113.470 9 0.000
Ad 518.543 43 0.000 485.364 16 0.000
Pd 4041.373 49 0.000 4008.194 22 0.000
Cd 1155.629 39 0.000 1122.450 12 0.000
A 2223.800 44 0.000 2190.621 17 0.000
P 84323.944 50 0.000 84290.765 23 0.000
C 23794.205 40 0.000 23761.026 13 0.000
t 4052.906 52 0.000 4019.727 25 0.000
tA 5825.158 53 0.000 5791.979 26 0.000
tP 84325.758 53 0.000 84292.579 26 0.000
tC 33446.796 53 0.000 33413.617 26 0.000
1 87313.678 54 0.000 87280.499 27 0.000
aic
APC 487.624
AP 940.958
AC 487.835
PC 1583.094
Ad 940.988
Pd 4451.818
Cd 1586.074
A 2644.245
P 84732.389
C 24222.650
t 4457.351
tA 6227.602
tP 84728.203
tC 33849.241
1 87714.123
Estimate Std. Error z value Pr(>|z|)
level 1.87648684 0.02828139 66.3505883 0.000000e+00
age slope 0.76014175 0.03283618 23.1495203 1.470014e-118
cohort slope 0.03269579 0.02622515 1.2467342 2.124950e-01
DD_age_35 -0.55065499 0.40664725 -1.3541343 1.756935e-01
DD_age_40 0.13664203 0.21884545 0.6243768 5.323801e-01
DD_age_45 -0.05196011 0.13248649 -0.3921918 6.949165e-01
DD_age_50 -0.10159156 0.08209743 -1.2374511 2.159197e-01
DD_age_55 -0.16916985 0.05361601 -3.1552112 1.603820e-03
DD_age_60 -0.13525709 0.03874505 -3.4909512 4.813042e-04
DD_age_65 -0.08681738 0.03044437 -2.8516723 4.348991e-03
DD_age_70 -0.09762626 0.02687372 -3.6327777 2.803865e-04
DD_age_75 -0.02662955 0.02717706 -0.9798538 3.271583e-01
DD_period_1965 0.01143513 0.03148918 0.3631446 7.164969e-01
DD_period_1970 0.01900914 0.02772827 0.6855508 4.929964e-01
DD_period_1975 0.01493432 0.02468704 0.6049457 5.452151e-01
DD_cohort_1890 0.15804501 0.06198293 2.5498151 1.077801e-02
DD_cohort_1895 0.03929894 0.04413320 0.8904621 3.732178e-01
DD_cohort_1900 -0.07836353 0.03405310 -2.3012155 2.137945e-02
DD_cohort_1905 -0.09985060 0.02798423 -3.5681030 3.595752e-04
DD_cohort_1910 -0.05478258 0.02801528 -1.9554535 5.052956e-02
DD_cohort_1915 -0.06493072 0.03267698 -1.9870481 4.691707e-02
DD_cohort_1920 0.06538341 0.04468068 1.4633487 1.433720e-01
DD_cohort_1925 0.07563192 0.06125472 1.2347116 2.169379e-01
DD_cohort_1930 -0.13413655 0.08417885 -1.5934710 1.110546e-01
DD_cohort_1935 0.12744049 0.13314462 0.9571584 3.384873e-01
DD_cohort_1940 -0.51208269 0.23735644 -2.1574417 3.097126e-02
DD_cohort_1945 0.40832896 0.45267592 0.9020337 3.670389e-01
DD_cohort_1950 1.59638636 0.71890833 2.2205701 2.638009e-02
Estimate Std. Error z value Pr(>|z|)
level 1.85265468 0.02578223 71.8578030 0.000000e+00
age slope 0.78409316 0.02641649 29.6819581 1.312775e-193
cohort slope 0.05847985 0.01724745 3.3906365 6.973051e-04
DD_age_35 -0.55232153 0.40651333 -1.3586800 1.742480e-01
DD_age_40 0.13925320 0.21869899 0.6367345 5.242978e-01
DD_age_45 -0.05132193 0.13235337 -0.3877644 6.981904e-01
DD_age_50 -0.10329194 0.08200093 -1.2596435 2.077980e-01
DD_age_55 -0.17191364 0.05353344 -3.2113318 1.321213e-03
DD_age_60 -0.13376443 0.03869639 -3.4567677 5.466960e-04
DD_age_65 -0.08602377 0.03040561 -2.8292070 4.666351e-03
DD_age_70 -0.09920044 0.02682848 -3.6975802 2.176644e-04
DD_age_75 -0.02884898 0.02711003 -1.0641440 2.872635e-01
DD_cohort_1890 0.16139227 0.06099477 2.6460018 8.144937e-03
DD_cohort_1895 0.04572057 0.04365464 1.0473245 2.949499e-01
DD_cohort_1900 -0.07048964 0.03372064 -2.0904002 3.658187e-02
DD_cohort_1905 -0.10365548 0.02790974 -3.7139541 2.040458e-04
DD_cohort_1910 -0.05369876 0.02797461 -1.9195537 5.491430e-02
DD_cohort_1915 -0.06414480 0.03263594 -1.9654649 4.936045e-02
DD_cohort_1920 0.06484795 0.04463480 1.4528563 1.462637e-01
DD_cohort_1925 0.07601712 0.06118984 1.2423162 2.141199e-01
DD_cohort_1930 -0.13302382 0.08408656 -1.5819867 1.136526e-01
DD_cohort_1935 0.12771607 0.13301382 0.9601715 3.369689e-01
DD_cohort_1940 -0.51293696 0.23714937 -2.1629278 3.054674e-02
DD_cohort_1945 0.40972906 0.45233878 0.9058013 3.650410e-01
DD_cohort_1950 1.59847337 0.71813495 2.2258677 2.602305e-02
1955-1959 1960-1964 1965-1969 1970-1974
25-29 3.96771 3.963962 4.068328 3.00000
30-34 10.42549 12.977213 11.665624 12.93167
35-39 16.17695 20.059348 23.393292 22.37041
40-44 33.29595 39.614559 46.427705 55.66179
45-49 73.52215 69.305700 78.084697 97.08746
50-54 113.74001 123.168769 109.967426 131.12380
55-59 168.93594 179.820491 184.812645 176.43092
60-64 191.61151 238.482055 241.409998 264.49644
65-69 208.32046 267.571302 316.078730 342.02950
70-74 216.00384 260.040435 316.452455 400.50327
75-79 198.00000 220.996165 253.639100 330.36474
Warning message:
In m.fit[index$index.data] <- m.design :
number of items to replace is not a multiple of replacement length
1955-1959 1960-1964 1965-1969 1970-1974
25-29 42.92024 41.81972 38.05595 42.92024
30-34 45.30470 44.38011 41.52931 38.28566
35-39 38.33474 45.30470 44.38011 41.43722
40-44 36.65099 37.84949 45.12563 42.62395
45-49 43.24539 35.92015 37.49354 44.48098
50-54 43.65725 41.89352 35.17776 36.44910
55-59 41.19404 41.68032 40.51698 34.12069
60-64 35.54393 38.52421 39.56610 38.45751
65-69 28.99677 32.11421 35.26125 36.31079
70-74 23.10125 24.53323 27.47722 30.31760
75-79 16.08066 17.30772 18.70343 21.02951
[1] 67.0634
Estimate Std. Error t value Pr(>|t|)
level 12.78786359 NA NA NA
age slope 0.69776392 0.2240298 3.11460328 0.0042223991
cohort slope 0.11148167 0.2505890 0.44487848 0.6598277322
DD_age_3 -0.89563247 0.2200824 -4.06953304 0.0003486379
DD_age_4 0.01357051 0.2035538 0.06666795 0.9473198145
Estimate Std. Error z value Pr(>|z|)
level 12.78786359 NA NA NA
age slope 0.69776392 0.0010034992 695.33084 0.00000e+00
cohort slope 0.11148167 0.0011224663 99.31850 0.00000e+00
DD_age_3 -0.89563247 0.0009858174 -908.51758 0.00000e+00
DD_age_4 0.01357051 0.0009117808 14.88353 4.21667e-50
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.