apc.fit.model: Fits an age period cohort model

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/apc_fit.R

Description

apc.fit.model fits the age period cohort model as a Generalized Linear Model using glm.fit. The model is parametrised in terms of the canonical parameter introduced by Kuang, Nielsen and Nielsen (2008), see also the implementation in Martinez Miranda, Nielsen and Nielsen (2015). This parametrisation has a number of advantages: it is freely varying, it is the canonical parameter of a regular exponential family, and it is invariant to extentions of the data matrix. Other parametrizations can be computed using apc.identify.

apc.fit.model can be be used for all three age period cohort factors, or for submodels with fewer of these factors.

apc.fit.model can be used either for mortality rates through a dose-response model or for mortality counts through a pure response model without doses/exposures.

The GLM families include Poisson regressions (with log link) and Normal/Gaussian least squares regressions.

apc.fit.table produces a deviance table for 15 combinations of the three factors and linear trends: "APC", "AP", "AC", "PC", "Ad", "Pd", "Cd", "A", "P", "C", "t", "tA", "tP", "tC", "1".

Usage

1
2
3
4
	apc.fit.model(apc.data.list,model.family,model.design,apc.index=NULL,
			replicate.version.1.3.1=FALSE)
		apc.fit.table(apc.data.list,model.family,model.design.reference="APC",
			apc.index=NULL)

Arguments

apc.data.list

List. See apc.data.list for a description of the format.

model.family

Character. The following options are implemented. These are used internally when calling glm.fit.

"poisson.response"

This sets family=poisson(link="log"). Only responses are used. Inference is done in a multinomial model, conditioning on the overall level as documented in Martinez Miranda, Nielsen and Nielsen (2015).

"od.poisson.response"

This sets family=quasipoisson(link="log") in the estimation step, but then reverts to family=poisson(link="log") when computing standard errors, which are then corrected. Only responses are used. Inference is done in an over-dispersed Poisson model as documented in Harnau and Nielsen (2016). Note that limit distributions are t and F not normal and chi2.

"poisson.dose.response"

This sets family=poisson(link="log"). Doses are used as offset.

"binomial.dose.response"

This sets family=binomial(link="logit") and gives a logistic regression.

"gaussian.rates"

This sets family=gaussian(link="identity"). The dependent variable is the mortality rates, which are computed as response/dose.

"gaussian.response"

This sets family=gaussian(link="identity"). Only responses are used. The dependent variable is the responses.

"log.normal.rates"

Gaussian regression for log(rates) and with identity link (Least Squares).

"log.normal.response"

Gaussian regression for log(response) and with identity link (Least Squares).

model.design

Character. This indicates the design choice. The following options are possible.

"APC"

Age-period-cohort model.

"AP"

Age-period model. Nested in "APC"

"AC"

Age-cohort model. Nested in "APC"

"PC"

Period-cohort model. Nested in "APC"

"Ad"

Age-trend model, including age effect and two linear trends. Nested in "AP", "AC".

"Pd"

Period-trend model, including period effect and two linear trends. Nested in "AP", "PC".

"Cd"

Cohort-trend model, including cohort effect and two linear trends. Nested in "AC", "PC".

"A"

Age model. Nested in "Ad".

"P"

Period model. Nested in "Pd".

"C"

Cohort model. Nested in "Cd".

"t"

Trend model, with two linear trends. Nested in "Ad", "Pd", "Cd".

"tA"

Single trend model in age index. Nested in "A", "t".

"tP"

Single trend model in period index. Nested in "P", "t".

"tC"

Single trend model in cohort index. Nested in "C", "t".

"1"

Constant model. Nested in "tA", "tP", "tC".

model.design.reference

Character. This indicates the reference design choice for the deviance table. Choices are "APC","AP","AC","PC","Ad","Pd","Cd","A","P","C","t". Default is "APC".

apc.index

Optional. List. See apc.get.index for a description of the format. If not provided this is computed internally. If apc.fit.model is used in a simulation study computational effort can be saved when using this option.

replicate.version.1.3.1

Optional. Logical. Replicate error in covariance calculation for "poisson.response","od.poisson.response" in versions 1.2.3-1.3.1. Default=FALSE

Value

apc.fit.table produces a deviance table. There are 15 rows corresponding to all possible design choices. The columns are as follows.

"-2logL"

-2 log Likelihood up to some constant. If the model family is Poisson or binomial (logistic) this is the same as the glm deviance: That is the difference in -2 log likelihood value between estimated model and the saturated model. If the model family is Gaussian it is different from the traditional glm deviance. Here the -2 log likelihood value is measured in a model with unknown variance, which is the standard in regression analysis, whereas in the glm package the deviance is the residual sum of squares, which can be interpreted as the -2 log likelihood value in a model with variance set to one.

"df.residual"

Degrees of freedom of residual: nrow x ncol - dim(parameter). If the model.family="poisson.response" the degrees of freedom is one lower.

"prob(>chi_sq)"

p-value of the deviance, -2logL. Left out in Gaussian case which has no saturated model

"LR vs APC"

the likelihood ratio statistic against the "APC" model.

"df"

Degrees of freedom against the "APC" model.

"prob(>chi_sq)"

p-value of log likelihood ratio statistic.

"aic"

Akaike's "An Information Criterion", minus twice the maximized log-likelihood plus twice the number of parameters upto a constant. It is take directly from the glm function. For the "poisson.dose.response" and "binomial.dose.response" model families the dispersion is fixed at one and the number of parameters is the number of coefficients. The "poisson.response" model is conditional on the level. The number of parameters should therefore be adjusted by subtracting 2 to take this into account to get the proper AIC. However, in practice this does not matter, since we are only interested in relative effects. For the "gaussian.response" and "gaussian.dose.response" model families the dispersion is estimated from the residual deviance.

"F"

Only for "od.poisson.response". F statistic: Ratio of deviance for submodel divided by degrees of freedom to deviance of apc model divided by degrees of freedom.

"prof(>F)"

Only for "od.poisson.response". F statistic: with degrees of freedom given by differences between sub-model and apc model and between apc model and saturated model.

apc.fit.model returns a list. The entries are as follows.

fit

List. Values from glm.fit.

apc.index

List. Values from apc.get.index.

coefficients.canonical

Matrix. For each coordinate of the canonical parameters is reported coefficient, standard deviation, z-value, which is the ratio of those, and asymptotically normal p-values. Note, for "od.poisson.response" the reported standard errors corrected by the deviance and p-values are asymptotically t distributed, see Harnau and Nielsen (2016). Other parametrizations can be computed using apc.identify.

covariance.canonical

Matrix. Estimated covariance matrix for canonical parameters.

slopes

Vector. Length three. The design matrix found by apc.get.design.collinear has age, period, and cohort linear trends. slopes indicates which of these are actually used in estimation.

difdif

Vector. Length three. The design matrix found by apc.get.design.collinear has age, period, and cohort double differences. slopes indicates which of these are actually used in estimation.

index.age

Vector. Indices for age double difference parameters within coefficients.canonical. NULL if age double differences are not estimated.

index.per

Vector. Indices for period double difference parameters within coefficients.canonical. NULL if period double differences are not estimated.

index.coh

Vector. Indices for cohort double difference parameters within coefficients.canonical. NULL if cohort double differences are not estimated.

dates

Vector. Indicates the dates for the double difference parameters within coefficients.canonical.

model.family

Character. Argument.

model.design

Character. Argument.

RSS

Numeric. Residual sum of squares. NULL for non-gaussian families.

sigma2

Numeric. Maximum likelihood estimator for variance: RSS/n. NULL for non-gaussian families.

s2

Numeric. Least squares estimator for variance: RSS/df. NULL for non-gaussian families.

n.decimal

Numeric. From apc.data.list.

predictors

Vector. Design*Estimates. Same as the glm.fit value linear.predictors when there is no offset.

Note

For gaussian families deviance is defined differently in apc and glm. Here it is -2 log likelihood. In glm it is RSS.

The values for apc.fit.model include the apc.data.list and the apc.index returned by apc.get.index.

For the poisson.response the inference is conditional on the level, see Martinez Miranda, Nielsen and Nielsen (2015). The coefficients.canonical computed by apc are therefore different from the default coefficients computed by glm.

For the od.poisson.response an asymptotic theory is used that mimics the conditioning for poisson.response. The asymptotic distribution are, however, asymptotically t or F distributed, see Harnau and Nielsen (2017).

For the log.normal.response standard normal theory applies for quantities on the log scale including estimators. An asymptotic theory for quantities on the original scale is provided in Kuang and Nielsen (2018).

For coefficients the 3rd and 4th columns have headings t value and Pr(>|t|) for od.poisson.response to indicate an asymptotic t theory and otherwise z value and Pr(>|z|) to indicate an asymptotic normal theory. The labels are inherited from glm.fit.

Author(s)

Bent Nielsen <bent.nielsen@nuffield.ox.ac.uk> 15 Aug 2018 (27 Aug 2014)

References

Harnau, J. and Nielsen (2016) Over-dispersed age-period-cohort models. To appear in Journal of the American Statistical Association. Download: Nuffield DP

Kuang, D, Nielsen B (2018) Generalized log-normal chain-ladder. mimeo Nuffield Collge.

Kuang, D., Nielsen, B. and Nielsen, J.P. (2008a) Identification of the age-period-cohort model and the extended chain ladder model. Biometrika 95, 979-986. Download: Article; Earlier version Nuffield DP.

Martinez Miranda, M.D., Nielsen, B. and Nielsen, J.P. (2015) Inference and forecasting in the age-period-cohort model with unknown exposure with an application to mesothelioma mortality. Journal of the Royal Statistical Society A 178, 29-55. Download: Article, Nuffield DP.

See Also

The fit is done using glm.fit.

The examples below use Italian bladder cancer data, see data.Italian.bladder.cancer and Belgian lung cancer data, see data.Belgian.lung.cancer.

In example 3 the design matrix is called is called using apc.get.design.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
#####################
#	EXAMPLE 1 with Italian bladder cancer data

data.list	<- data.Italian.bladder.cancer()	#	function gives data list
apc.fit.table(data.list,"poisson.dose.response")

#	       -2logL df.residual prob(>chi_sq) LR.vs.APC df.vs.APC prob(>chi_sq)       aic
#	APC    33.179          27         0.191        NA        NA            NA   487.624
#	AP    512.514          40         0.000   479.335        13         0.000   940.958
#	AC     39.390          30         0.117     6.211         3         0.102   487.835
#	PC   1146.649          36         0.000  1113.470         9         0.000  1583.094
#	Ad    518.543          43         0.000   485.364        16         0.000   940.988
#	Pd   4041.373          49         0.000  4008.194        22         0.000  4451.818
#	Cd   1155.629          39         0.000  1122.450        12         0.000  1586.074
#	A    2223.800          44         0.000  2190.621        17         0.000  2644.245
#	P   84323.944          50         0.000 84290.765        23         0.000 84732.389
#	C   23794.205          40         0.000 23761.026        13         0.000 24222.650
#	t    4052.906          52         0.000  4019.727        25         0.000  4457.351
#	tA   5825.158          53         0.000  5791.979        26         0.000  6227.602
#	tP  84325.758          53         0.000 84292.579        26         0.000 84728.203
#	tC  33446.796          53         0.000 33413.617        26         0.000 33849.241
#	1   87313.678          54         0.000 87280.499        27         0.000 87714.123
#
#	Table suggests that "APC" and "AC" fit equally well.  Try both

fit.apc	<- apc.fit.model(data.list,"poisson.dose.response","APC")
fit.ac	<- apc.fit.model(data.list,"poisson.dose.response","AC")

#	Compare the estimates: They are very similar

fit.apc$coefficients.canonical
fit.ac$coefficients.canonical

#####################
#	EXAMPLE 2 with Belgian lung cancer data
#	This example illustrates how to find the linear predictors 

data.list	<- data.Belgian.lung.cancer()

#	Get an APC fit

fit.apc	<- apc.fit.model(data.list,"poisson.dose.response","APC")

#	The linear predictor of the fit is a vector.
#	But, we would like it in the same format as the data.
#	Thus create matrix of same dimension as response data
#	This can be done in two ways

m.lp	<- data.list$response	#	using original information	
m.lp	<- fit.apc$response		# 	using information copied when fitting

#	the fit object index.data is used to fill linear predictor in
#	vector format into matrix format

m.lp[fit.apc$index.data]	<-fit.apc$linear.predictors
exp(m.lp)

#####################
#	EXAMPLE 3 with Belgian lung cancer data
#	This example illustrates how apc.fit.model works.

data.list	<- data.Belgian.lung.cancer()

#	Vectorise data
index		<- apc.get.index(data.list)
v.response	<- data.list$response[index$index.data]
v.dose		<- data.list$dose[index$index.data]

#	Get design
m.design	<- apc.get.design(index,"APC")$design

#	Fit using glm.fit from stats package
fit.apc.glm	<- glm.fit(m.design,v.response,family=poisson(link="log"),offset=log(v.dose))

#	Get canonical coefficients
v.cc		<- fit.apc.glm$coefficients

#	Find linear predictors and express in matrix form
m.fit		<- data.list$response			#	create matrix
m.fit[index$index.data]		<- m.design 
m.fit.offset		<- m.fit + log(data.list$dose)	#	add offset
exp(m.fit.offset)

#	Compare with linear.predictors from glm.fit
#	difference should be zero
sum(abs(m.fit.offset[index$index.data]-fit.apc.glm$linear.predictors))

#####################
#	EXAMPLE 4 with Taylor-Ashe loss data
#	This example illustrates the over-dispersed poisson response model.

data <- data.loss.TA()
fit.apc.od <- apc.fit.model(data,"od.poisson.response","APC")
fit.apc.od$coefficients.canonical[1:5,]
fit.apc.no.od <- apc.fit.model(data,"poisson.response","APC")
fit.apc.no.od$coefficients.canonical[1:5,]

Example output

       -2logL df.residual prob(>chi_sq) LR.vs.APC df.vs.APC prob(>chi_sq)
APC    33.179          27         0.191        NA        NA            NA
AP    512.514          40         0.000   479.335        13         0.000
AC     39.390          30         0.117     6.211         3         0.102
PC   1146.649          36         0.000  1113.470         9         0.000
Ad    518.543          43         0.000   485.364        16         0.000
Pd   4041.373          49         0.000  4008.194        22         0.000
Cd   1155.629          39         0.000  1122.450        12         0.000
A    2223.800          44         0.000  2190.621        17         0.000
P   84323.944          50         0.000 84290.765        23         0.000
C   23794.205          40         0.000 23761.026        13         0.000
t    4052.906          52         0.000  4019.727        25         0.000
tA   5825.158          53         0.000  5791.979        26         0.000
tP  84325.758          53         0.000 84292.579        26         0.000
tC  33446.796          53         0.000 33413.617        26         0.000
1   87313.678          54         0.000 87280.499        27         0.000
          aic
APC   487.624
AP    940.958
AC    487.835
PC   1583.094
Ad    940.988
Pd   4451.818
Cd   1586.074
A    2644.245
P   84732.389
C   24222.650
t    4457.351
tA   6227.602
tP  84728.203
tC  33849.241
1   87714.123
                  Estimate Std. Error    z value      Pr(>|z|)
level           1.87648684 0.02828139 66.3505883  0.000000e+00
age slope       0.76014175 0.03283618 23.1495203 1.470014e-118
cohort slope    0.03269579 0.02622515  1.2467342  2.124950e-01
DD_age_35      -0.55065499 0.40664725 -1.3541343  1.756935e-01
DD_age_40       0.13664203 0.21884545  0.6243768  5.323801e-01
DD_age_45      -0.05196011 0.13248649 -0.3921918  6.949165e-01
DD_age_50      -0.10159156 0.08209743 -1.2374511  2.159197e-01
DD_age_55      -0.16916985 0.05361601 -3.1552112  1.603820e-03
DD_age_60      -0.13525709 0.03874505 -3.4909512  4.813042e-04
DD_age_65      -0.08681738 0.03044437 -2.8516723  4.348991e-03
DD_age_70      -0.09762626 0.02687372 -3.6327777  2.803865e-04
DD_age_75      -0.02662955 0.02717706 -0.9798538  3.271583e-01
DD_period_1965  0.01143513 0.03148918  0.3631446  7.164969e-01
DD_period_1970  0.01900914 0.02772827  0.6855508  4.929964e-01
DD_period_1975  0.01493432 0.02468704  0.6049457  5.452151e-01
DD_cohort_1890  0.15804501 0.06198293  2.5498151  1.077801e-02
DD_cohort_1895  0.03929894 0.04413320  0.8904621  3.732178e-01
DD_cohort_1900 -0.07836353 0.03405310 -2.3012155  2.137945e-02
DD_cohort_1905 -0.09985060 0.02798423 -3.5681030  3.595752e-04
DD_cohort_1910 -0.05478258 0.02801528 -1.9554535  5.052956e-02
DD_cohort_1915 -0.06493072 0.03267698 -1.9870481  4.691707e-02
DD_cohort_1920  0.06538341 0.04468068  1.4633487  1.433720e-01
DD_cohort_1925  0.07563192 0.06125472  1.2347116  2.169379e-01
DD_cohort_1930 -0.13413655 0.08417885 -1.5934710  1.110546e-01
DD_cohort_1935  0.12744049 0.13314462  0.9571584  3.384873e-01
DD_cohort_1940 -0.51208269 0.23735644 -2.1574417  3.097126e-02
DD_cohort_1945  0.40832896 0.45267592  0.9020337  3.670389e-01
DD_cohort_1950  1.59638636 0.71890833  2.2205701  2.638009e-02
                  Estimate Std. Error    z value      Pr(>|z|)
level           1.85265468 0.02578223 71.8578030  0.000000e+00
age slope       0.78409316 0.02641649 29.6819581 1.312775e-193
cohort slope    0.05847985 0.01724745  3.3906365  6.973051e-04
DD_age_35      -0.55232153 0.40651333 -1.3586800  1.742480e-01
DD_age_40       0.13925320 0.21869899  0.6367345  5.242978e-01
DD_age_45      -0.05132193 0.13235337 -0.3877644  6.981904e-01
DD_age_50      -0.10329194 0.08200093 -1.2596435  2.077980e-01
DD_age_55      -0.17191364 0.05353344 -3.2113318  1.321213e-03
DD_age_60      -0.13376443 0.03869639 -3.4567677  5.466960e-04
DD_age_65      -0.08602377 0.03040561 -2.8292070  4.666351e-03
DD_age_70      -0.09920044 0.02682848 -3.6975802  2.176644e-04
DD_age_75      -0.02884898 0.02711003 -1.0641440  2.872635e-01
DD_cohort_1890  0.16139227 0.06099477  2.6460018  8.144937e-03
DD_cohort_1895  0.04572057 0.04365464  1.0473245  2.949499e-01
DD_cohort_1900 -0.07048964 0.03372064 -2.0904002  3.658187e-02
DD_cohort_1905 -0.10365548 0.02790974 -3.7139541  2.040458e-04
DD_cohort_1910 -0.05369876 0.02797461 -1.9195537  5.491430e-02
DD_cohort_1915 -0.06414480 0.03263594 -1.9654649  4.936045e-02
DD_cohort_1920  0.06484795 0.04463480  1.4528563  1.462637e-01
DD_cohort_1925  0.07601712 0.06118984  1.2423162  2.141199e-01
DD_cohort_1930 -0.13302382 0.08408656 -1.5819867  1.136526e-01
DD_cohort_1935  0.12771607 0.13301382  0.9601715  3.369689e-01
DD_cohort_1940 -0.51293696 0.23714937 -2.1629278  3.054674e-02
DD_cohort_1945  0.40972906 0.45233878  0.9058013  3.650410e-01
DD_cohort_1950  1.59847337 0.71813495  2.2258677  2.602305e-02
      1955-1959  1960-1964  1965-1969 1970-1974
25-29   3.96771   3.963962   4.068328   3.00000
30-34  10.42549  12.977213  11.665624  12.93167
35-39  16.17695  20.059348  23.393292  22.37041
40-44  33.29595  39.614559  46.427705  55.66179
45-49  73.52215  69.305700  78.084697  97.08746
50-54 113.74001 123.168769 109.967426 131.12380
55-59 168.93594 179.820491 184.812645 176.43092
60-64 191.61151 238.482055 241.409998 264.49644
65-69 208.32046 267.571302 316.078730 342.02950
70-74 216.00384 260.040435 316.452455 400.50327
75-79 198.00000 220.996165 253.639100 330.36474
Warning message:
In m.fit[index$index.data] <- m.design :
  number of items to replace is not a multiple of replacement length
      1955-1959 1960-1964 1965-1969 1970-1974
25-29  42.92024  41.81972  38.05595  42.92024
30-34  45.30470  44.38011  41.52931  38.28566
35-39  38.33474  45.30470  44.38011  41.43722
40-44  36.65099  37.84949  45.12563  42.62395
45-49  43.24539  35.92015  37.49354  44.48098
50-54  43.65725  41.89352  35.17776  36.44910
55-59  41.19404  41.68032  40.51698  34.12069
60-64  35.54393  38.52421  39.56610  38.45751
65-69  28.99677  32.11421  35.26125  36.31079
70-74  23.10125  24.53323  27.47722  30.31760
75-79  16.08066  17.30772  18.70343  21.02951
[1] 67.0634
                Estimate Std. Error     t value     Pr(>|t|)
level        12.78786359         NA          NA           NA
age slope     0.69776392  0.2240298  3.11460328 0.0042223991
cohort slope  0.11148167  0.2505890  0.44487848 0.6598277322
DD_age_3     -0.89563247  0.2200824 -4.06953304 0.0003486379
DD_age_4      0.01357051  0.2035538  0.06666795 0.9473198145
                Estimate   Std. Error    z value    Pr(>|z|)
level        12.78786359           NA         NA          NA
age slope     0.69776392 0.0010034992  695.33084 0.00000e+00
cohort slope  0.11148167 0.0011224663   99.31850 0.00000e+00
DD_age_3     -0.89563247 0.0009858174 -908.51758 0.00000e+00
DD_age_4      0.01357051 0.0009117808   14.88353 4.21667e-50

apc documentation built on Oct. 23, 2020, 6:17 p.m.