graduate: Graduate grouped data
In timriffe/DemoTools: Standardize, Evaluate, and Adjust Demographic Data

graduate

R Documentation

Graduate grouped data

Description

A wrapper function for several graduation methods, primarily for count data: "sprague", "beers(ord)", "beers(mod)", "grabill", "mono" (monotonic spline), "uniform", and "pclm". The "pclm" method can also graduate rates if both event counts and population at risk are available.

Usage

graduate(
  Value,
  Age,
  AgeInt = age2int(Age),
  OAG = TRUE,
  OAnew = max(Age),
  method = c("sprague", "beers(ord)", "beers(mod)", "grabill", "pclm", "mono", "uniform"),
  keep0 = FALSE,
  constrain = FALSE,
  ...
)

Arguments

`Value`	numeric vector, presumably counts in grouped ages
`Age`	integer vector, lower bounds of age groups
`AgeInt`	integer vector, age interval widths
`OAG`	logical. Default `TRUE`. Is the final age group open?
`OAnew`	integer, optional new open age, higher than `max(Age)`. See details.
`method`	character, one of `"sprague"`, `"beers(ord)"`, `"beers(mod)"`, `"grabill"`, `"mono"`, `"uniform"`, or `"pclm"`
`keep0`	logical. Default `FALSE`. If available, should the value in the infant age group be maintained, and ages 1-4 constrained?
`constrain`	logical. Default `FALSE`. Should output be constrained to sum within the input age groups?
`...`	extra arguments passed to `graduate_beers()` or `graduate_pclm()`

Details

The "sprague", "beers(ord)", and "beers(mod)" methods require the original data to be in uniform five-year age groups. If they are not (for example, the infant group is separate), they are grouped to uniform width prior to splitting. To keep the original infant count in the output, specify keep0 = TRUE. In this case the infant count is imputed and ages 1-4 are rescaled, which may introduce a discontinuity in results from age 4 to 5. keep0 = TRUE may also be desired along with method = "pclm".

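The keep0 = TRUE adjustment described above can be sketched in base R. This is a minimal illustration, not the DemoTools implementation: keep_infant() is a hypothetical helper, and the single-age values are made up so that ages 0-4 sum to 54170.

```r
# Hypothetical helper (not part of DemoTools): restore the observed infant
# count and rescale ages 1-4 so that ages 0-4 keep their graduated total.
keep_infant <- function(single, infant) {
  total04 <- sum(single[1:5])   # graduated total for ages 0-4
  out <- single
  out[1] <- infant              # impute the observed infant count at age 0
  # rescale ages 1-4 so they sum to what remains of the 0-4 total
  out[2:5] <- single[2:5] * (total04 - infant) / sum(single[2:5])
  out
}

# Toy single-age values for ages 0-9 (illustrative only)
single <- c(9000, 11000, 11200, 11100, 11870,
            11500, 11400, 11300, 11200, 11100)
adj <- keep_infant(single, infant = 10000)

# Ages 5+ are untouched, so any jump between ages 4 and 5 comes
# entirely from the rescaling of ages 1-4.
adj[5:6]
```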
Some methods are constrained (output sums to the input values within the original age groups), others are not, and others are optionally constrained. If constrained output is required, this function can be followed by rescaleAgeGroups(), which may break continuity in otherwise smooth results. This is inconsequential for downstream demography, but if this aesthetic side effect is undesired, try one of the constrained methods: "sprague", "mono", or "pclm" (with control = list(lambda = 1/1e7) or similar specified).
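
To illustrate what constraining means and why it can break continuity, here is a minimal base R sketch. rescale_to_groups() is a hypothetical helper written for this example; it is not rescaleAgeGroups() and makes no claim about how that function is implemented. The toy values are made up.

```r
# Hypothetical helper: rescale single-age values proportionally so that
# they sum to the given grouped totals within each age group.
rescale_to_groups <- function(single, age_single, grouped, age_grouped) {
  # index of the age group each single age falls in (lower bounds given)
  idx <- findInterval(age_single, age_grouped)
  # multiply each value by (target group total / current group total)
  single * grouped[idx] / ave(single, idx, FUN = sum)
}

age_single  <- 0:9
single      <- c(10, 12, 11, 9, 8, 7, 7, 6, 5, 5)  # unconstrained graduation
age_grouped <- c(0, 5)
grouped     <- c(100, 45)                          # original grouped totals

rescaled <- rescale_to_groups(single, age_single, grouped, age_grouped)

# Group sums now match the input totals
tapply(rescaled, findInterval(age_single, age_grouped), sum)
```

Because each group gets its own scaling factor, the step between the last age of one group and the first age of the next can change abruptly, which is the loss of continuity mentioned above.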

Beers may be either ordinary ("beers(ord)") or modified ("beers(mod)"), and either can pass on the optional argument johnson = TRUE if desired (this gives a different distribution pattern for young ages; FALSE by default). If method = "beers" is given, then "beers(ord)" is used.

This wrapper standardizes some inconsistencies in how open ages are dealt with. For example, with the "pclm" method, the last age group can be redistributed over a specified interval implied by increasing OAnew beyond the range of Age. To get this same behavior from "mono" or "uniform", specify OAG = FALSE along with an appropriately high OAnew (or an integer final value of AgeInt).

OAnew cannot be higher than max(Age) + 4 for the "sprague" or "beers" methods. For "uniform", "mono", and "pclm" it can be higher than this, and in each case the open age group is completely redistributed within this range, meaning it is not really open anymore.
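
As a rough illustration of redistributing the final age group up to a new upper age, the following base R sketch splits grouped counts uniformly. uniform_split() and its oa_new argument are hypothetical names for this example only; treating oa_new as the last single age in the output is an assumption, and this is not the code behind method = "uniform".

```r
# Hypothetical sketch: spread each grouped count evenly over the single ages
# in its interval; the last group is spread over max(age)..oa_new inclusive.
uniform_split <- function(value, age, oa_new) {
  # interval widths: gaps between lower bounds, plus the closed-out last group
  age_int <- c(diff(age), oa_new - max(age) + 1)
  stopifnot(all(age_int > 0))
  out <- rep(value / age_int, times = age_int)
  names(out) <- min(age):oa_new
  out
}

value <- c(50, 40, 22)   # toy counts for ages 0-4, 5-9, 10+
age   <- c(0, 5, 10)

out <- uniform_split(value, age, oa_new = 20)

# The total is preserved, and the former open group now ends at age 20
sum(out)
```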

For all methods, negative values are detected in the output. If present, they are handled as follows: we take the geometric mean of the given output (with negatives replaced by 0s) and the output of graduate_mono(), which is guaranteed to be non-negative. This only affects age groups where negatives were produced in the first pass. In our experience this only arises with the Sprague, Beers, or Grabill methods; all others are guaranteed non-negative.

For any case where input data are in single ages, constraining results to sum to values in the original age groups will simply return the original input data, which is clearly not your intent. This might arise when using graduation as an implicit two-step smoother (group + graduate). In this case, separate the steps: first group using groupAges(), then use graduate(..., constrain = TRUE).
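
The point about single-age input can be seen in a small base R sketch. The values are made up, the grouping line is only a stand-in for groupAges(), and the uniform split is a stand-in for whichever graduation method is used.

```r
# Toy single-age counts, heaped on ages 0 and 5 (illustrative only)
age   <- 0:9
value <- c(30, 8, 9, 7, 6, 25, 5, 6, 4, 5)

# Constraining to the original single-age groups: each group holds exactly one
# value, so rescaling any smoothed series to the group totals returns the input.
smoothed    <- rep(mean(value), length(value))   # any smooth first pass
constrained <- smoothed * value / smoothed       # rescale to "group" totals

# Separating the steps instead:
# Step 1: group to five-year ages (stand-in for groupAges())
age5   <- age - age %% 5
value5 <- as.numeric(tapply(value, age5, sum))

# Step 2: graduate the grouped totals; a uniform split is constrained
# to the five-year totals by construction
graduated <- rep(value5 / 5, each = 5)
```

Here constrained reproduces value exactly, heaping included, while graduated sums to the five-year totals without reproducing the single-age heaping.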

References

\insertRef{pascariu2018ungroup}{DemoTools}
\insertRef{rizzi2015efficient}{DemoTools}
\insertRef{sprague1880explanation}{DemoTools}
\insertRef{shryock1973methods}{DemoTools}
\insertRef{siegel2004methods}{DemoTools}
\insertRef{beers1945modified}{DemoTools}

Examples

Value <- pop5_mat[, 1]
Value <- c(10000,44170,Value[-1])
Age   <- sort(c(1,seq(0,100,by=5)))

graduate(Value, Age, method = "sprague")
graduate(Value, Age, method = "sprague", keep0=TRUE)

graduate(Value, Age, method = "beers(ord)")
graduate(Value, Age, method = "beers(ord)", keep0=TRUE)
graduate(Value, Age, method = "beers(ord)", keep0=TRUE, johnson = TRUE)

graduate(Value, Age, method = "beers(mod)")
graduate(Value, Age, method = "beers(mod)", keep0=TRUE)
graduate(Value, Age, method = "beers(mod)", keep0=TRUE, johnson = TRUE)

graduate(Value, Age, method = "mono")
graduate(Value, Age, method = "mono", keep0=TRUE)

graduate(Value, Age, method = "uniform")

graduate(Value, Age, method = "pclm")
graduate(Value, Age, method = "pclm", keep0=TRUE)
# pclm can also graduate rates if both
# numerators and denominators are on hand:
Exposures <- c(100958,466275,624134,559559,446736,370653,301862,249409,
               247473,223014,172260,149338,127242,105715,79614,53660,
               31021,16805,8000,4000,2000,1000)

Deaths <- c(8674,1592,618,411,755,1098,1100,1357,
            1335,3257,2200,4023,2167,4578,2956,4212,
            2887,2351,1500,900,500,300)
Age    <- c(0, 1, seq(5, 100, by = 5))
AgeInt <- c(diff(Age), NA)

# exclude infants for better fit.
mx    <- graduate(
           Value = Deaths[-1], Age = Age[-1],
           AgeInt = AgeInt[-1], OAG = TRUE,
           OAnew = 110, offset = Exposures[-1],
           method = "pclm")
mx_sm <- graduate(
           Value = Deaths[-1], Age = Age[-1],
           AgeInt = AgeInt[-1], OAG = TRUE,
           OAnew = 110, offset = Exposures[-1],
           method = "pclm", control = list(lambda = 1e7))

## Not run: 
plot(Age,
     Deaths / Exposures,
     type = 's', log = 'y',
     main = "Underlying data have differential heaping on 0s and 5s")
lines(1:110, mx)
lines(1:110, mx_sm, col = "blue")
legend("bottomright",
       col = c("black","blue"),
       lty = c(1, 1),
       legend = c("lambda optimized (almost constrained)",
                  "higher lambda = smoother")
       )
  
## End(Not run)

timriffe/DemoTools documentation built on Dec. 9, 2024, 8:17 a.m.