pop.predict.subnat: Subnational Probabilistic Population Projection
In bayesPop: Probabilistic Population Projection

Description

Generates trajectories of probabilistic population projection for subregions of a given country.

Usage

pop.predict.subnat(end.year = 2060, start.year = 1950, present.year = 2020, 
        wpp.year = 2019, output.dir = file.path(getwd(), "bayesPop.output"), 
        locations = NULL, default.country = NULL, annual = FALSE,
        inputs = list(
            popM = NULL, popF = NULL, 
            mxM = NULL, mxF = NULL, srb = NULL, 
            pasfr = NULL, patterns = NULL, 
            migM = NULL, migF = NULL, 
            migMt = NULL, migFt = NULL, mig = NULL, mig.fdm = NULL,
            e0F.file = NULL, e0M.file = NULL, tfr.file = NULL, 
            e0F.sim.dir = NULL, e0M.sim.dir = NULL, tfr.sim.dir = NULL, 
            migMtraj = NULL, migFtraj = NULL, migtraj = NULL,
            migFDMtraj = NULL, GQpopM = NULL, GQpopF = NULL, 
            average.annual = NULL
        ), 
        nr.traj = 1000, keep.vital.events = FALSE, 
        fixed.mx = FALSE, fixed.pasfr = FALSE, lc.for.all = TRUE,
         mig.is.rate = FALSE, mig.age.method = c("rc", "fdmp", "fdmnop"),
         mig.rc.fam = NULL, pasfr.ignore.phase2 = FALSE, 
         replace.output = FALSE, verbose = TRUE)

Arguments

`end.year`	End year of the projection.
`start.year`	First year of the historical data on mortality rates. It determines the length of the historical time series used in the Lee-Carter estimation.
`present.year`	Year for which initial population data is to be used.
`wpp.year`	Year for which WPP data is used. The function loads a package called wpp`x` where `x` is the `wpp.year` and uses its data (corresponding to the `default.country`) as default datasets if region-specific alternatives are not given (see more details below).
`output.dir`	Output directory of the projection.
`locations`	Name of a tab-delimited file that contains definitions of the subregions. It has a similar structure to `UNlocations`, with mandatory columns `reg_code` (unique identifier of the subregions) and `name` (name of the subregions). If the optional column `location_type` is present, it should be set to 4 for the subregions to be processed. Column `country_code` can be included with the numerical code of the corresponding country. A row with `location_type` of 0 determines the country that the subregions belong to and is used for extracting default "national" datasets if the argument `default.country` is missing. In such a case, the code of the default country is taken from its column `country_code`. This is a mandatory argument.
`default.country`	Numerical code of the country to which the subregions belong. It is used for extracting default datasets from the wpp package if some region-specific input datasets are missing. Alternatively, it can also be included in the `locations` file, see above. In either case, the code must exist in the `UNlocations` dataset.
`annual`	Logical. If `TRUE`, it is assumed that this is a 1x1 simulation, i.e. one-year age groups and one-year time periods.
`inputs`	A list of file names where input data is stored. Unless otherwise noted, these are tab-delimited ASCII files with a mandatory column `reg_code` giving the numerical identifier of the subregions. If an element of this list is `NULL`, usually a default dataset corresponding to `default.country` is extracted from the wpp package. Names of these default datasets are shown in brackets. This list contains the following elements:
  `popM`, `popF`	Initial male/female age-specific population (at time `present.year`). Mandatory items, no defaults. Must contain columns `reg_code` and `age` and be of the same structure as `popM` from wpp.
  `mxM`, `mxF`	Historical data and (optionally) projections of male/female age-specific death rates [`mxM`, `mxF`] (see also argument `fixed.mx`).
  `srb`	Projection of sex ratio at birth. [`sexRatio`]
  `pasfr`	Historical data and (optionally) projections of percentage age-specific fertility rate [`percentASFR`] (see also argument `fixed.pasfr`).
  `patterns`	Information on region's specifics regarding migration type, base year of the migration, mortality and fertility age patterns as defined in [`vwBaseYear`]. In addition, it can contain columns defining migration shares between the subregions, see Details below.
  `migM`, `migF`, `migMt`, `migFt`, `mig`	Projection and (optionally) historical data of net migration on the same scale as the initial population. There are three ways of defining this quantity, here in order of priority: 1. via `migM` and `migF` which should give male and female age-specific migration [`migrationM`, `migrationF`]; 2. via `migMt` and `migFt` which should give male and female total net migration; 3. via `mig` which should give the total net migration. For 2. and 3., the totals are disaggregated into age-specific migration by applying a Rogers-Castro schedule. For 3., the totals are equally split between sexes. If all of these input items are missing, the migration schedules are constructed from total migration counts of the `default.country` derived from `migration`, using Rogers-Castro for the age distribution. Migration shares between subregions (including sex-specific shares) can be given in the `patterns` file, see above and Details below. If no shares are given, migration is distributed by population shares.
  `mig.fdm`	If `mig.age.method` is “fdmp” or “fdmnop”, this file is used to disaggregate total in- and out-migration into ages, giving proportions of the migration in-flow and out-flow for each age. It should have columns “reg_code”, “age”, “in” and “out”, where the latter two should each sum to 1 for each location. By default, Rogers-Castro curves are used, obtained via the function `rcastro.schedule`.
  `e0F.file`	Comma-delimited CSV file with projected female life expectancy. It has the same structure as the file “ascii_trajectories.csv” generated using `bayesLife::convert.e0.trajectories` (which currently works for country-level results only). Required columns are “LocID”, “Year”, “Trajectory”, and “e0”. If `e0F.file` is `NULL`, data from the corresponding wpp package (for `default.country`) is taken, namely the median projections as one trajectory and the low and high variants (if available) as second and third trajectory. Alternatively, this element can be the keyword “median_” in which case only the median is taken.
  `e0M.file`	Comma-delimited CSV file containing projections of male life expectancy of the same format as `e0F.file`. As in the female case, if `e0M.file` is `NULL`, data for `default.country` from the corresponding wpp package is taken.
  `tfr.file`	Comma-delimited CSV file with results of total fertility rate (generated using bayesTFR, function `convert.tfr.trajectories`, file “ascii_trajectories.csv”). Required columns are “LocID”, “Year”, “Trajectory”, and “TF”. If this element is not `NULL`, the argument `tfr.sim.dir` is ignored. If both `tfr.file` and `tfr.sim.dir` are `NULL`, data for `default.country` from the corresponding wpp package is taken (median and the low and high variants as three trajectories). Alternatively, this argument can be the keyword “median_” in which case only the wpp median is taken.
  `e0F.sim.dir`	Simulation directory with results of female life expectancy, generated using `bayesLife::e0.predict.subnat`. It is only used if `e0F.file` is `NULL`. Alternatively, it can be set to the keyword “median_” which has the same effect as when `e0F.file` is “median_”.
  `e0M.sim.dir`	This is analogous to `e0F.sim.dir`, here for male life expectancy. Use `e0M.file` instead of this item.
  `tfr.sim.dir`	Simulation directory with projections of total fertility rate (generated using `bayesTFR::tfr.predict.subnat`). It is only used if `tfr.file` is `NULL`.
  `migMtraj`, `migFtraj`, `migtraj`	Comma-delimited CSV file with male/female age-specific migration trajectories, or total migration trajectories (`migtraj`). If present, it replaces deterministic projections given by the `mig*` items. It has a similar format to e.g. `e0M.file`, with columns “LocID”, “Year”, “Trajectory”, “Age” (except for `migtraj`) and “Migration”. For a five-year simulation, the “Age” column must have values “0-4”, “5-9”, “10-14”, ..., “95-99”, “100+”. In an annual simulation, age is given by a single number between 0 and 100.
  `migFDMtraj`	Comma-delimited CSV file with trajectories of in- and out-migration schedules used for the FDM migration method, i.e. if `mig.age.method` is “fdmp” or “fdmnop”. The values have the same meaning as in the `mig.fdm` input item, except that here multiple trajectories of such schedules can be provided. It should have columns “LocID”, “Age”, “Trajectory”, “Value”, and “Parameter”. For “Age”, the same rules apply as for `migMtraj` above. The “Parameter” column should have values “in” for in-migration, “out” for out-migration and “v” for values of the variance denominator `v` used in Equation 22 of Sevcikova et al. (2024). For the `v` parameter, the “Age” column should be left empty.
  `GQpopM`, `GQpopF`	Age-specific population counts (male and female) that should be excluded from application of the cohort-component method (CCM). It can be used for defining group quarters. These counts are removed from the population before the CCM projection and added back afterwards. It is not used when computing vital events on observed data. The datasets should have columns “reg_code”, “age” and “gq”. In such a case the “gq” amount is applied to all years. To distinguish the amount that is added back for individual years, the “gq” column should be replaced by columns indicating the individual years, i.e. single years for an annual simulation and time periods (e.g. “2020-2025”, “2025-2030”) for a 5-year simulation. For a five-year simulation, the “age” column should include values “0-4”, “5-9”, “10-14”, ..., “95-99”, “100+”. However, rows with zeros do not need to be included. In an annual simulation, age is given by a single number between 0 and 100.
  `average.annual`	Character string with values “TFR”, “e0M”, “e0F”. If this is a 5-year simulation but the TFR and/or e0 inputs come from an annual simulation, including the corresponding string here causes the TFR and/or e0 trajectories to be converted into 5-year averages.
`nr.traj`, `keep.vital.events`, `fixed.mx`, `fixed.pasfr`, `lc.for.all`, `mig.is.rate`, `mig.age.method`, `mig.rc.fam`, `replace.output`, `verbose`	These arguments have the same meaning as in `pop.predict`.
`pasfr.ignore.phase2`	Logical. If `TRUE` the TFR for all locations is considered being in phase III when predicting PASFR.
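
The column layout described for the `migMtraj` and `migFtraj` items can be sketched in base R as follows. This is only an illustration of the file structure for a five-year simulation: the region codes 1 and 2, the years, the number of trajectories and the random migration values are all made up, and the sketch does not assert which year labels or units the function expects for a given setup.

```r
# Sketch of a migMtraj-style CSV for a five-year simulation.
# All values below (LocIDs, years, migration numbers) are hypothetical.
set.seed(1)

# Five-year age labels "0-4", "5-9", ..., "95-99", "100+"
age.labels <- c(paste(seq(0, 95, by = 5), seq(4, 99, by = 5), sep = "-"), "100+")

# One row per region, year, trajectory and age group
traj <- expand.grid(Age = age.labels, Trajectory = 1:3,
                    Year = c(2025, 2030), LocID = c(1, 2),
                    stringsAsFactors = FALSE)
traj$Migration <- round(rnorm(nrow(traj), mean = 0, sd = 0.5), 3)

# Order the columns as listed in the argument description
traj <- traj[, c("LocID", "Year", "Trajectory", "Age", "Migration")]

# Write a comma-delimited file whose name could be passed as inputs$migMtraj
mig.file <- tempfile(fileext = ".csv")
write.csv(traj, mig.file, row.names = FALSE)
```

For `migtraj` the same layout would apply without the “Age” column.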

Details

Population projection for subnational units (regions) is performed by applying the cohort component method to subnational datasets on projected fertility (TFR), mortality and net migration, starting from given sex- and age-specific population counts. The only required inputs are the initial sex- and age-specific population counts in each region (popM and popF elements of the inputs argument) and a file with a set of locations (argument locations). Any other input dataset that is not given is replaced by the corresponding "national" values, taken from the corresponding wpp package. The argument default.country determines the country for those default "national" values. The default country can also be included in the locations file as a record with location_type set to 0.
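
A minimal locations file with the columns described above could be written with base R along these lines. The two region names and their codes are hypothetical, and using 124 as the `reg_code` of the country row is an assumption made for illustration; the example file CANlocations.txt shipped with the package is the authoritative template.

```r
# Sketch of a tab-delimited locations file; region names and codes are hypothetical.
locs <- data.frame(
  name          = c("Canada", "Region A", "Region B"),
  country_code  = c(124, 124, 124),  # numerical code of the country
  reg_code      = c(124, 1, 2),      # unique identifier per row (country row code is an assumption)
  location_type = c(0, 4, 4)         # 0 = country used for defaults, 4 = subregion to process
)

loc.file <- tempfile(fileext = ".txt")
write.table(locs, loc.file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to check its structure
locs.check <- read.delim(loc.file)
```

The resulting file name would then be passed as the `locations` argument, together with `popM` and `popF` in `inputs`.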

The TFR component can be given as a set of trajectories generated using the tfr.predict.subnat function of the bayesTFR package (tfr.sim.dir element). Alternatively, trajectories can be given in an ASCII file (tfr.file).

Similarly, the $e_0$ component can be given as a set of trajectories generated using the e0.predict.subnat function of the bayesLife package (e0F.sim.dir element). If male projections are generated jointly (i.e. predict.jmale = TRUE), set e0M.sim.dir = "joint_". Alternatively, trajectories can be given in ASCII files (e0F.file, e0M.file).

Having a set of subnational TFR and $e_0$ trajectories, the cohort component method is applied to each of them to yield a distribution of future subnational population.

Projection of net migration can either be given as disaggregated sex- and age-specific datasets (migM and migF), or as sex totals (migMt and migFt), or as totals (mig), or as sex- and age-specific trajectories (migMtraj and migFtraj), or as total trajectories (migtraj). Alternatively, it can be given as shares between regions as columns in the patterns dataset. These are: inmigrationM_share, inmigrationF_share, outmigrationM_share, outmigrationF_share. The sex specification and/or direction specification (in/out) can be omitted, e.g. it can be simply migration_share. The function extracts the values of the net migration projection on the national level and distributes them to regions according to the given shares. For positive (national) values, it uses the in-migration shares; for negative values, it uses the out-migration shares. If the in/out prefix is omitted in the column names, the given migration shares are used for both positive and negative net migration projections. By default, if neither migration datasets nor region-specific shares are given, the distribution between regions is proportional to the size of the population. The age-specific schedules follow by default the Rogers-Castro age schedules. Note that handling migration using shares as described here only affects the distribution of international migration into regions. It does not take into account between-region migration.
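
The share rule described above can be illustrated with a small base R calculation. This is a sketch of the stated rule, not the package's internal code: the numbers are toy values, the helper `distribute` is hypothetical, and both the rescaling of shares to sum to 1 and the treatment of a zero total as non-negative are assumptions.

```r
# Toy illustration of distributing national net migration to regions by shares.
national.net.mig <- c("2025" = 100, "2030" = -40)  # hypothetical national totals

shares <- data.frame(
  reg_code           = c(1, 2, 3),
  inmigration_share  = c(0.5, 0.3, 0.2),
  outmigration_share = c(0.2, 0.2, 0.6)
)

# Hypothetical helper: in-migration shares for non-negative totals,
# out-migration shares for negative totals (shares rescaled to sum to 1).
distribute <- function(total, shares) {
  s <- if (total >= 0) shares$inmigration_share else shares$outmigration_share
  total * s / sum(s)
}

# Matrix with one row per region and one column per period
regional <- sapply(national.net.mig, distribute, shares = shares)
rownames(regional) <- shares$reg_code

# Default when no shares are given: proportional to population size
pop <- c(400, 100, 500)             # hypothetical regional populations
default.shares <- pop / sum(pop)
```

Under these assumptions each column of `regional` sums to the corresponding national value, so the regional figures only redistribute the national total.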

The package contains example datasets for Canada. Use these as templates for your own data. See Examples below.

Value

Object of class bayesPop.prediction containing the subnational projections. Note that this object can be used in the various bayesPop functions exactly the same way as an object with national projections. However, the meaning of the argument country in many of these functions (e.g. in pop.trajectories.plot) changes to an identification of the region (either as a numerical code or name as defined in the locations file).

Acknowledgment

We are grateful to Patrice Dion from Statistics Canada for providing us with example data. Note that the example datasets included in the package are not official STATCAN data; they only serve the purpose of illustration and templates. Data for the time period 2015-2020 has been imputed by the author.

Author(s)

Hana Sevcikova

Examples

## Not run: 
# Subnational projections for Canada
#########
data.dir <- file.path(find.package("bayesPop"), "extdata")

# Use national data for tfr and e0
###
sim.dir <- tempfile()
pred <- pop.predict.subnat(output.dir = sim.dir,
            locations = file.path(data.dir, "CANlocations.txt"),
            inputs = list(popM = file.path(data.dir, "CANpopM.txt"),
                          popF = file.path(data.dir, "CANpopF.txt"),
                          tfr.file = "median_"
                        ),
            verbose = TRUE)
pop.trajectories.plot(pred, "Alberta", sum.over.ages = TRUE)
unlink(sim.dir, recursive=TRUE)

# Use subnational probabilistic TFR simulation
###
# Subnational TFR projections for Canada (from ?tfr.predict.subnat)
my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt')
tfr.nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output")
tfr.reg.dir <- tempfile()
tfr.preds <- tfr.predict.subnat(124, my.tfr.file = my.subtfr.file,
    sim.dir = tfr.nat.dir, output.dir = tfr.reg.dir, start.year = 2013)
 
# Use subnational probabilistic e0
### 
# Subnational e0 projections for Canada (from ?e0.predict.subnat)
# (here using the same female and male data, just for illustration)
my.sube0.file <- file.path(find.package("bayesLife"), 'extdata', 'subnational_e0_template.txt')
e0.nat.dir <- file.path(find.package("bayesLife"), "ex-data", "bayesLife.output")
e0.reg.dir <- tempfile()
e0.preds <- e0.predict.subnat(124, my.e0.file = my.sube0.file,
    sim.dir = e0.nat.dir, output.dir = e0.reg.dir, start.year = 2018,
    predict.jmale = TRUE, my.e0M.file = my.sube0.file)
 
# Population projections
sim.dir <- tempfile()
pred <- pop.predict.subnat(output.dir = sim.dir,
            locations = file.path(data.dir, "CANlocations.txt"),
            inputs = list(popM = file.path(data.dir, "CANpopM.txt"),
                          popF = file.path(data.dir, "CANpopF.txt"),
                          patterns = file.path(data.dir, "CANpatterns.txt"),
                          tfr.sim.dir = file.path(tfr.reg.dir, "subnat", "c124"),
                          e0F.sim.dir = file.path(e0.reg.dir, "subnat_ar1", "c124"),
                          e0M.sim.dir = "joint_"
                        ),
            verbose = TRUE)
pop.trajectories.plot(pred, "Alberta", sum.over.ages = TRUE)
pop.pyramid(pred, "Manitoba", year = 2050)
get.countries.table(pred)

# Aggregate to country level
aggr <- pop.aggregate.subnat(pred, regions = 124, 
            locations = file.path(data.dir, "CANlocations.txt"))
pop.trajectories.plot(aggr, "Canada", sum.over.ages = TRUE)

unlink(sim.dir, recursive = TRUE)
unlink(tfr.reg.dir, recursive = TRUE)
unlink(e0.reg.dir, recursive = TRUE)

## End(Not run)

bayesPop documentation built on April 12, 2025, 1:24 a.m.