pop.predict.subnat: Subnational Probabilistic Population Projection


pop.predict.subnat R Documentation



Description

Generates trajectories of a probabilistic population projection for subregions of a given country.


Usage

pop.predict.subnat(end.year = 2060, start.year = 1950, present.year = 2020, 
        wpp.year = 2019, output.dir = file.path(getwd(), "bayesPop.output"), 
        locations = NULL, default.country = NULL, annual = FALSE,
        inputs = list(
            popM = NULL, popF = NULL, 
            mxM = NULL, mxF = NULL, srb = NULL, 
            pasfr = NULL, patterns = NULL, 
            migM = NULL, migF = NULL, 
            migMt = NULL, migFt = NULL, mig = NULL,
            e0F.file = NULL, e0M.file = NULL, tfr.file = NULL, 
            e0F.sim.dir = NULL, e0M.sim.dir = NULL, tfr.sim.dir = NULL, 
            migMtraj = NULL, migFtraj = NULL, migtraj = NULL,
            GQpopM = NULL, GQpopF = NULL, average.annual = NULL
        ),
        nr.traj = 1000, keep.vital.events = FALSE,
        fixed.mx = FALSE, fixed.pasfr = FALSE, lc.for.all = TRUE,
        mig.is.rate = FALSE, replace.output = FALSE, verbose = TRUE)



Arguments

end.year

End year of the projection.

start.year

First year of the historical data on mortality rates. It determines the length of the historical time series used in the Lee-Carter estimation.

present.year

Year for which initial population data is to be used.

wpp.year

Year for which WPP data is used. The function loads a package called wppx, where x is the wpp.year, and uses its data (corresponding to the default.country) as default datasets if region-specific alternatives are not given (see more details below).

output.dir

Output directory of the projection.


locations

Name of a tab-delimited file that contains definitions of the subregions. Its structure is similar to that of UNlocations, with mandatory columns reg_code (unique identifier of the subregion) and name (name of the subregion). Optionally, the column location_type can be used; it should be set to 4 for subregions to be processed. Column country_code can be included with the numerical code of the corresponding country. A row with location_type of 0 determines the country that the subregions belong to and is used for extracting default "national" datasets if the argument default.country is missing. In that case, the code of the default country is taken from the country_code column of that row. This is a mandatory argument.

default.country

Numerical code of the country to which the subregions belong. It is used for extracting default datasets from the wpp package if some region-specific input datasets are missing. Alternatively, it can also be included in the locations file, see above. In either case, the code must exist in the UNlocations dataset.

annual

Logical. If TRUE, the simulation is assumed to be 1x1, i.e. one-year age groups and one-year time periods.

inputs

A list of file names where input data is stored. Unless otherwise noted, these are tab-delimited ASCII files with a mandatory column reg_code giving the numerical identifier of the subregions. If an element of this list is NULL, usually a default dataset corresponding to default.country is extracted from the wpp package. Names of these default datasets are shown in brackets. This list contains the following elements:

popM, popF

Initial male/female age-specific population (at time present.year). Mandatory items, no defaults. Must contain columns reg_code and age and be of the same structure as popM from wpp.

mxM, mxF

Historical data and (optionally) projections of male/female age-specific death rates [mxM, mxF] (see also argument fixed.mx).


srb

Projection of sex ratio at birth. [sexRatio]

pasfr

Historical data and (optionally) projections of percentage age-specific fertility rate [percentASFR] (see also argument fixed.pasfr).

patterns

Information on region's specifics regarding migration type, base year of the migration, mortality and fertility age patterns as defined in [vwBaseYear]. In addition, it can contain columns defining migration shares between the subregions, see Details below.

migM, migF, migMt, migFt, mig

Projection and (optionally) historical data of net migration on the same scale as the initial population. There are three ways of defining this quantity, here in order of priority: 1. via migM and migF which should give male and female age-specific migration [migrationM, migrationF]; 2. via migMt and migFt which should give male and female total net migration; 3. via mig which should give the total net migration. For 2. and 3., the totals are disaggregated into age-specific migration by applying a Rogers-Castro schedule. For 3., the totals are equally split between sexes. If all of these input items are missing, the migration schedules are constructed from total migration counts of the default.country derived from migration, using Rogers-Castro for the age distribution. Migration shares between subregions (including sex-specific shares) can be given in the patterns file, see above and Details below. If no shares are given, migration is distributed by population shares.


e0F.file

Comma-delimited CSV file with projected female life expectancy. It has the same structure as the file “ascii_trajectories.csv” generated using bayesLife::convert.e0.trajectories (which currently works for country-level results only). Required columns are “LocID”, “Year”, “Trajectory”, and “e0”. If e0F.file is NULL, data from the corresponding wpp package (for default.country) is taken, namely the median projections as one trajectory and the low and high variants (if available) as second and third trajectory. Alternatively, this element can be the keyword “median_” in which case only the median is taken.

e0M.file

Comma-delimited CSV file containing projections of male life expectancy of the same format as e0F.file. As in the female case, if e0M.file is NULL, data for default.country from the corresponding wpp package is taken.

tfr.file

Comma-delimited CSV file with results of total fertility rate (generated using bayesTFR, function convert.tfr.trajectories, file “ascii_trajectories.csv”). Required columns are “LocID”, “Year”, “Trajectory”, and “TF”. If this element is not NULL, the argument tfr.sim.dir is ignored. If both tfr.file and tfr.sim.dir are NULL, data for default.country from the corresponding wpp package is taken (median and the low and high variants as three trajectories). Alternatively, this argument can be the keyword “median_” in which case only the wpp median is taken.

e0F.sim.dir

Simulation directory with results of female life expectancy, generated using bayesLife::e0.predict.subnat. It is only used if e0F.file is NULL. Alternatively, it can be set to the keyword “median_” which has the same effect as when e0F.file is “median_”.

e0M.sim.dir

This is analogous to e0F.sim.dir, here for male life expectancy. Use e0M.file instead of this item.

tfr.sim.dir

Simulation directory with projections of total fertility rate (generated using bayesTFR::tfr.predict.subnat). It is only used if tfr.file is NULL.

migMtraj, migFtraj, migtraj

Comma-delimited CSV file with male/female age-specific migration trajectories, or total migration trajectories (migtraj). If present, it replaces the deterministic projections given by the mig* items. Its format is similar to that of e.g. e0M.file, with columns “LocID”, “Year”, “Trajectory”, “Age” (except for migtraj) and “Migration”. For a five-year simulation, the “Age” column must have values “0-4”, “5-9”, “10-14”, ..., “95-99”, “100+”. In an annual simulation, age is given by a single number between 0 and 100.

GQpopM, GQpopF

Age-specific population counts (male and female) that should be excluded from application of the cohort-component method (CCM). It can be used for defining group quarters. These counts are removed from population before the CCM projection and added back afterwards. It is not used when computing vital events on observed data. The datasets should have columns “reg_code”, “age” and “gq”. For a five-year simulation, the “age” column should include values “0-4”, “5-9”, “10-14”, ..., “95-99”, “100+”. However, rows with zeros do not need to be included. In an annual simulation, age is given by a single number between 0 and 100.


average.annual

Character string with possible values “TFR”, “e0M”, “e0F”. If this is a 5-year simulation but the TFR and/or e0 inputs come from an annual simulation, including the corresponding string here causes the TFR and/or e0 trajectories to be converted into 5-year averages.

nr.traj, keep.vital.events, fixed.mx, fixed.pasfr, lc.for.all, mig.is.rate, replace.output, verbose

These arguments have the same meaning as in pop.predict.
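
The file layouts described above for the migMtraj/migFtraj and GQpopM/GQpopF elements of inputs can be sketched in base R as follows. This is only an illustration for a five-year simulation: the region codes, years, values and file names are made up, and the choice of which years to include is an assumption. Only the column names and age labels are taken from the argument descriptions.

```r
# Age labels for a five-year simulation: "0-4", "5-9", ..., "95-99", "100+"
lower <- seq(0, 95, by = 5)
age.labels <- c(paste(lower, lower + 4, sep = "-"), "100+")

# Hypothetical region codes and projection years (assumptions, not package data)
reg.codes <- c(658, 659)
years <- c(2025, 2030)
n.traj <- 3

# Age-specific male migration trajectories:
# one row per region x year x trajectory x age group
set.seed(1)
mig.traj <- expand.grid(Age = age.labels, Trajectory = seq_len(n.traj),
                        Year = years, LocID = reg.codes,
                        stringsAsFactors = FALSE)
mig.traj$Migration <- round(rnorm(nrow(mig.traj), mean = 0, sd = 0.5), 3)
mig.traj <- mig.traj[, c("LocID", "Year", "Trajectory", "Age", "Migration")]

# Group quarters: rows with zero counts can be left out
gq <- data.frame(reg_code = c(658, 658, 659),
                 age = c("15-19", "20-24", "20-24"),
                 gq = c(1.2, 3.4, 0.8))

# Trajectories are comma-delimited; the GQ dataset is tab-delimited
migMtraj.file <- tempfile(fileext = ".csv")
gq.file <- tempfile(fileext = ".txt")
write.csv(mig.traj, migMtraj.file, row.names = FALSE)
write.table(gq, gq.file, sep = "\t", row.names = FALSE, quote = FALSE)
```

The resulting file names could then be passed as inputs = list(migMtraj = migMtraj.file, GQpopM = gq.file, ...), alongside the mandatory popM and popF items.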


Details

Population projection for subnational units (regions) is performed by applying the cohort component method to subnational datasets on projected fertility (TFR), mortality and net migration, starting from given sex- and age-specific population counts. The only required inputs are the initial sex- and age-specific population counts in each region (popM and popF elements of the inputs argument) and a file with a set of locations (argument locations). If no other input datasets are given, they are replaced by the corresponding "national" values, taken from the corresponding wpp package. The argument default.country determines the country for those default "national" values. The default country can also be included in the locations file as a record with location_type set to 0.
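
A locations file of the kind described here might be put together as in the following base R sketch. The subregion codes and names are hypothetical, and using the country code as the reg_code of the country row is an assumption of this example; the CANlocations.txt file shipped with the package is the authoritative template.

```r
# Hypothetical subregions; 124 is the UN numerical code of Canada
locations <- data.frame(
    reg_code      = c(124, 1, 2),
    name          = c("Canada", "Region A", "Region B"),
    country_code  = c(124, 124, 124),
    location_type = c(0, 4, 4)   # 0 = default country, 4 = subregion
)

# Write as a tab-delimited file
loc.file <- tempfile(fileext = ".txt")
write.table(locations, loc.file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back and pick out the default country and the subregions
loc <- read.delim(loc.file)
default.country <- loc$country_code[loc$location_type == 0]
subregions <- loc[loc$location_type == 4, c("reg_code", "name")]
```

With such a file, default.country does not need to be passed separately, since the row with location_type 0 supplies it.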

The TFR component can be given as a set of trajectories generated using the tfr.predict.subnat function of the bayesTFR package (tfr.sim.dir element). Alternatively, trajectories can be given in an ASCII file (tfr.file).
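
For orientation, a tfr.file with the required columns could be created as in this sketch. The TFR values, region codes and years are invented purely to show the layout, and treating “LocID” as the place for the subregion code is an assumption.

```r
# Hypothetical region codes and made-up TFR values, for layout only
reg.codes <- c(1, 2)
years <- c(2025, 2030, 2035)
n.traj <- 2

# One row per region x trajectory x year
tfr.traj <- expand.grid(Year = years, Trajectory = seq_len(n.traj),
                        LocID = reg.codes)
tfr.traj$TF <- round(seq(1.4, 1.8, length.out = nrow(tfr.traj)), 2)
tfr.traj <- tfr.traj[, c("LocID", "Year", "Trajectory", "TF")]

# Comma-delimited file with a header row
tfr.file <- tempfile(fileext = ".csv")
write.csv(tfr.traj, tfr.file, row.names = FALSE)
```

The e0F.file and e0M.file items follow the same pattern, with the value column named “e0” instead of “TF”.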

Similarly, the $e_0$ component can be given as a set of trajectories generated using the e0.predict.subnat function of the bayesLife package (e0F.sim.dir element). If male projections are generated jointly (i.e. predict.jmale = TRUE), set e0M.sim.dir = "joint_". Alternatively, trajectories can be given in ASCII files (e0F.file, e0M.file).

Having a set of subnational TFR and $e_0$ trajectories, the cohort component method is applied to each of them to yield a distribution of future subnational population.
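
The idea of applying the cohort component method once per trajectory can be illustrated with a heavily simplified, female-only sketch. It is not the bayesPop implementation: ccm_step is a hypothetical helper, mortality is represented by fixed made-up survival ratios rather than being derived from $e_0$ trajectories, net migration is simply added at the end of the interval, and all numbers are invented.

```r
# One projection step of a female-only cohort-component model
ccm_step <- function(pop, tfr, pasfr, surv, surv.birth, mig,
                     srb = 1.05, width = 5) {
    n <- length(pop)
    nxt <- numeric(n)
    # Survive each age group into the next one
    nxt[2:n] <- pop[1:(n - 1)] * surv[1:(n - 1)]
    # The open-ended last group also keeps its own survivors
    nxt[n] <- nxt[n] + pop[n] * surv[n]
    # Age-specific fertility rates implied by the TFR and the age pattern
    asfr <- tfr * pasfr / width
    # Births over the interval, using the mean number of women in each group
    births <- sum(asfr * width * (pop + nxt) / 2)
    # Female share of births, survived to the end of the interval
    nxt[1] <- births / (1 + srb) * surv.birth
    # Net migration added at the end of the interval (a simplification)
    nxt + mig
}

ages <- seq(0, 100, by = 5)                          # 21 five-year age groups
pop0 <- rep(100, length(ages))                       # made-up starting population
surv <- seq(0.999, 0.5, length.out = length(ages))   # made-up survival ratios
pasfr <- ifelse(ages >= 15 & ages < 50, 1 / 7, 0)    # flat pattern, sums to 1
mig <- rep(0, length(ages))                          # no net migration

# One population trajectory per TFR trajectory
tfr.traj <- c(1.4, 1.7, 2.0)
pop1 <- sapply(tfr.traj, function(tfr)
    ccm_step(pop0, tfr, pasfr, surv, surv.birth = 0.995, mig = mig))
colSums(pop1)   # total female population after one step, per trajectory
```

Each column of pop1 is one projected age distribution; repeating the step over all periods and all trajectories gives the kind of predictive distribution described above.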

Projection of net migration can either be given as disaggregated sex- and age-specific datasets (migM and migF), or as sex totals (migMt and migFt), or as totals (mig), or as sex- and age-specific trajectories (migMtraj and migFtraj), or as total trajectories (migtraj). Alternatively, it can be given as shares between regions as columns in the patterns dataset. These are: inmigrationM_share, inmigrationF_share, outmigrationM_share, outmigrationF_share. The sex specification and/or direction specification (in/out) can be omitted, e.g. it can be simply migration_share. The function extracts the values of net migration projection on the national level and distributes them to regions according to the given shares. For positive (national) values, it uses the in-migration shares; for negative values it uses the out-migration shares. If the in/out prefix is omitted in the column names, the given migration shares are used for both positive and negative net migration projections. By default, if neither migration datasets nor region-specific shares are given, the distribution between regions is proportional to the size of population. The age-specific schedules follow by default the Rogers-Castro age schedules. Note that when handling migration using shares as described here, it only affects the distribution of international migration into regions. It does not take into account between-region migration.
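
The share-based distribution rule can be written down in a few lines of R. The function distribute_migration and all values are hypothetical, and rescaling the shares so that they sum to one is an assumption of this sketch rather than documented package behaviour.

```r
# Distribute a national net migration total to regions.
# Positive totals use in-migration shares, negative totals out-migration shares.
distribute_migration <- function(national, in.share, out.share) {
    share <- if (national >= 0) in.share else out.share
    national * share / sum(share)   # rescale in case shares do not sum to 1
}

# Hypothetical regions with made-up shares and population sizes
regions <- c("A", "B", "C")
in.share  <- setNames(c(0.5, 0.3, 0.2), regions)
out.share <- setNames(c(0.2, 0.2, 0.6), regions)
pop       <- setNames(c(600, 300, 100), regions)

distribute_migration(100, in.share, out.share)   # A = 50, B = 30, C = 20
distribute_migration(-50, in.share, out.share)   # A = -10, B = -10, C = -30

# Default when no shares are given: proportional to population size
distribute_migration(100, pop, pop)              # A = 60, B = 30, C = 10
```

If only a single migration_share column is supplied, the same vector plays the role of both in.share and out.share.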

The package contains example datasets for Canada. Use these as templates for your own data. See Example below.


Value

Object of class bayesPop.prediction containing the subnational projections. Note that this object can be used in the various bayesPop functions exactly the same way as an object with national projections. However, the meaning of the argument country in many of these functions (e.g. in pop.trajectories.plot) changes to an identification of the region (either as a numerical code or name as defined in the locations file).


Note

We are grateful to Patrice Dion from Statistics Canada for providing us with example data. Note that the example datasets included in the package are not official STATCAN data; they only serve the purpose of illustration and templates. Data for the time period 2015-2020 has been imputed by the author.


Author(s)

Hana Sevcikova

See Also

pop.predict, tfr.predict.subnat, pop.aggregate.subnat


Examples

## Not run: 
library(bayesPop)

# Subnational projections for Canada
data.dir <- file.path(find.package("bayesPop"), "extdata")

# Use national data for tfr and e0
sim.dir <- tempfile()
pred <- pop.predict.subnat(output.dir = sim.dir,
            locations = file.path(data.dir, "CANlocations.txt"),
            inputs = list(popM = file.path(data.dir, "CANpopM.txt"),
                          popF = file.path(data.dir, "CANpopF.txt"),
                          tfr.file = "median_"
                     ),
            verbose = TRUE)
pop.trajectories.plot(pred, "Alberta", sum.over.ages = TRUE)
unlink(sim.dir, recursive=TRUE)

# Use subnational probabilistic TFR simulation
# Subnational TFR projections for Canada (from ?tfr.predict.subnat)
my.subtfr.file <- file.path(find.package("bayesTFR"), 'extdata', 'subnational_tfr_template.txt')
tfr.nat.dir <- file.path(find.package("bayesTFR"), "ex-data", "bayesTFR.output")
tfr.reg.dir <- tempfile()
tfr.preds <- tfr.predict.subnat(124, my.tfr.file = my.subtfr.file,
    sim.dir = tfr.nat.dir, output.dir = tfr.reg.dir, start.year = 2013)
# Use subnational probabilistic e0
# Subnational e0 projections for Canada (from ?e0.predict.subnat)
# (here using the same female and male data, just for illustration)
my.sube0.file <- file.path(find.package("bayesLife"), 'extdata', 'subnational_e0_template.txt')
e0.nat.dir <- file.path(find.package("bayesLife"), "ex-data", "bayesLife.output")
e0.reg.dir <- tempfile()
e0.preds <- e0.predict.subnat(124, my.e0.file = my.sube0.file,
    sim.dir = e0.nat.dir, output.dir = e0.reg.dir, start.year = 2018,
    predict.jmale = TRUE, my.e0M.file = my.sube0.file)
# Population projections
sim.dir <- tempfile()
pred <- pop.predict.subnat(output.dir = sim.dir,
            locations = file.path(data.dir, "CANlocations.txt"),
            inputs = list(popM = file.path(data.dir, "CANpopM.txt"),
                          popF = file.path(data.dir, "CANpopF.txt"),
                          patterns = file.path(data.dir, "CANpatterns.txt"),
                          tfr.sim.dir = file.path(tfr.reg.dir, "subnat", "c124"),
                          e0F.sim.dir = file.path(e0.reg.dir, "subnat_ar1", "c124"),
                          e0M.sim.dir = "joint_"
                     ),
            verbose = TRUE)
pop.trajectories.plot(pred, "Alberta", sum.over.ages = TRUE)
pop.pyramid(pred, "Manitoba", year = 2050)

# Aggregate to country level
aggr <- pop.aggregate.subnat(pred, regions = 124, 
            locations = file.path(data.dir, "CANlocations.txt"))
pop.trajectories.plot(aggr, "Canada", sum.over.ages = TRUE)

unlink(sim.dir, recursive = TRUE)
unlink(tfr.reg.dir, recursive = TRUE)
unlink(e0.reg.dir, recursive = TRUE)

## End(Not run)

bayesPop documentation built on Aug. 10, 2023, 1:10 a.m.