pop.aggregate: Aggregation of Population Projections

View source: R/aggregate.R



Description

Aggregation of existing country population projections into projections for given regions, and access to such aggregations.


Usage

pop.aggregate(pop.pred, regions, 
    input.type = c("country", "region"), name = input.type,
    inputs = list(e0F.sim.dir = NULL, e0M.sim.dir = "joint_", tfr.sim.dir = NULL),
    my.location.file = NULL, verbose = FALSE, ...)
get.pop.aggregation(sim.dir = NULL, pop.pred = NULL, name = NULL, 
    write.to.cache = TRUE)
pop.aggregate.subnat(pop.pred, regions, locations, ..., verbose = FALSE)



Arguments

pop.pred

Object of class bayesPop.prediction containing country-specific population projections.


regions

Vector of numerical codes of regions. The codes should correspond to values in the column “country_code” of the UNlocations dataset or of my.location.file (see below). For pop.aggregate.subnat, it is the numerical code of the country over which sub-regions are aggregated.


input.type

Method of aggregation, determined by the type of inputs: “country”-based or “region”-based; see Details.


name

Name of the aggregation. It becomes part of the name of the directory where the aggregation results are stored.


inputs

Used only when the “region”-based method is selected. A list of inputs for the probabilistic components of the projection:


e0F.sim.dir

Simulation directory with projections of female life expectancy (generated using bayesLife). It must contain projections for the given regions (see functions run.e0.mcmc.extra and e0.predict.extra). If it is not given, the e0 directory that was used for generating the pop.pred object is taken, in which case the e0 projections are re-loaded from disk.


e0M.sim.dir

Simulation directory with projections of male life expectancy. By default (value NULL or “joint_”), the function assumes joint female-male projections of life expectancy and thus tries to load the male projections from the female projection object created using the e0F.sim.dir argument.


tfr.sim.dir

Simulation directory with projections of total fertility rate (generated using bayesTFR). It must contain projections for the given regions (see functions run.tfr.mcmc.extra and tfr.predict.extra). If it is not given, the TFR directory that was used for generating the pop.pred object is taken, in which case the TFR projections are re-loaded from disk.


my.location.file

User-defined location file that can contain aggregation groups other than those in the default UN location file. It should have the same structure as the UNlocations dataset; see below.


verbose

Logical switching log messages on and off.


sim.dir

Simulation directory where the aggregation is stored, i.e. the same directory that was used for creating the pop.pred object. Alternatively, the pop.pred object can be passed instead; one of sim.dir or pop.pred must be given.


write.to.cache

Logical controlling whether functions operating on this object are allowed to write into its cache (see Details of get.pop.prediction).


locations

Name of a tab-delimited file that contains definitions of the sub-regions. It should be the same file as used for the locations argument in pop.predict.subnat.


...

Additional arguments. For a country-type aggregation, these can include the logical use.kannisto, which determines whether the Kannisto method is used for old ages when aggregating mortality rates. The logical argument keep.vital.events determines whether vital events are computed for the aggregations. The argument adjust determines whether country-level population numbers are adjusted to the WPP values.


Details

Function pop.aggregate triggers an aggregation over countries, while function pop.aggregate.subnat aggregates sub-regions into a country. The following details refer to pop.aggregate. For sub-national aggregation, see the Example in pop.predict.subnat.

The dataset UNlocations or my.location.file is used to determine which countries are aggregated, in particular the field “location_type” of the entries whose “country_code” is given in the regions argument. One can aggregate over the following location types: type 0 means aggregating over all countries of the world (or in the file), type 2 over continents, type 3 over regions within continents, and any other integer (except 4) corresponds to a user-defined aggregation. Note that type 4 is reserved as the location type of countries; all aggregations are performed over entries of this type. For type 2, countries are matched using the “area_code” column; for type 3, the matching is done using the “reg_code” column of the UNlocations dataset. For example, regions=908 (Europe) has location type 2 in the default UNlocations dataset, so all countries with the value 908 in the “area_code” column are aggregated. If the location type is other than 0, 2, 3 and 4, the file must contain a column called “agcode_x”, with x being the location type. This column is then used to match the countries to be aggregated.
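The matching rule above can be sketched in base R as follows. This is an illustration of the described logic only, not bayesPop's internal code: region.countries is a hypothetical helper, and the toy table only mimics the UNlocations column layout (900 and 908 are the World and Europe codes used on this page; the other area code is illustrative).

```r
# Hypothetical helper (not part of bayesPop): list the country codes covered by one region code.
# 'loc' is assumed to have UNlocations-style columns: country_code, location_type and,
# as needed, area_code (type 2), reg_code (type 3) or agcode_<type> (user-defined types).
region.countries <- function(loc, region.code) {
  region <- loc[loc$country_code == region.code, ]
  if (nrow(region) != 1) stop("region code must appear exactly once")
  type <- region$location_type
  if (type == 4) stop("location type 4 is reserved for countries")
  countries <- loc[loc$location_type == 4, ]      # only country entries are aggregated
  if (type == 0) return(countries$country_code)   # type 0: all countries in the table
  col <- if (type == 2) "area_code" else if (type == 3) "reg_code" else paste0("agcode_", type)
  if (!col %in% names(loc)) stop("column ", col, " not found")
  countries$country_code[countries[[col]] == region.code]
}

# Toy table (illustrative values, UNlocations-like layout)
loc <- data.frame(
  country_code  = c(900, 908, 276, 250, 450),
  name          = c("World", "Europe", "Germany", "France", "Madagascar"),
  location_type = c(0, 2, 4, 4, 4),
  area_code     = c(-1, -1, 908, 908, 903),
  stringsAsFactors = FALSE
)
region.countries(loc, 908)  # 276 250
region.countries(loc, 900)  # 276 250 450
```

Restricting the candidates to location type 4 first mirrors the statement that all aggregations are performed over country entries, so group rows can never be counted as members.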

Consider the following example. Suppose we want to group four countries (Germany [DE], France [FR], Netherlands [NL], Italy [IT]) into pairs in two different ways, giving two overlapping groupings, each with two groups (A, B):

  1. group A = (DE, FR), group B = (NL, IT)

  2. group A = (DE, NL), group B = (FR, IT)

Then, my.location.file should have the following entries:

country_code  name              location_type  agcode_98  agcode_99
1001          grouping1_groupA  98             -1         -1
1002          grouping1_groupB  98             -1         -1
1003          grouping2_groupA  99             -1         -1
1004          grouping2_groupB  99             -1         -1
276           Germany           4              1001       1003
250           France            4              1001       1004
528           Netherlands       4              1002       1003
380           Italy             4              1002       1004
1005          all               0              -1         -1

The “country_code” values of the groups are user-defined, but they must be unique within the file. Values of “country_code” for countries must match those in the prediction object. To run the aggregation for the four groups above, set regions=1001:1004. Since the groups have “location_type” 98 and 99, the file is expected to have columns “agcode_98” and “agcode_99” containing the assignments of countries to each of the two groupings. Values in these columns in the rows of the groups themselves are not used and can thus be arbitrary. To aggregate over all four countries, set regions=1005; its “location_type” is 0, so it is aggregated over all entries with “location_type” equal to 4.
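The example location table can be rebuilt as an R data frame to check the rules stated above before writing it to a file. This is a sketch only: it uses the UN code 528 for the Netherlands, and the checks are plain base R written for illustration, not validation routines provided by bayesPop.

```r
# Example location table from above as a data frame
loc <- data.frame(
  country_code  = c(1001, 1002, 1003, 1004, 276, 250, 528, 380, 1005),
  name          = c("grouping1_groupA", "grouping1_groupB", "grouping2_groupA",
                    "grouping2_groupB", "Germany", "France", "Netherlands", "Italy", "all"),
  location_type = c(98, 98, 99, 99, 4, 4, 4, 4, 0),
  agcode_98     = c(-1, -1, -1, -1, 1001, 1001, 1002, 1002, -1),
  agcode_99     = c(-1, -1, -1, -1, 1003, 1004, 1003, 1004, -1),
  stringsAsFactors = FALSE
)

# country_code must be unique within the file
stopifnot(!anyDuplicated(loc$country_code))

# each user-defined location type needs a matching agcode_<type> column
custom.types <- setdiff(unique(loc$location_type), c(0, 2, 3, 4))
stopifnot(all(paste0("agcode_", custom.types) %in% names(loc)))

# group membership, looked up among country rows only (location_type 4)
countries <- loc[loc$location_type == 4, ]
g98 <- split(countries$name, countries$agcode_98)  # 1001: Germany, France; 1002: Netherlands, Italy
g99 <- split(countries$name, countries$agcode_99)  # 1003: Germany, Netherlands; 1004: France, Italy
```

The -1 entries in the group rows are ignored by the lookup because only rows with location type 4 are split, which matches the note that these values can be arbitrary.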

There are two methods available for generating aggregations of population projections:

Country-based Method

Aggregations are created by summing trajectories over countries of the given region.
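A minimal sketch of the country-based idea, assuming each country's projection is held as a matrix of total population with one row per time point and one column per trajectory, and that trajectory i is paired across countries. This layout and the simulated numbers are hypothetical, not bayesPop's internal storage.

```r
set.seed(1)
n.time <- 3
n.traj <- 1000
# made-up trajectories for two countries (rows = time points, columns = trajectories)
country.traj <- list(
  A = matrix(rnorm(n.time * n.traj, mean = 100, sd = 10), n.time, n.traj),
  B = matrix(rnorm(n.time * n.traj, mean = 50, sd = 5), n.time, n.traj)
)
# element-wise sum: regional trajectory i = sum of trajectory i over countries
region.traj <- Reduce(`+`, country.traj)
# quantiles are then computed from the summed trajectories
region.quant <- apply(region.traj, 1, quantile, probs = c(0.1, 0.5, 0.9))
```

Summarizing after summing matters: adding the countries' own 10% and 90% quantiles would generally give a wider interval than the quantiles of the summed trajectories, unless the countries move in perfect lockstep.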

Region-based Method

The aggregation is generated using the same algorithm as population projections for single countries (function pop.predict), but it operates on aggregated input components, which are created as follows. Here c denotes the countries over which a region R is aggregated, and s \in \{m, f\}, a, and t denote sex, age category, and time, respectively; t=P denotes the present year of the prediction. N_{s,a,t}^c and M_{s,a,t}^c denote, respectively, the historical population count and the Bayesian predictive median of the population of sex s in age category a at time t for country c (the names in parentheses below refer to the corresponding input datasets):

Initial sex and age-specific population (popM, popF):

N_{s,a,t=P}^R = \sum_c N_{s,a,t=P}^c

Sex and age-specific death rates (mxM, mxF):

mx_{s,a,t}^R = \frac{\sum_c(mx_{s,a,t}^c \cdot N_{s,a,t}^c)}{\sum_c N_{s,a,t}^c}
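The first two region-based inputs can be illustrated with a small base R sketch for one sex and one time point. The numbers are made up, and the layout (rows = age groups, columns = countries) is an assumption for illustration only.

```r
# made-up inputs: rows are age groups, columns are countries
N  <- cbind(c1 = c(1000, 800, 300), c2 = c(500, 600, 400))             # population N^c
mx <- cbind(c1 = c(0.010, 0.002, 0.050), c2 = c(0.020, 0.003, 0.040))  # death rates mx^c

N.R  <- rowSums(N)              # N^R: population summed over countries
mx.R <- rowSums(mx * N) / N.R   # mx^R: population-weighted mean of the country rates
N.R   # 1500 1400 700
```

Because the weights are the population counts, mx.R times N.R reproduces the total number of deaths implied by the country rates in each age group.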

Sex ratio at birth (srb):

SRB_t^R = \frac{\sum_c M_{s=m,a=1,t}^c}{\sum_c M_{s=f,a=1,t}^c}
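A sketch of the sex ratio at birth formula, assuming M0m and M0f hold the male and female predictive medians of the youngest age group (a=1) per country at one time point; the variable names and numbers are hypothetical.

```r
# made-up predictive medians of the youngest age group, one entry per country
M0m <- c(c1 = 515, c2 = 1030)
M0f <- c(c1 = 490, c2 = 980)
srb.R <- sum(M0m) / sum(M0f)   # ratio of sums over countries
```

A ratio of sums is the same as the country ratios M0m/M0f averaged with weights M0f, so more populous countries have proportionally more influence on the regional value.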

Percentage age-specific fertility rate (pasfr):

PASFR_{a,t}^R = \frac{\sum_c(PASFR_{a,t}^c \cdot M_{s=f,a,t}^c)}{\sum_c M_{s=f,a,t}^c}
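A sketch of the PASFR weighting for one time point, assuming pasfr is an age-by-country matrix of percentages (each column summing to 100) and Mf holds the matching female predictive medians. All values are made up for illustration.

```r
# made-up inputs: rows are reproductive age groups, columns are countries
pasfr <- cbind(c1 = c(20, 50, 30), c2 = c(40, 40, 20))
Mf    <- cbind(c1 = c(300, 300, 300), c2 = c(100, 200, 100))
# weight each country's percentage by its female population in the same age group
pasfr.R <- rowSums(pasfr * Mf) / rowSums(Mf)   # 25 46 27.5
```

As the example shows, weighting each age group separately means the aggregated percentages need not sum to exactly 100 (here 98.5); this sketch does not rescale them.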

Migration code and start year (mig.type):

The aggregated migration code is the code with the maximum count over the aggregated countries, where each country is weighted by N_{t=P}^c. The migration start year is the maximum of the start years over the aggregated countries.

Sex and age-specific migration (migM, migF):

mig_{s,a,t}^R = \sum_c mig_{s,a,t}^c
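The two migration rules can be sketched as follows, under one reading of the weighted-count rule: each code receives the summed present-year population of the countries using it, and the code with the largest total wins. The code values, years and counts are hypothetical.

```r
# hypothetical per-country values
mig.code  <- c(c1 = 0, c2 = 1, c3 = 0)             # migration codes
start.yr  <- c(c1 = 2020, c2 = 2025, c3 = 2030)    # migration start years
N.present <- c(c1 = 10e6, c2 = 50e6, c3 = 20e6)    # total population at t = P

weight.by.code <- tapply(N.present, mig.code, sum)          # population-weighted count per code
mig.code.R <- as.numeric(names(which.max(weight.by.code)))  # code with the largest weight
start.yr.R <- max(start.yr)

# migration counts add up directly (rows = age groups, columns = countries)
mig <- cbind(c1 = c(100, -50), c2 = c(-20, 30), c3 = c(5, 5))
mig.R <- rowSums(mig)
```

Here an unweighted majority would pick code 0 (two countries), while population weighting picks code 1, because the single country using it is larger than the other two combined.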

Probabilistic projection of life expectancy:

We assume an aggregation of life expectancy for the given regions was generated prior to this call, using the run.e0.mcmc.extra and e0.predict.extra functions of the bayesLife package.

Probabilistic projection of total fertility rate:

We assume an aggregation of total fertility for the given regions was generated prior to this call, using the run.tfr.mcmc.extra and tfr.predict.extra functions of the bayesTFR package.

Results of the aggregations are stored in the same top directory as the pop.pred object, in a subdirectory called ‘aggregations_name’. They can be accessed using the function get.pop.aggregation. Note that running this function multiple times with the same name overwrites previous aggregation results of that name.


Value

Object of class bayesPop.prediction containing the aggregated results. In addition, it contains the elements aggregation.method, giving the input.type used, and aggregated.countries, a list of the countries aggregated for each region.


Author(s)

Hana Sevcikova, Adrian Raftery


References

H. Sevcikova, A. E. Raftery (2016). bayesPop: Probabilistic Population Projections. Journal of Statistical Software, 75(5), 1-29. doi:10.18637/jss.v075.i05

See Also

pop.predict, tfr.predict.extra, e0.predict.extra


Examples

## Not run: 
sim.dir <- tempfile()
pred <- pop.predict(countries=c(528,218,450), output.dir=sim.dir)
aggr <- pop.aggregate(pred, 900) # aggregating World (i.e. all countries available in pred)
pop.trajectories.plot(aggr, 900, sum.over.ages=TRUE)
# countries over which we aggregated:
subset(UNlocations, country_code %in% aggr$aggregated.countries[["900"]])
unlink(sim.dir, recursive=TRUE)
## End(Not run)

bayesPop documentation built on Aug. 10, 2023, 1:10 a.m.