expressions: Expressions as used in Population Output Functions
In PPgp/bayesPop: Probabilistic Population Projection

pop.expressions

R Documentation

Expressions as used in Population Output Functions

Description

Documentation of expressions supported by functions pop.trajectories.plot, pop.trajectories.plotAll, pop.trajectories.table, pop.byage.plot, pop.byage.table, cohorts, pop.cohorts.plot, pop.map, pop.map.gvis, write.pop.projection.summary, get.pop.ex, get.pop.exba.

Details

The functions above accept an argument expression which should define a population measure, i.e. a quantity that can be computed from population projections, observed population data or vital events. Such an expression is a collection of basic components connected via usual arithmetic operators, such as +, -, *, /, ^, %%, %/%, and combined using parentheses. In addition, standard R functions or predefined functions (see below) can be used within expressions.

A basic component is a character string constituted of four parts, two of which are optional. They must be in the following order:

Measure identification. One of the folowing upper-case characters:
- ‘P’ - population,
- ‘D’ - deaths,
- ‘B’ - births,
- ‘S’ - survival ratio,
- ‘F’ - fertility rate,
- ‘R’ - percent age-specific fertility,
- ‘M’ - mortality rate,
- ‘Q’ - probability of dying,
- ‘E’ - life expectancy,
- ‘G’ - net migration,
- ‘A’ - a_x column of the life table.
All but the ‘P’ and ‘G’ indicators are available only if the pop.predict function was run with keep.vital.events=TRUE.
Country part. One of the following:
- Numerical country code (as used in UNlocations, see https://en.wikipedia.org/wiki/ISO_3166-1_numeric),
- two- or three-character ISO 3166 code, see https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2, https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3,
- characters “XXX” which serves as a wildcard for a country code.
Sex part (optional): The country part can be followed by either “_F” (for female) or “_M” (for male).
Age part (optional): If used, the basic component is concluded by an age index given as an array. Such array is embraced by either brackets (“[” and “]”) or curly braces (“{” and “}”). The former invokes a summation of counts over given ages, the latter is used when no summation is desired. Note that if this part is missing, counts are automatically summed over all ages. To use all ages without summing, empty curly braces can be used.
- For 5x5 predictions, the age index 1 corresponds to age 0-4, index 2 corresponds to age 5-9 etc. Indicators ‘S’, ‘M’, ‘Q’ and ‘E’ allow an index -1 which corresponds to age 0-1 and an index 0 which corresponds to age 1-4. Use the pre-defined functions age.index01(...) and age.index05(...) (see below) to define the right indices.
- For 1x1 predictions, the age index starts with 0 for all indicators and matches exactly the age. I.e., indices 0,1,2,... correspond to ages 0,1,2,....

Not all combinations of the four parts above make sense. For example, ‘F’ and ‘R’ can be only combined with female sex, ‘B’, ‘F’ and ‘R’ can be only combined with a subset of the age groups, namely child-bearing ages (indices 4 to 10 in 5x5, or 11 to 55 in 1x1). Or, there is no point in summing the life table based indicators (M, Q, E, S, A) over multiple age groups, i.e. using brackets, or over sexes. Thus, if the sex part is omitted for the life table indicators, the life table is correctly aggregated over sexes, instead of a simple summation.

Examples of basic components are “P276”, “D50_F[4:10]”, “PXXX{14:27}”, “SCZE_M{}”, “QIE_M[-1]”.

When the expression is evaluated on a prediction object, each basic component is substituted by an array of four dimensions (using the get.pop function):

Country dimension: Equals to one if a specific country code is given, or it equals the number of countries in the prediction object if a wildcard is used.
Age dimension: Equals to one if the third component above is missing or the age is defined within square brackets. If the age is defined within curly braces, this dimension corresponds to the length of the age array.
Time dimension: Depending on the time context of the expression, this dimension corresponds to either the number of projection periods or the number of observation periods.
Trajectory dimension: Corresponds to the number of trajectories in the prediction object, or one if the component is evaluated on observed data.

Depending on the context from which the expression is called, the trajectory dimension of the result of the expression can be reduced by computing given quantiles, and if only one country is evaluated, the first dimension is removed. In addition, with an exception of functions pop.byage.plot, pop.byage.table, cohorts, and pop.cohorts.plot, the expression should be constructed in a way that the age dimension is eliminated. This can be done for example by using brackets to define age, by using the apply function or one of the pre-defined functions described below. When using within pop.byage.plot, pop.byage.table, cohorts, or pop.cohorts.plot, the expression MUST include curly braces.

While get.pop can be used to obtain results of a basic component, functions get.pop.ex and get.pop.exba evaluate whole expressions.

Pre-defined functions

The following functions can be used within an expression:

gmedian(f, cat)
It gives a median for grouped data with frequencies f and categories cat. This function is to be used in combination with apply or pop.apply (see below) along the age dimension. For example,
“apply(P380{}, c(1,3,4), gmedian, cats=seq(0, by=5, length=28))”
is an expression for median age in Italy. (See pop.apply below for a simplified version.)
gmean(f, cat)
Works like gmedian but gives the grouped mean.
age.func(data, fun="*")
This function applies fun to data and the corresponding age (the middle point of each age category). The default case would multiply data by the corresponding age. As gmedian, it is to be used in combination with apply or pop.apply.
drop.age(data)
Drops the age dimension of the data. For example, if two basic components are combined where one is used within the apply function, the other will need to change its dimension in order to have conformable arrays. For example,
“apply(age.func(P752{}), c(1,3,4), sum) / drop.age(P752)”
is an expression for the average age in Sweden. (See pop.apply below for a simplified version.)
pop.apply(data, fun, ..., split.along=c("None", "age", "traj", "country"))
By default applies function fun to the age dimension of data and converts the result into the same format as returned by a basic component. This allows combining the apply function with other basic components without having to modify their dimensions. For example,
“pop.apply(age.func(P752{}), fun=sum) / P752” gives the average age in Sweden, or
“pop.apply(P380{}, gmedian, cats=seq(0, by=5, length=28))” gives the median age of Italy. If slice.along is not ‘None’, it can be used as an apply function where the data is sliced along one axis.
pop.combine(data1, data2, fun, ..., split.along=c("age", "traj", "country"))
Can be used if two basic components should be combined that result in different shapes. It tries to put data into the right format and calls pop.apply. For example,
“pop.combine(PIND{}, PIND, '/')” give population by age per total population in India, or
“pop.combine(BFR - DFR, GFR, '+', split.along='traj')” gives births minus deaths plus net migration in France. Here, pop.combine is necessary, because ‘GFR’ is a deterministic component and thus, has only one trajectory, whereas births and deaths are probabilistic.
age.index01(end)
Can be used with indicators ‘S’, ‘M’, ‘Q’ and ‘E’ only. It returns an array of age group indices that include ages 0-1 and 1-4 and exclude 0-4. The last age index is end.
age.index05(end)
Returns an array of age group indices starting with group 0-4, 5-9 until the age group corresponding to index end.

There is also a help function available that generates an expression for the mean age of childbearing, see mac.expression.

Note

The expression parser is simple and far from being perfect. We recommend to leave spaces around the basic components.

Author(s)

Hana Sevcikova, Adrian Raftery

References

H. Sevcikova, A. E. Raftery (2016). bayesPop: Probabilistic Population Projections. Journal of Statistical Software, 75(5), 1-29. doi:10.18637/jss.v075.i05

Examples

sim.dir <- file.path(find.package("bayesPop"), "ex-data", "Pop")
pred <- get.pop.prediction(sim.dir, write.to.cache=FALSE)

# median age of women in child-bearing ages in Netherlands and all countries - trajectories
pop.trajectories.plot(pred, nr.traj=0,
    expression="pop.apply(P528_F{4:10}, gmedian, cats= seq(15, by=5, length=8))")
## Not run: 
pop.trajectories.plotAll(pred, nr.traj=0, 
    expression="pop.apply(PXXX_F{4:10}, gmedian, cats= seq(15, by=5, length=8))")

## End(Not run)
# mean age of women in child-bearing ages in Netherlands - table
pop.trajectories.table(pred, 
    expression="pop.apply(age.func(P528_F{4:10}), fun=sum) / P528_F[4:10]")
# - gives the same results as with "pop.apply(P528_F{4:10}, gmean, cats=seq(15, by=5, length=8))"
# - for the mean age of childbearing, see ?mac.expression

# migration per capita by age
pop.byage.plot(pred, expression="GNL{} / PNL{}", year=2000)

## Not run: 
# potential support ratio - map (with the two countries
#       contained in pred object)
pop.map(pred, expression="PXXX[5:13] / PXXX[14:27]")
## End(Not run)

# proportion of 0-4 years old to whole population - export to an ASCII file
dir <- tempfile()
write.pop.projection.summary(pred, expression="PXXX[1] / PXXX", output.dir=dir)
unlink(dir)

## Not run: 
# These are vital events only available if keep.vital.events=TRUE in pop.predict, e.g.
# sim.dir.tmp <- tempfile()
# pred <- pop.predict(countries="Netherlands", nr.traj=3, 
#           				keep.vital.events=TRUE, output.dir=sim.dir.tmp)
# log female mortality rate by age for Netherlands in 2050, including 0-1 and 1-4 age groups
pop.byage.plot(pred, expression="log(MNL_F{age.index01(27)})", year=2050)

# trajectories of male 1q0 and table of 5q0 for Netherlands
pop.trajectories.plot(pred, expression="QNLD_M[-1]")
pop.trajectories.table(pred, expression="QNLD_M[1]")
# unlink(sim.dir.tmp)
## End(Not run)

PPgp/bayesPop documentation built on June 9, 2025, 12:37 p.m.