DemographicArray-class: Classes "DemographicArray", "Counts", and "Values".
In StatisticsNZ/dembase: Analysing Cross-Classified Data about Populations


Classes for representing demographic arrays: essentially, arrays plus metadata.

DemographicArray is a virtual superclass, and Counts and Values are its two main subclasses. For a discussion of what these terms mean, and of R's class system, see Classes. However, to use package dembase, it is probably enough to know that the phrase 'objects of class DemographicArray' is shorthand for 'objects of any class that is a subclass of DemographicArray'. A list of the subclasses of DemographicArray can be obtained using getClass("DemographicArray").

Objects of class DemographicArray are arrays with some specialized metadata that are useful when dealing with population data. For instance, all objects of class DemographicArray have dimtypes and dimscales describing the type of variable being measured and the measurement scale. Objects of class DemographicArray also have some specialized behaviours that arrays do not. For instance, when two objects of class DemographicArray are added together, the dimensions of the two objects are automatically aligned.

Objects of class Counts hold data about numbers of people or events, while objects of class Values hold information about characteristics or attributes. Some functions, such as ones that aggregate cells, treat objects of class "Counts" differently from objects of class "Values".

Unlike ordinary arrays, objects of class DemographicArray must have a complete set of dimnames, meaning that each dimension must be named, and within a dimension the labels must be unique.
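The dimnames requirement can be illustrated in base R. The helper has_complete_dimnames below is hypothetical, written only for this illustration. It is not part of dembase, and it only approximates the checks described above, on the assumption that 'complete' means every dimension has both a name and a set of labels.

```r
# Hypothetical helper (not a dembase function): TRUE if every dimension
# is named and labelled, and labels are unique within each dimension.
has_complete_dimnames <- function(a) {
    dn <- dimnames(a)
    if (is.null(dn)) return(FALSE)
    nms <- names(dn)
    if (is.null(nms) || any(is.na(nms)) || any(!nzchar(nms))) return(FALSE)
    labelled <- !vapply(dn, is.null, logical(1))
    unique_labels <- vapply(dn, function(x) anyDuplicated(x) == 0L, logical(1))
    all(labelled) && all(unique_labels)
}

ok <- array(1:6, dim = c(3, 2),
            dimnames = list(age = c("0-19", "20-64", "65+"),
                            sex = c("Female", "Male")))
# Labels are present, but the dimensions themselves have no names
unnamed <- array(1:6, dim = c(3, 2),
                 dimnames = list(c("0-19", "20-64", "65+"),
                                 c("Female", "Male")))
# The label "0-19" appears twice within the age dimension
duplicated_labels <- array(1:6, dim = c(3, 2),
                           dimnames = list(age = c("0-19", "0-19", "65+"),
                                           sex = c("Female", "Male")))

has_complete_dimnames(ok)
has_complete_dimnames(unnamed)
has_complete_dimnames(duplicated_labels)
```

Only the first array satisfies the requirement; base R accepts all three as valid arrays.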

Objects of class Counts and Values are generated using functions Counts and Values. Because DemographicArray is a virtual class, no objects may be created from it.

When demographic arrays are used in arithmetic, or are supplied to a function, one or more of the objects will attempt to reshape themselves so that the objects are compatible. The reshaping involves the following operations:

Permuting dimensions: Dimensions are rearranged so that they follow the same order.
Adding dimensions: If an object of class "Values" lacks a dimension that others have, the missing dimension is added to that object.
Collapsing dimensions: If an object of class "Counts" has a dimension that others lack, the extra dimension is collapsed away.
Permuting categories: Categories within each dimension are rearranged so that they follow the same order.
Splitting intervals: If an object of class "Values" uses coarser intervals than other objects, the coarser intervals are split. Cells within the new intervals have the same values as cells within the old combined interval.
Collapsing intervals: If an object of class "Counts" uses finer intervals than other objects, the finer intervals are collapsed.
Subsetting: If one object contains categories that another object does not, the extra categories are typically dropped.

If these operations are not sufficient to align objects, then an error is raised. In particular, an error will be raised if the only way to align objects is to remove cells.
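The two permuting operations in the list above can be mimicked with ordinary arrays. The following is a minimal base R sketch, not the code dembase uses internally; the helper align_to is hypothetical, and it handles only the case where both arrays have the same dimensions and the same categories.

```r
a <- array(1:6, dim = c(3, 2),
           dimnames = list(age = c("0-19", "20-64", "65+"),
                           sex = c("Female", "Male")))
b <- array(c(10, 20, 30, 40, 50, 60), dim = c(2, 3),
           dimnames = list(sex = c("Male", "Female"),
                           age = c("0-19", "20-64", "65+")))

# Hypothetical helper: reorder the dimensions and categories of 'x'
# so that they match those of 'target'.
align_to <- function(x, target) {
    target_dn <- dimnames(target)
    if (!setequal(names(dimnames(x)), names(target_dn)))
        stop("'x' and 'target' have different dimensions")
    # Permuting dimensions: follow the dimension order of 'target'
    x <- aperm(x, perm = names(target_dn))
    # Permuting categories: index each dimension by the labels of 'target'
    do.call(`[`, c(list(x), unname(target_dn), list(drop = FALSE)))
}

b_aligned <- align_to(b, target = a)
dimnames(b_aligned)
a * b_aligned
```

Once the dimnames agree, element-wise arithmetic pairs up the correct cells. With plain arrays, a * b would fail here because the dimensions are non-conformable.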

The rules for adding dimensions to objects of class "Values", and for splitting intervals within objects of class "Values", assume that, within each cell of the original classification, every person or event is identical. These sorts of homogeneity assumptions are standard in applied demography. The assumptions are more plausible when more categories or dimensions are used. Homogeneity assumptions can be avoided by adding dimensions or splitting intervals 'by hand' with functions such as addDimension.
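The homogeneity assumption can be made concrete with plain named vectors. This is a base R sketch with invented numbers; the lookup fine_to_coarse is a hypothetical construct for the illustration and does not reflect how dembase represents intervals.

```r
# Rates by coarse age group (the kind of data a "Values" object holds)
rate_coarse <- c("0-19" = 0.02, "20-64" = 0.05)

# Hypothetical lookup from fine intervals to the coarse interval containing them
fine_to_coarse <- c("0-9" = "0-19", "10-19" = "0-19",
                    "20-44" = "20-64", "45-64" = "20-64")

# Splitting intervals: each fine interval inherits the value of its parent,
# which is where the homogeneity assumption enters
rate_fine <- rate_coarse[fine_to_coarse]
names(rate_fine) <- names(fine_to_coarse)
rate_fine

# Counts by fine age group (the kind of data a "Counts" object holds)
count_fine <- c("0-9" = 100, "10-19" = 120, "20-44" = 300, "45-64" = 250)

# Collapsing intervals: counts are summed within each coarse interval,
# which requires no assumption
count_coarse <- tapply(count_fine, fine_to_coarse[names(count_fine)], sum)
count_coarse
```

Copying 0.02 to both "0-9" and "10-19" is only accurate if the rate really is constant across ages 0-19.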

When there is a mixture of "Counts" and "Values" objects, there is often a choice between collapsing the "Counts" objects and splitting or adding to the "Values" objects. The default is to split and add to the "Values" objects, as this preserves all the original detail while giving the same subtotals.
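The claim about subtotals can be checked by hand for a simple product of counts and rates. The sketch below uses ordinary arrays and invented numbers; it illustrates the arithmetic only and does not call dembase.

```r
# Counts by age and sex; rates by age only
counts <- array(c(10, 20, 30, 40, 50, 60), dim = c(3, 2),
                dimnames = list(age = c("0-19", "20-64", "65+"),
                                sex = c("Female", "Male")))
rates <- c("0-19" = 0.1, "20-64" = 0.2, "65+" = 0.5)

# Option 1: collapse the counts over sex, then multiply
collapsed <- rowSums(counts) * rates

# Option 2: add a sex dimension to the rates (same rate for both sexes),
# then multiply; the result keeps the age-by-sex detail
rates_expanded <- array(rates, dim = dim(counts), dimnames = dimnames(counts))
detailed <- counts * rates_expanded

detailed
# Summing the detailed result over sex recovers the collapsed result
all.equal(rowSums(detailed), collapsed)
```

Option 2 retains the sex breakdown, and its age subtotals match those of option 1.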

A function that was designed to work with ordinary arrays will generally give an equivalent result when used with a demographic array. For instance, if a is an array, then sum(a) equals sum(Counts(a)).

Some methods for demographic arrays include options not available for ordinary arrays. See, for instance, as.data.frame and names.

In some cases, copying the behaviour of ordinary arrays would require breaking the rules governing dimension names, dimtypes, and dimscales discussed in dimtypes. See, for instance, drop.

Function names returns NULL when used with an ordinary array, but returns the names of the dimensions when used with a demographic array.
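The base R half of this comparison can be checked without dembase. In the sketch below, names(dimnames(a)) is the ordinary-array way to get the dimension names; the demographic-array behaviour is the one described above and is not run here.

```r
a <- array(1:6, dim = c(3, 2),
           dimnames = list(age = c("0-19", "20-64", "65+"),
                           sex = c("Female", "Male")))

# An ordinary two-dimensional array has no 'names' attribute
names(a)

# The dimension names live on the dimnames list instead
names(dimnames(a))
```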

Counts, Values, dimtypes, dimscales. The main new functions for manipulating demographic arrays are listed in dembase.

a <- array(stats::rpois(n = 6, lambda = 10),
           dim = c(3, 2),
           dimnames = list(age = c("0-19", "20-64", "65+"),
                           sex = c("Female", "Male")))
x <- Counts(a)
x
plot(x)
x^2
mean(x)
names(x)
collapseDimension(x, dimension = "age")

b <- array(stats::rnorm(n = 6),
           dim = c(2, 3),
           dimnames = list(sex = c("Male", "Female"),
                           age = c("0-19", "20-64", "65+")))
y <- Values(b)
y
## 'y' is automatically reshaped to align to 'x'
x * y
## weights are required with objects of class "Values"
collapseDimension(y, dimension = "age", weights = x)