dbutils: dbutils: A package of utility functions for processing...
In bcgov/dbutils: Utility functions to process and modify BC Stats' population databases

dbutils

R Documentation

dbutils: A package of utility functions for processing population data

Description

The dbutils package is a set of utility functions for processing and modifying BC Stats' population data via four main categories:

database access functions (getDBPath, dbCheck, dbInfo, dbRead, dbWrite)
conversion functions (conversionTables, conversionRead, dbConvert)
raking functions (dbRake, raking helper functions)
app function (updateAppData to update app data)

About

The BC Stats Population section has historically depended on APL; the 'dbutils' package is a conversion of PEOPLE's APL functions to R functions.

There are two sets of databases: estimates and projections. The naming convention in R of the database file differs from APL. First, estimates are kept separate from projections. This is represented in the naming of the databases as POPRR[E/P]YY, where an "E" is for estimates, and "P" is for projections (APL used "O" for projections). Second, APL databases used the year of the RUN for the file name (e.g., POPHAO19 was ran in 2019). This can be confusing, so the new database files use the years of the most recent estimates. In other words, POPRRE18 contains the estimates up to 2018 (July 1st 2018), and POPRRP18 includes projections data with base year 2018 (July 1st 2018). Projections files also include estimates years.

Database types include: population, deaths, births. Databases must have 7 columns: Year, Type, TypeID, Age, Male, Female, and Total. Each Year of data must have the same number of occurrences, and Years must be continuous (no year can be missing).

There are several naming conventions used with such data, where files are named DDDRRTYY:

DDD is the database type, such as POP = Population, BIR = Birth, DEA = Death.
RR is the region code (see below)
T is the data type (i.e., E = Estimates or P = Projections)
YY is the year (or two-digit PEOPLE version number, indicating which PEOPLE run) you want data for

For historical reasons, population data is saved as POPRREYY (population estimates), POPRRPYY (population projections), BIRRRYY (births) or DEARRYY (deaths), where RR is the shorthand for the region code, and YY is the last two digits of the year (this was passed over from decades of BC Stats PEOPLE development limited to 2-digit codes).

Region codes

Common region codes are as follows:

RD - Regional District (which is the same data as Census Division (CD))
CS/CSD - Census Subdivision (CDCSD)
DR - Development Region (DR)
HA - Local Heath Area (LHA)
HS - Health Service Delivery Area (HSDA)
HY - Health Authority (HA)
CH - Community Health Service Area (CHSA)
CF - Ministry of Children and Family Development (MCFD)
CA - MCFD Service Delivery Area (MCFD_SDA)
CL - MCFD Local Service Area (MCFD_LSA)
MP/RCMP - Police Respondent Areas (RESP)
ED/PED - Provincial Electoral District (PED)
SD - School District (SD)
PS - College Region or Post-Secondary (CR)
SR - Special Regions (includes CMA/CA, Van Isle (VI) and Van Isle excluding CRD (VN). Does not add up to BC total)

Age vectors

Age vectors can include combinations of single age years, 5-year age groups (e.g., -4, -9, etc.), grouped older populations (e.g., -65, -90, -100, -110, -120, etc.), and Total (i.e., -999). Negative ages refer to age groups: those that end in 4 or 9 are five year age groupings ending on that age (e.g., -4 refers to those aged 0-4, -9 is those aged 5-9, -14 is those aged 10-14, etc.), while those that end in 0 or 5 are "and over" groups beginning with that age (e.g., -90 refers to those aged 90 and over, -100 is those aged 100 and over, etc.).

Population data is available in single age years up to 109 years old, but is usually grouped for 100 and over. (APL did not produce population for ages beyond 100+, so 100:109 and -110 have 0 population, although they do not have to be in future.) Death data is available in single age years up to 120 years old. Birth data is available only for ages 15 to 65 and over. (Age in birth data refers to the mother's age.) Population estimates and projections apps only need total and individual ages 0:89, and 90+ (i.e., exclude ages above 90 and any 5 year age groups) as the app creates age groups on the fly.

popall <- c(seq(0, 109), seq(-4, -109, -5), -90, -100, -110, -999)
pop1yr90 <- c(seq(0, 89), -90, -999)
pop1yr100 <- c(seq(0, 99), -100, -999)
pop1yrOver90 <- c(seq(90, 120))
pop5yr90 <- c(seq(-4, -89, -5), -90, -999)
pop5yr100 <- c(seq(-4, -99, -5), -100, -999)
deaall <- c(seq(0, 119), seq(-4, -119, -5), -90, -120, -999)
dea1yr120 <- c(seq(0, 119), -120, -999)
dea1yr110 <- c(seq(0, 109), -110, -999)
dea1yr90 <- c(seq(0, 89), -90, -999)
dea5yr120 <- c(seq(-4, -119, -5), -120, -999)
dea5yr110 <- c(seq(-4, -109, -5), -110, -999)
dea5yr90 <- c(seq(-4, -89, -5), -90, -999)
birall <- c(seq(15, 64), seq(-19, -64, -5), -65, -999)
bir1yr65 <- c(seq(15, 64), -65, -999)
bir5yr65 <- c(seq(-19, -64, -5), -65, -999)

Database conversion

Data is available at varying levels of detail, mostly dependent on some form of geography. For example, Census-related data may be accessible for every Dissemination Block, while health-related data may only be available at the Local Health Area (LHA) level. As such, there exists a need to transfer information from one geography to another. This is achieved using a conversion table that converts (demographic) data from Geography A to Geography B.

A conversion table is based on population counts at a very high level of detail. BC Stats uses Dissemination Block-level population totals from the most recent Census. Additionally, one would need geographic software (like ArcMap or QGIS) to establish the allocation of Dissemination Blocks to the conversion geographies. While the following mentions the underlying linkage between geographies being established via Dissemination Blocks, one could technically use any (small) geography to establish this relationship.

Essentially, the following is required:

(Total) population at the Dissemination Block-level
Dissemination Block allocation to Geography A (source)
Dissemination Block allocation to Geography B (destination)

Item (1) will provide the fundamental relationship between (2) and (3).

First, the Dissemination Block populations are aggregated by all possible combinations of intersections between Geography A and B. This is achievable by creating a table from four columns containing (a) the Dissemination Block number, (b) the Dissemination Block population, (c) Geography A's alignment and (d) Geography B's alignment with the Dissemination Block. Then, percentages are assigned that link Geography A to B.

This approach provides a "match" (or relationship) between Geography A and B that is based on the most recent Census population (in year X, say). Use of the conversion table therefore relies on the assumption that, if converting information from A to B in year Y != X, the conversion relationship holds for year Y. Essentially, this assumes no change in the match between A and B from X to Y. This may be valid (or sufficient) in most cases.

The setup requires a "Dissemination Block allocation to Geography A and B". This allocation assumes a 100 (or B) will have to be completely allocated to one region or the other. Another option may be to allocate its population in some proportion to each of the straddled regions. One such proportional allocation could be area. For example, assume Dissemination Block X with population 100 straddles Region Y and Z such that 40 allocate 40 persons from X to Y and 60 persons from X to Z.

Once conversion table(s) are available (from above process), the appropriate table can be read in with conversionRead and applied with dbConvert as follows:

Evaluate the conversion table between Geography A (source) and B (destination) for regions that require no conversion (i.e., regions are converted 1-to-1 fromGgeography A to B). Remove these regions from the conversion process.
Sequentially share out the data relating to the region in Geography A using the conversion factor to the respective region(s) in Geography B.
Possibly aggregate Geography B components that received contributions from multiple regions in Geography A.
Depending on the application, there may be a requirement to conform to some pre-specified "control total(s)". These control totals are usually the provincial-level totals and conformance is achieved using raking (dbRake). If Step (1) removed some 1-to-1 regions, the control totals will be adjusted downward.
Any 1-to-1 regions removed during Step (1) are added back to Geography B (under the Geography B naming convention).

An example:

Consider the following conversion table which represents the demographic relationship between Geography A and B:

Geography A	Geography B	Percentage
A-1	B-1	21.96943503%
A-1	B-2	45.39105179%
A-1	B-3	32.63951317%
A-2	B-3	60.84714549%
A-2	B-4	39.15285451%
A-3	B-5	100.00000000%
A-4	B-6	49.22285940%
A-4	B-7	50.77714060%

Also, assume we have some statistic (population, say) for Geography A for a number of years:

Geography A	Year 1	Year 2	Year 3	Year 4	Year 5
A-1	14,461	14,606	14,752	14,900	15,049
A-2	16,290	16,534	16,782	17,034	17,290
A-3	489	504	519	535	551
A-4	25,285	26,119	26,981	27,871	28,791
Total	56,525	57,763	59,034	60,340	61,681

In the conversion table above, region A-3 is converted 100 technically no conversion necessary, so we remove A-3 from the conversion process. One could perform the following conversion for Year 1 in the above data:

Geography A	Geography B	Percentage	Population
A-1			14,461
	B-1	21.96943503%	3,177
	B-2	45.39105179%	6,564
	B-3	32.63951317%	4,720
A-2			16,290
	B-3	60.84714549%	9,912
	B-4	39.15285451%	6,378
A-4			25,285
	B-6	49.2228594%	12,446
	B-7	50.7771406%	12,839
	Total		56,036

Aggregating the breakout for region B-3 (that is, split across A-1 and A-2) would yield the final conversion for all years:

Geography B	Year 1	Year 2	Year 3	Year 4	Year 5
B-1	3,177	3,209	3,241	3,273	3,306
B-2	6,564	6,630	6,696	6,763	6,831
B-3	14,632	14,827	15,026	15,229	15,432
B-4	6,378	6,474	6,571	6,669	6,770
B-5	489	504	519	535	551
B-6	12,446	12,857	13,281	13,719	14,172
B-7	12,839	13,262	13,700	14,152	14,619
Total	56,525	57,763	59,034	60,340	61,681

In the above example, the actual value for the statistic has been truncated to zero decimals and made to sum to the "control totals" (assigning any remainder to the *largest value*; region B-3 in years 2 and 4) identified by Total. This is an elementary method for making totals conform to some controls. When dealing with higher-dimensional data (e.g., including age and gender detail with the above), raking is the suggested means for conformance. B-5 was added back as-is from A-3 after making sure that the aggregated sub-provincial remainders matched the control totals.

bcgov/dbutils documentation built on Sept. 30, 2022, 12:04 a.m.

bcgov/dbutils index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

bcgov/dbutils
Utility functions to process and modify BC Stats' population databases

dbutils: dbutils: A package of utility functions for processing...
In bcgov/dbutils: Utility functions to process and modify BC Stats' population databases

dbutils: A package of utility functions for processing population data

Description

About

Region codes

Age vectors

Database conversion

Related to dbutils in bcgov/dbutils...

R Package Documentation

Browse R Packages

We want your feedback!

bcgov/dbutils Utility functions to process and modify BC Stats' population databases

dbutils: dbutils: A package of utility functions for processing... In bcgov/dbutils: Utility functions to process and modify BC Stats' population databases

dbutils: A package of utility functions for processing population data

Description

About

Region codes

Age vectors

Database conversion

Related to dbutils in bcgov/dbutils...

R Package Documentation

Browse R Packages

We want your feedback!

bcgov/dbutils
Utility functions to process and modify BC Stats' population databases

dbutils: dbutils: A package of utility functions for processing...
In bcgov/dbutils: Utility functions to process and modify BC Stats' population databases