dbRake: Rake population database

dbRake    R Documentation

Rake population database

Description

Reads a population database file (population or migration) and saves a population database file with Region values raked for each of Age and Sex. Raking can be run with or without user-provided region control totals. Negative population values can optionally be allowed; by default they are not.

When readFiles is TRUE, this function assumes the input files (e.g., InputData, CtrlPopTotals) are in an "inputs" folder. Files written by dbRake() are saved to an "outputs" folder, which is created if it does not exist. If saveInterimFiles is TRUE, interim files are also saved to an "interim_files" folder within "outputs", which is likewise created if needed. dbRake() is a large function that takes a few minutes to run and depends on several smaller helper functions.
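The folder convention above can be sketched in base R as follows. This is a hypothetical illustration, not dbRake()'s own code; the variable names are assumptions, and the folders are built under tempdir() so the example does not touch the working directory.

```r
# Hypothetical sketch of the "outputs"/"interim_files" convention; not dbRake() internals.
base_dir         <- tempdir()                        # stand-in for the working directory
out_dir          <- file.path(base_dir, "outputs")
interim_dir      <- file.path(out_dir, "interim_files")
saveInterimFiles <- TRUE

# "outputs" is created only if it does not already exist
if (!dir.exists(out_dir)) dir.create(out_dir)

# "interim_files" is created only when interim files are requested
if (saveInterimFiles && !dir.exists(interim_dir)) dir.create(interim_dir)
```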

Usage

dbRake(
  InputData,
  CtrlPopTotals,
  CtrlRegionTotals = NULL,
  CtrlAgeGrpsTotals = NULL,
  VarRegion,
  VarSex,
  VarSexTotal,
  AgeGrpMax = NULL,
  allowNegatives = FALSE,
  saveInterimFiles = FALSE,
  writeRakingLog = FALSE,
  writeOutputFile = FALSE,
  readFiles = FALSE
)

Arguments

InputData

Database in the environment that contains the input data to be raked. If readFiles is TRUE, this is instead the name of the .xlsx or .csv file in the "inputs" folder to be read in. The data are assumed to have Region (e.g., LHA) by Sex (e.g., 1, 2, 3) as rows and Ages (e.g., 0, 1, 2, ..., TOTAL) as columns, with the total column labelled TOTAL rather than '-999'. Values are population counts.
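A toy InputData in the layout described above might look like the following. The values are made up and only ages 0 to 2 are shown; check.names = FALSE keeps numeric age column names such as "0" from being rewritten to "X0" by data.frame().

```r
# Hypothetical toy InputData: Region (LHA) by Sex rows, single-year Age columns
# plus TOTAL. Sex 3 is the Total of Sex 1 and Sex 2. Ages truncated to 0-2.
InputData <- data.frame(
  LHA = c(101, 101, 101, 102, 102, 102),
  Sex = c(1, 2, 3, 1, 2, 3),
  `0` = c(10, 12, 22, 5, 4, 9),
  `1` = c(11, 9, 20, 6, 7, 13),
  `2` = c(8, 10, 18, 4, 5, 9),
  check.names = FALSE
)
age_cols <- as.character(0:2)
InputData$TOTAL <- rowSums(InputData[, age_cols])   # TOTAL column, not '-999'
```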

CtrlPopTotals

Database in the environment that contains the overall control totals. If readFiles is TRUE, this is instead the name of the .xlsx or .csv file in the "inputs" folder to be read in (e.g., "BC AS TOTALS.xlsx"). The data are assumed to have Sex (e.g., 1, 2, 3) as rows and Ages (e.g., 0, 1, 2, ..., TOTAL) as columns, with the total column labelled TOTAL rather than '-999'. Values are population counts. This file typically has dimensions of 3 observations (one per Sex) by 103 variables.
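The 103 variables plausibly break down as the Sex column, 101 single-year ages (0 to 100), and TOTAL. The zero-filled sketch below reproduces that shape under the assumption, taken from the AgeGrpMax note, that 100 is the oldest age column; it is an illustration, not a real control file.

```r
# Hypothetical, zero-filled table in the CtrlPopTotals layout: 3 Sex rows by
# Sex + single-year ages 0-100 + TOTAL (assumes 100 is the oldest age column).
ages <- as.character(0:100)
CtrlPopTotals <- data.frame(
  Sex = 1:3,
  matrix(0, nrow = 3, ncol = length(ages), dimnames = list(NULL, ages)),
  TOTAL = 0,
  check.names = FALSE
)
dim(CtrlPopTotals)  # 3 rows, 103 columns
```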

CtrlRegionTotals

Database in the environment that contains the region control totals. If readFiles is TRUE, this is instead the name of the .xlsx or .csv file in the "inputs" folder to be read in (e.g., "LHA TOTALS.xlsx"). Default = NULL. The data are assumed to have Region (e.g., 89 LHAs) as the first column and TOTAL (population counts) as the second column; they are not broken out by Sex or Age. This file typically has dimensions of n observations by 2 variables, where n is the number of regions (e.g., 89 for LHA). If CtrlRegionTotals is NULL, region control totals are not used; instead, InputData is used to generate "control" totals.
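When CtrlRegionTotals is NULL, one plausible way to derive region "control" totals from InputData is to take the TOTAL value of each region's total-Sex row, which yields the same n-by-2 shape described above. The sketch assumes exactly that; this page does not say how dbRake() derives them, and the toy data are made up.

```r
# Hypothetical sketch, assuming region "control" totals come from the TOTAL
# column of each region's total-Sex row when CtrlRegionTotals is NULL.
InputData <- data.frame(
  LHA   = c(101, 101, 101, 102, 102, 102),
  Sex   = c(1, 2, 3, 1, 2, 3),
  TOTAL = c(29, 31, 60, 15, 16, 31)
)
VarRegion <- "LHA"; VarSex <- "Sex"; VarSexTotal <- 3

is_total_row  <- InputData[[VarSex]] == VarSexTotal
region_totals <- InputData[is_total_row, c(VarRegion, "TOTAL")]
rownames(region_totals) <- NULL   # one row per region: LHA, TOTAL
```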

CtrlAgeGrpsTotals

Database in the environment that contains initial 5-year age group totals. If readFiles is TRUE, this is instead the name of the .xlsx or .csv file in the "inputs" folder to be read in. Default = NULL. In virtually all cases this argument is NULL, and InputData is used to generate "control" totals for 5-year groupings (e.g., 0-4, 5-9, 10-14). If age groups are in the format -X1, -X2, ..., they are transformed to "X-Y" format.

VarRegion

Name of the Region variable in all files (e.g., "LHA").

VarSex

Name of the Sex variable in all files (e.g., "Sex"). Note: Sex must be a numeric variable (e.g., 1, 2, 3) in which the Total is the maximum value (e.g., 3).

VarSexTotal

Value of the Sex variable that corresponds to the Total (e.g., 3, when 1 and 2 are Male and Female).
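Because VarSexTotal identifies the Total row, a natural sanity check is that, within each region, the Total row equals the sum of the other Sex rows. The sketch below is a hypothetical helper (check_sex_totals() is not part of the package) that runs this check on a toy data frame in which region 102 is deliberately inconsistent.

```r
# Hypothetical consistency check: within each region, the VarSexTotal row should
# equal the column-wise sum of the other Sex rows. Returns one logical per region.
check_sex_totals <- function(df, VarRegion, VarSex, VarSexTotal, value_cols) {
  vapply(split(df, df[[VarRegion]]), function(d) {
    total <- unlist(d[d[[VarSex]] == VarSexTotal, value_cols])
    parts <- colSums(d[d[[VarSex]] != VarSexTotal, value_cols, drop = FALSE])
    isTRUE(all.equal(unname(total), unname(parts)))
  }, logical(1))
}

df <- data.frame(LHA   = c(101, 101, 101, 102, 102, 102),
                 Sex   = c(1, 2, 3, 1, 2, 3),
                 TOTAL = c(29, 31, 60, 15, 16, 30))   # 15 + 16 != 30
res <- check_sex_totals(df, "LHA", "Sex", 3, "TOTAL")
```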

AgeGrpMax

Youngest age of the oldest age group, which is prorated and raked separately from the other 5-year age groups. The group includes AgeGrpMax itself and all older ages. Default = NULL, in which case 75 is used (i.e., the group is 75 and older). This is not necessarily the oldest single-year age, which is usually 100 (meaning 100 and older). The BC Stats Demographics team determined that 75 is the best value for AgeGrpMax to minimize distortion in older populations.
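The grouping that AgeGrpMax controls can be sketched as a mapping from single-year ages to 5-year labels, with every age at or above AgeGrpMax collapsed into one open-ended group. The function name and the "75+" label style are assumptions for illustration; this page does not specify how dbRake() labels the open-ended group.

```r
# Hypothetical helper: map single-year ages to 5-year group labels. Ages at or
# above AgeGrpMax (default 75, as documented) form one open-ended group.
age_group_label <- function(age, AgeGrpMax = NULL) {
  if (is.null(AgeGrpMax)) AgeGrpMax <- 75
  lower <- 5 * (age %/% 5)                   # start of the 5-year band
  upper <- pmin(lower + 4, AgeGrpMax - 1)    # a band never crosses AgeGrpMax
  ifelse(age >= AgeGrpMax, paste0(AgeGrpMax, "+"), paste0(lower, "-", upper))
}

age_group_label(c(0, 4, 5, 72, 74, 75, 100))
# "0-4" "0-4" "5-9" "70-74" "70-74" "75+" "75+"

# Collapse single-year counts (here, one person per age 0-100) into group totals
ages <- 0:100
group_totals <- tapply(rep(1, length(ages)), age_group_label(ages), sum)
group_totals[["75+"]]  # 26 (ages 75 through 100)
```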

allowNegatives

Logical value indicating whether negative population values are allowed. Default = FALSE. Only migration data should be allowed to have negative values.

saveInterimFiles

Logical value indicating whether interim files (.csv) should be saved throughout the process. Default = FALSE. If TRUE, they are saved in "interim_files" within the "outputs" folder, which is created if it does not exist.

writeRakingLog

Logical value indicating whether a log file (raking_log.csv) should be written. Default = FALSE. If TRUE, it is saved in the "outputs" folder.

writeOutputFile

Logical value indicating whether the final output file (.csv) should be written. Default = FALSE. If TRUE, the final raked data are saved as "RakedData.csv" in the "outputs" folder. Whether or not the file is saved, the raked data are returned to the R environment. Setting writeOutputFile to TRUE saves a separate dbWrite() step. It is not useful when raking multiple years of data, because the output file is overwritten for each successive year; in that case, call the raking function from multiRake instead.

readFiles

Logical value indicating whether the input files (InputData, CtrlPopTotals, CtrlRegionTotals, CtrlAgeGrpsTotals) need to be read in. Default = FALSE. If FALSE, the inputs must already be in the environment, typically because they were created by another function (e.g., dbConvert, dbRead).
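When readFiles is TRUE, inputs are read from .xlsx or .csv files in the "inputs" folder. The package's own reader appears to be read.inputs() (listed under See Also), whose interface is not documented here; the hypothetical read_input() below only sketches the idea, using base read.csv() for .csv files and readxl::read_excel() for .xlsx files (the readxl dependency is an assumption).

```r
# Hypothetical reader for files in an "inputs" folder; not the package's read.inputs().
read_input <- function(file, dir = "inputs") {
  path <- file.path(dir, file)
  ext  <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    read.csv(path, check.names = FALSE)          # keep age columns named "0", "1", ...
  } else if (ext == "xlsx") {
    as.data.frame(readxl::read_excel(path))      # requires the readxl package
  } else {
    stop("unsupported file type: ", ext)
  }
}
```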

Details

dbRake is a large function with three main parts. Part 1 prorates and rakes Sex values by Region. Part 2 prorates and rakes 5-year Age Group values by Region and Sex. Part 3 prorates and rakes single-year Age values by Region and Sex. Checks are performed throughout and, if writeRakingLog is TRUE, their results are written to raking_log.csv in the "outputs" folder, whether raking succeeds or fails. If saveInterimFiles is TRUE, interim files are saved to an "interim_files" folder in "outputs" for later review. If raking succeeds and writeOutputFile is TRUE, the final raked data are saved to "outputs" as RakedData.csv.
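A common textbook form of raking is iterative proportional fitting (IPF). The sketch below illustrates prorating and two-way raking on a Region-by-Sex matrix using IPF, roughly in the spirit of Part 1. It is a hypothetical illustration, not a description of dbRake()'s algorithm functions (allowNegsnoMargin(), noNegsnoMargin(), noNegsneedMargin()), and it only handles non-negative data, so it does not cover the allowNegatives = TRUE case.

```r
# Hypothetical sketch of prorating and two-way raking (iterative proportional
# fitting). Not the package's own raking algorithm functions.

# Prorate: rescale x so that it sums to target, keeping its shares unchanged.
prorate <- function(x, target) {
  stopifnot(sum(x) != 0)
  x * target / sum(x)
}

# Rake: alternately rescale rows and columns of a non-negative matrix until its
# row sums match row_targets and its column sums match col_targets.
# Assumes no all-zero rows or columns and equal grand totals.
rake <- function(m, row_targets, col_targets, tol = 1e-8, max_iter = 1000) {
  stopifnot(all(m >= 0),
            isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  for (i in seq_len(max_iter)) {
    m <- m * (row_targets / rowSums(m))              # match row (Region) totals
    m <- sweep(m, 2, col_targets / colSums(m), `*`)  # match column (Sex) totals
    if (max(abs(rowSums(m) - row_targets)) < tol) return(m)
  }
  warning("raking did not converge in max_iter iterations")
  m
}

# Toy example: 2 regions (rows) by 2 sexes (columns)
seed  <- matrix(c(10, 12,
                   5,  4), nrow = 2, byrow = TRUE)
raked <- rake(seed, row_targets = c(25, 10), col_targets = c(17, 18))
```

After raking, the row sums match the region targets (25, 10) and the column sums match the sex targets (17, 18). The raked cells are generally non-integers, so a rounding step would still be needed to obtain counts.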

dbRake was originally an APL process (not in R). A PDF documenting that process, which holds true for most of the underlying assumptions and procedures in dbRake, is available on BC Stats' I drive (S152\S52004) in Documentation > Raking > Methodology-Raking_Final.pdf.

Value

The raked data are returned to the R environment. If writeOutputFile is TRUE, RakedData.csv is also saved to the "outputs" folder, which is created if it does not already exist. If saveInterimFiles is TRUE, various interim files are saved in an "interim_files" folder within "outputs". If writeRakingLog is TRUE, a log file ("raking_log.csv") is also saved to the "outputs" folder.

Author(s)

Julie Hawkins, BC Stats

See Also

Raking helpers include rounded(), read.inputs(), real.to.int(), calc.cols(), prorate.row(), prep.prorate.col(), prorate.col(), and the raking algorithm functions A, B, and C: allowNegsnoMargin(), noNegsnoMargin(), and noNegsneedMargin().
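Raked values are generally non-integers, while the inputs are population counts. The helper names real.to.int() and rounded() suggest a rounding step, but this page does not say how it works. One common approach is largest-remainder rounding, which rounds each value while keeping the sum equal to the rounded total; the sketch below is a hypothetical illustration of that technique, not the package's implementation.

```r
# Hypothetical largest-remainder rounding: returns integers (stored as doubles)
# whose sum equals round(sum(x)). Not the package's real.to.int().
round_preserving_total <- function(x) {
  target <- round(sum(x))
  base   <- floor(x)
  short  <- target - sum(base)               # units still to hand out (>= 0)
  if (short > 0) {
    idx <- order(x - base, decreasing = TRUE)[seq_len(short)]
    base[idx] <- base[idx] + 1               # largest fractional parts round up
  }
  base
}

round_preserving_total(c(2.6, 3.3, 4.1))     # 3 3 4 (sums to 10)
```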

Examples

## Not run:
dbRake(InputData = "POPHAE19.xlsx", CtrlPopTotals = "BC AS TOTALS.xlsx",
       CtrlRegionTotals = "LHA TOTALS.xlsx", CtrlAgeGrpsTotals = NULL,
       VarRegion = "LHA", VarSex = "Sex", VarSexTotal = 3, AgeGrpMax = NULL,
       allowNegatives = FALSE, saveInterimFiles = FALSE, writeRakingLog = FALSE,
       writeOutputFile = FALSE, readFiles = TRUE)
## End(Not run)
## Not run:
## If dbRake() is called within dbConvert(), which brings in the inputs:
dbRake(InputData = ToDB, CtrlPopTotals = control_totals,
       CtrlRegionTotals = region_totals, CtrlAgeGrpsTotals = NULL,
       VarRegion = "LHA", VarSex = "Sex", VarSexTotal = 3, AgeGrpMax = NULL,
       allowNegatives = FALSE, saveInterimFiles = FALSE, writeRakingLog = TRUE,
       writeOutputFile = TRUE, readFiles = FALSE)
## End(Not run)

bcgov/dbutils documentation built on Sept. 30, 2022, 12:04 a.m.