| dbRake | R Documentation |
Reads a population database file (population, migration) and saves a population database file with Region values raked for each of Age and Sex. Raking can be run with user-provided region control totals, or without region control totals. Negative population values may be allowed or not (default).
This function assumes input files (e.g., InputData, CtrlPopTotals, etc.) are in an "inputs" folder. The raked output will save to an "outputs" folder (which will be created if one does not exist). If chosen, interim files are also saved to an "interim_files" folder within "outputs" (this will be created if it does not exist and saveInterimFiles is TRUE). dbRake() is a large function that takes a few minutes to run, and depends on multiple smaller functions.
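The folder layout described above can be sketched as follows. This is only an illustration: `project_dir` is a hypothetical location (a temporary directory here), and it assumes the "inputs" and "outputs" folders are resolved relative to the working directory, which the text above does not state explicitly. dbRake() is documented to create "outputs" and "interim_files" itself, so creating them by hand only mirrors the expected structure.

```r
# Hypothetical project root; a temporary directory keeps the sketch safe to run.
project_dir <- file.path(tempdir(), "raking_project")

inputs_dir  <- file.path(project_dir, "inputs")          # InputData, CtrlPopTotals, ...
outputs_dir <- file.path(project_dir, "outputs")         # RakedData.csv, raking_log.csv
interim_dir <- file.path(outputs_dir, "interim_files")   # only if saveInterimFiles = TRUE

# Create the folders; recursive = TRUE also creates any missing parents.
dir.create(inputs_dir,  recursive = TRUE, showWarnings = FALSE)
dir.create(interim_dir, recursive = TRUE, showWarnings = FALSE)

# Show the resulting layout relative to the project root.
list.dirs(project_dir, full.names = FALSE, recursive = TRUE)
```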
dbRake(
  InputData,
  CtrlPopTotals,
  CtrlRegionTotals = NULL,
  CtrlAgeGrpsTotals = NULL,
  VarRegion,
  VarSex,
  VarSexTotal,
  AgeGrpMax = NULL,
  allowNegatives = FALSE,
  saveInterimFiles = FALSE,
  writeRakingLog = FALSE,
  writeOutputFile = FALSE,
  readFiles = FALSE
)
InputData |
Name of database in environment that contains input data to be raked. If 'readFiles' is TRUE, this is the name of the .xlsx or .csv in the "inputs" folder to be read in. This file is assumed to have Region (e.g., LHA) by Sex (e.g., 1, 2, 3) as rows, and Ages (e.g., 0, 1, 2, ..., TOTAL (not '-999')) as columns. Values are population counts. |
CtrlPopTotals |
Name of database in environment that contains overall control totals. If 'readFiles' is TRUE, this is the name of the .xlsx or .csv in the "inputs" folder to be read in (e.g., "BC AS TOTALS.xlsx"). This file is assumed to have Sex (e.g., 1, 2, 3) as rows and Ages (e.g., 0, 1, 2, ..., TOTAL (not '-999')) as columns. Values are population counts. This file typically has dimensions of 3 (obs) by 103 variables. |
CtrlRegionTotals |
Name of database in environment that contains region control totals. If 'readFiles' is TRUE, this is the name of the .xlsx or .csv in the "inputs" folder to be read in (e.g., "LHA TOTALS.xlsx"). Default = NULL. This file is assumed to have Region (e.g., 89 LHAs) as the first column and TOTAL (population counts) as the second column; this file is not broken out by Sex or Age. This file typically has dimensions of n (obs) by 2 variables, where "n" is the number of individual regions (e.g., 89 for LHA). If no name is provided (i.e., NULL), then region control totals are not used. Instead, the InputData will be used to generate "control" totals. |
CtrlAgeGrpsTotals |
Name of database in environment that contains initial 5 year age group totals. If 'readFiles' is TRUE, this is the name of the .xlsx or .csv in the "inputs" folder to be read in. Default = NULL. In virtually all cases, this variable will be NULL. In these cases, the InputData will be used to generate "control" totals at 5-year groupings (e.g., 0-4, 5-9, 10-14, etc). If age groups are of format -X1, -X2, ..., they will be transformed to "X-Y" format. |
VarRegion |
Name of Region variable in all files (e.g., "LHA"). |
VarSex |
Name of Sex variable in all files (e.g., "Sex"). Note: Sex must be a numeric variable (e.g., 1, 2, 3) where the Total is the maximum number (e.g., 3). |
VarSexTotal |
Value that corresponds to Total (e.g., 3, when 1 and 2 are Male and Female). |
AgeGrpMax |
Age of the older population that will be prorated and raked separately from other 5 year age groups. AgeGrpMax will include all ages, including itself, through the remainder of the population. Default = NULL. If AgeGrpMax is not set, the function will use 75 and up (not necessarily the oldest age; that is, the oldest age is usually 100, meaning 100 and up). The BC Stats Demographics team determined that 75 was the best age for AgeGrpMax to ensure that distortion in older populations is minimized. |
allowNegatives |
Logical value for whether or not negative population values are allowed. Default = FALSE. Only migration data should be allowed to have negative values. |
saveInterimFiles |
Logical value for whether or not interim files (.csvs) should be saved throughout the process. Default = FALSE. If saved, they will be saved in "interim_files" within "outputs" folder. This folder will be created if it does not exist and is needed. |
writeRakingLog |
Logical value for whether or not a log file (raking_log.csv) should be written. Default = FALSE. If written, it will be saved in "outputs" folder. |
writeOutputFile |
Logical value for whether or not final output file (.csv) should be written.
Default = FALSE. If TRUE, the final raked data will be saved as "RakedData.csv" to "outputs"
folder. Regardless of whether saved or not, the raked data returns to R's environment. Setting
to TRUE reduces a step ( |
readFiles |
Logical value for whether or not input files (InputData, CtrlPopTotals,
CtrlRegionTotals, CtrlAgeGrpsTotals) need to be read in. Default = FALSE. If FALSE, files are
already in environment, likely by being called or created through another function (e.g.,
|
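To make the expected input layouts concrete, here is a small sketch that builds toy objects with the shapes described in the arguments above. All values are made up, the helper `make_region_rows()` is hypothetical, and the control totals are derived from the toy input purely to show the layout; in practice they would come from an independent source.

```r
# Toy data only: two regions, Sex coded 1, 2, 3 (3 = Total), ages 0 to 100.
set.seed(1)
ages <- 0:100

# Hypothetical helper: three rows (Sex = 1, 2, 3) for one region.
make_region_rows <- function(region) {
  male   <- rpois(length(ages), lambda = 20)
  female <- rpois(length(ages), lambda = 20)
  counts <- rbind(male, female, male + female)
  dimnames(counts) <- list(NULL, as.character(ages))
  df <- data.frame(LHA = region, Sex = 1:3, counts, check.names = FALSE)
  df$TOTAL <- rowSums(counts)   # TOTAL column, named "TOTAL" rather than "-999"
  df
}

# InputData: Region by Sex as rows, single-year ages plus TOTAL as columns.
InputData <- rbind(make_region_rows(1), make_region_rows(2))

# CtrlPopTotals: one row per Sex, ages plus TOTAL as columns (3 obs by 103 variables).
value_cols <- setdiff(names(InputData), c("LHA", "Sex"))
pop <- rowsum(as.matrix(InputData[, value_cols]), group = InputData$Sex)
CtrlPopTotals <- data.frame(Sex = as.integer(rownames(pop)), pop,
                            check.names = FALSE, row.names = NULL)

# CtrlRegionTotals: Region in the first column, TOTAL in the second (n obs by 2 variables).
CtrlRegionTotals <- InputData[InputData$Sex == 3, c("LHA", "TOTAL")]

dim(InputData)
dim(CtrlPopTotals)
dim(CtrlRegionTotals)
```

Objects shaped like these could then be passed with readFiles = FALSE, together with VarRegion = "LHA", VarSex = "Sex", and VarSexTotal = 3.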
dbRake is a large function with three main parts. Part 1 prorates and rakes Sex values by Region. Part 2 prorates and rakes 5-year Age Group values by Region and Sex. Part 3 prorates and rakes single-year Age values by Region and Sex. Throughout, checks are performed and, if writeRakingLog is TRUE, results are written to raking_log.csv in the "outputs" folder (regardless of whether raking succeeds or fails). As well, if saveInterimFiles is TRUE, interim files are saved to an "interim_files" folder in "outputs" for future viewing. If raking succeeds and writeOutputFile is TRUE, the final raked data file is saved to "outputs"; the raked data is returned to R's environment either way.
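The general idea of prorating and raking can be illustrated with a generic two-way example: scale rows to match region totals, scale columns to match sex totals, and repeat until both agree. This is a minimal sketch of standard iterative proportional fitting, not dbRake's actual algorithm. The function names below are hypothetical (they are not the package's prorate.row() or prorate.col()), the numbers are invented, and steps such as integer conversion and negative-value handling are omitted.

```r
# Scale each row of m so that it sums to the matching row target.
scale_rows <- function(m, row_targets) {
  s <- rowSums(m)
  m * ifelse(s == 0, 0, row_targets / s)   # factor i is applied to row i
}

# Scale each column of m so that it sums to the matching column target.
scale_cols <- function(m, col_targets) {
  s <- colSums(m)
  sweep(m, 2, ifelse(s == 0, 0, col_targets / s), `*`)
}

# Alternate row and column scaling until the row sums are within tol.
rake_2d <- function(m, row_targets, col_targets, tol = 1e-8, max_iter = 100) {
  stopifnot(isTRUE(all.equal(sum(row_targets), sum(col_targets))))
  for (i in seq_len(max_iter)) {
    m <- scale_rows(m, row_targets)
    m <- scale_cols(m, col_targets)   # columns are exact after this step
    if (max(abs(rowSums(m) - row_targets)) < tol) break
  }
  m
}

# Invented seed values: two regions by two sexes.
seed <- matrix(c(40, 60,
                 55, 45),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("LHA1", "LHA2"), c("Male", "Female")))
region_totals <- c(110, 100)   # assumed region control totals
sex_totals    <- c(100, 110)   # assumed overall totals by sex

raked <- rake_2d(seed, region_totals, sex_totals)
rowSums(raked)
colSums(raked)
```

The two sets of targets must share the same grand total, otherwise no table can satisfy both, which is why the sketch checks this up front.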
dbRake was originally an APL process (not in R). A PDF documenting that process, which holds true for most of the underlying assumptions and procedures in dbRake, is available on BC Stats' I drive (S152\S52004) in Documentation > Raking > Methodology-Raking_Final.pdf.
The raked data is returned to R's environment. If writeOutputFile is TRUE, RakedData.csv will be saved to the "outputs" folder (which will be created if one does not already exist). If saveInterimFiles is TRUE, various interim files will be saved in an "interim_files" folder within "outputs". If writeRakingLog is TRUE, a log file ("raking_log.csv") will also be saved to the "outputs" folder.
Julie Hawkins, BC Stats
Raking helpers include: rounded(), read.inputs(),
real.to.int(), calc.cols(), prorate.row(),
prep.prorate.col(), prorate.col(), and raking algorithm functions A, B, C:
allowNegsnoMargin(), noNegsnoMargin(), noNegsneedMargin()
## Not run:
dbRake(InputData = "POPHAE19.xlsx", CtrlPopTotals = "BC AS TOTALS.xlsx",
CtrlRegionTotals = "LHA TOTALS.xlsx", CtrlAgeGrpsTotals = NULL,
VarRegion = "LHA", VarSex = "Sex", VarSexTotal = 3, AgeGrpMax = NULL,
allowNegatives = FALSE, saveInterimFiles = FALSE, writeRakingLog = FALSE,
writeOutputFile = FALSE, readFiles = TRUE)
## End(Not run)
## Not run:
## if dbRake() is called in dbConvert(), which brings in inputs
dbRake(InputData = ToDB, CtrlPopTotals = control_totals,
CtrlRegionTotals = region_totals, CtrlAgeGrpsTotals = NULL,
VarRegion = "LHA", VarSex = "Sex", VarSexTotal = 3, AgeGrpMax = NULL,
allowNegatives = FALSE, saveInterimFiles = FALSE, writeRakingLog = TRUE,
writeOutputFile = TRUE, readFiles = FALSE)
## End(Not run)