knitr::opts_chunk$set(echo = TRUE)

Introduction

There are two types derived variables in the CCHS surveys. Both types of derived variables are supported in cchsflow.

cchsflow calculates these more complex derived variables using functions that are referenced in variable_details.csv within RecTo section with the prefix 'Func::'. The variables used in the function are referenced in the variableStart section with the prefix 'DerivedVar::'. For example, BMI (HWTGBMI_der) includes Func::bmi_fun in the RecTo section; and DerivedVar::[HWTGHTM, HWTGWTK] in the variableStart section, which indicates the two starting variables (HWTGHTM, HWTGWTK).

Example - Body Mass Index (BMI)

While BMI is calculated across all CCHS cycles, the method in which it is calculated varies across CCHS cycles, leading to misclassification error that might affect your study. As such, a derived variable for BMI has been created in cchsflow that uses harmonized height (HWTGHTM) and weight (HWTGWTK) variables across all CCHS cycles.

Using rec_with_table() you can transform the derived BMI variable across multiple CCHS cycles and create a transformed dataset.

In order derive variables, you must load the existing custom function associated with the derived variable

# Custom ifelse for evaluating NA
if_else2 <- function(x, a, b) {
  falseifNA <- function(x) {
    ifelse(is.na(x), FALSE, x)
  }
  ifelse(falseifNA(x), a, b)
}

#BMI derived variable
# HWTGHTM: height (in meters)
# HWTGWTK: weight (in kilograms)
bmi_fun <- 
  function(HWTGHTM, 
           HWTGWTK) {
    ifelse2((!is.na(HWTGHTM)) & (!is.na(HWTGWTK)), 
            (HWTGWTK/(HWTGHTM*HWTGHTM)), NA)
  }
library(cchsflow)
bmi2003 <- rec_with_table(cchs2003_p, variables = c("HWTGHTM", "HWTGWTK", 
            "HWTGBMI_der"), log = TRUE)

bmi2010 <- rec_with_table(cchs2010_p, variables = c("HWTGHTM", "HWTGWTK",
            "HWTGBMI_der"), log = TRUE)

Since derived variables are based on previously transformed variables, if you want to only transform your derived variable, you must also specify its base CCHS variables in rec_with_table() as shown above. So for the derived BMI variable, you will have to also specify the height (HWTGHTM) and weight (HWTGWTK) variables.

Using bind_rows(), you can then combine your transformed datasets.

library(dplyr)
library(knitr)
library(kableExtra)
combined_bmi <- merge_rec_data(bmi2003, bmi2010)

kable(combined_bmi[1:10, ])

Creating a derived variable

Creating a derived variable requires the harmonization of existing CCHS variables, and a custom function that uses those harmonized variables. For more information on how to create a derived variable see here



Big-Life-Lab/cchsflow documentation built on Feb. 23, 2024, 12:04 a.m.