Harmonization sheet instructions
In psHarmonize: Creates a Harmonized Dataset Based on a Set of Instructions

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dplyr)
library(knitr)
library(stringr)
library(tidyr)
library(glue)
library(purrr)

library(psHarmonize)

The harmonization sheet is the input for the harmonization function. It also serves as the set of instructions for any recoding of your variables.

There are two types of modifications you can choose from:

- Recode category
- Function

Recode category

Recode category lets you change the values of a categorical variable. For example, if you have a variable coded as 1 and 0, and you want your harmonized variable to contain "Yes" and "No", you can use the recode category option.

Cohort A has an education variable with the values of 1-5. They correspond to values such as "No education", "Completed grade school", etc.

If we want to map the coded values 1-5 to harmonized values, we can use the recode category option. In this example we will harmonize the coded values to "No education/grade school", "High school", and "College".

The table below shows how these values relate to each other.

Coded values | Coded value meaning    | Harmonized values
-------------|------------------------|---------------------------
1            | No education           | No education/grade school
2            | Completed grade school | No education/grade school
3            | Jr-High School         | High school
4            | Completed High School  | High school
5            | Some college           | College

In the harmonization sheet you will want to enter recode category in the code_type column. You will then type the code pairings in the code1 column with an = sign (1 = Yes for example).

harmonization_sheet_example %>%
  filter(study == 'Cohort A' & item == 'education') %>%
  select(code_type, code1) %>%
  kable()

The harmonization function will then code every value that is a 1 as "No education/grade school", 2 as "No education/grade school", etc.
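The mapping described above can be illustrated in base R. This is only a minimal sketch of the recoding idea, not psHarmonize's internal implementation; the vector `education_raw` and the lookup `education_lookup` are hypothetical names with made-up values.

```r
# Hypothetical raw education codes for Cohort A (illustrative data only)
education_raw <- c(1, 2, 3, 4, 5, 3)

# Lookup mirroring the coded value / harmonized value pairings in the table
education_lookup <- c(
  "1" = "No education/grade school",
  "2" = "No education/grade school",
  "3" = "High school",
  "4" = "High school",
  "5" = "College"
)

# Index the lookup by the character form of each code, then drop the names
education_harmonized <- unname(education_lookup[as.character(education_raw)])
education_harmonized
```

Several coded values can share one harmonized value, which is how five source categories collapse into three here.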

Function

If you want to convert a continuous variable (lbs to kg for example), you can use the function option. To do this you would enter function into code_type. In code1 you would enter a function, with x as your input variable.

For example, Cohort A has weight variables stored in pounds. If we want to convert these values to kilograms, we can enter the following into code1: x / 2.205.

harmonization_sheet_example %>%
  filter(study == 'Cohort A' & item == 'weight') %>%
  select(code_type, code1) %>%
  kable()
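To make the role of x concrete, the sketch below evaluates a code1-style expression against a vector of values. It is one possible way to apply such an expression and is not necessarily how psHarmonize does it internally; `weight_lbs` and its values are hypothetical.

```r
# Hypothetical weights in pounds (illustrative data only)
weight_lbs <- c(150, 180, 220.5)

# The text that would sit in code1, with x standing for the source variable
code1 <- "x / 2.205"

# Evaluate the expression with x bound to the source values.
# Assumption: this mimics the idea only, not the package internals.
weight_kg <- eval(parse(text = code1), envir = list(x = weight_lbs))
round(weight_kg, 1)
```

Because the expression is applied to the whole vector, any vectorized R expression in x should behave the same way in this sketch.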

Functions (multiple variables input)

If you have a function that requires multiple variables as input, you can enter multiple variables in source_item, separated by a semicolon:

var_1; var_2

You can then refer to the variables in code1 by x1, x2, etc. If you wanted to add the two variables together you could write the following in code1.

x1 + x2
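The correspondence between the semicolon-separated names and x1, x2 can be sketched as follows. This is an illustration under assumptions, not the package's actual parsing code; `source_data` and its values are hypothetical.

```r
# Hypothetical source variables (illustrative data only)
source_data <- data.frame(var_1 = c(1, 2, 3), var_2 = c(10, 20, 30))

# source_item as it would appear in the sheet, and the code1 expression
source_item <- "var_1; var_2"
code1 <- "x1 + x2"

# Split on the semicolon and trim whitespace to recover the variable names
vars <- trimws(strsplit(source_item, ";", fixed = TRUE)[[1]])

# Bind the first variable to x1, the second to x2, and so on
inputs <- setNames(
  lapply(vars, function(v) source_data[[v]]),
  paste0("x", seq_along(vars))
)

# Evaluate the expression with x1 and x2 bound to the two columns
result <- eval(parse(text = code1), envir = inputs)
result
```

The order of the names in source_item determines which variable becomes x1 and which becomes x2.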

Multi step variables

If you need to refer to a variable you previously made, you can use "previous_dataset" in source_dataset. Make sure to refer to the variables by their new names (as given in item), and use "ID" as the id_var.
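As a rough sketch, the rows below show what a multi-step entry might look like, using only the columns named in this vignette. A real harmonization sheet may require additional columns. The dataset name `cohort_a`, the source variable names, the ID column `participant_id`, and the `bmi` item are all hypothetical.

```r
# Hypothetical harmonization sheet rows (illustrative only).
# The first two rows harmonize weight and height from a source dataset;
# the third row builds on them through "previous_dataset".
multi_step_rows <- data.frame(
  study          = c("Cohort A", "Cohort A", "Cohort A"),
  item           = c("weight", "height", "bmi"),
  source_dataset = c("cohort_a", "cohort_a", "previous_dataset"),
  source_item    = c("weight_lbs", "height_in", "weight; height"),
  id_var         = c("participant_id", "participant_id", "ID"),
  code_type      = c("function", "function", "function"),
  code1          = c("x / 2.205", "x * 0.0254", "x1 / x2^2"),
  stringsAsFactors = FALSE
)

multi_step_rows
```

In the third row, source_item uses the new names weight and height (the item values of the earlier rows) rather than the original source variable names, and id_var is set to "ID".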