knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
source("../R/test_data_generator.R")
The examples below demonstrate how to recode variables with the recodeflow function rec_with_table().
Our examples use the following packages:
Package recodeflow. Steps to install recodeflow are in the article how to install.
# Load the package
library(recodeflow)
Package dplyr to combine datasets (function: bind_rows).
library(dplyr)
Our examples use example data
Our examples use the dataset pbc from the package survival. We've split this dataset in two (tester1 and tester2) to mimic real data, e.g., the same survey performed in separate years. For our examples, we've also added columns (agegrp5 and agegrp10) to this dataset.
In our example datasets, the variable sex
contains the values: m for males and f for females.
Using dataset tester1, we'll recode the variable sex
into a harmonized sex
variable. The harmonized sex
variable has the values: 0 for males and 1 for females.
1) Recode the sex
variable in tester1.
sex_1 <- rec_with_table(data = tester1, variables = "sex", variable_details = variable_details, log = TRUE, var_labels = c(sex = "sex") )
head(sex_1)
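For intuition, the rule applied in this step can be sketched in base R. This only illustrates the mapping stated above (m becomes 0, f becomes 1); it is not how rec_with_table() is implemented, and raw_sex, sex_map, and the toy values are hypothetical.

```r
# Toy input standing in for the sex column (hypothetical values).
raw_sex <- c("m", "f", "f", "m")

# Lookup vector encoding the harmonization rule: m -> 0, f -> 1.
sex_map <- c(m = 0L, f = 1L)

# Index the lookup by the raw values, then drop the names.
harmonized_sex <- unname(sex_map[raw_sex])
print(harmonized_sex)
```

In practice the mapping lives in variable_details rather than in code, which is what lets the same rule be reused across datasets.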
We'll recode and combine the variable sex
for our two datasets.
1) Recode the sex
variable in tester1 and tester2.
sex_1 <- rec_with_table(data = tester1, variables = "sex", variable_details = variable_details, log = TRUE, var_labels = c(sex = "Sex"))
head(sex_1)
sex_2 <- rec_with_table(data = tester2, variables = "sex", variable_details = variable_details, log = TRUE, var_labels = c(sex = "Sex"))
tail(sex_2)
2) Combine the harmonized sex variable from tester1 with the harmonized sex variable from tester2.
sex_combined <- bind_rows(sex_1, sex_2)
head(sex_combined)
tail(sex_combined)
3) Set labels
Labels are lost when the datasets are combined.
Use set_data_labels()
to label the variables in your final dataset. set_data_labels()
sets the labels with the original information in variables
and variable_details
.
labeled_sex_combined <- set_data_labels( data_to_label = sex_combined, variable_details = variable_details, variables_sheet = variables )
You could have a situation where a variable is the same across datasets but its categories change.
In our example data the variable agegrp
is different in tester1 and tester2.
In tester1, the agegrp variable is 5-year age groups: 20-24, 25-29, 30-34, etc.
In tester2, the agegrp variable is 10-year age groups: 20-29, 30-39, 40-49, etc.
There are three options to facilitate the use of variables with inconsistent categories across datasets.
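Before choosing an option, it can help to confirm which categories the datasets actually share. The following is a minimal base-R sketch using the category labels quoted above; agegrp_tester1 and agegrp_tester2 are hypothetical vectors, not columns read from the example data.

```r
# Category labels quoted in the text (hypothetical vectors, toy subset).
agegrp_tester1 <- c("20-24", "25-29", "30-34")
agegrp_tester2 <- c("20-29", "30-39", "40-49")

# Categories present in both datasets.
shared <- intersect(agegrp_tester1, agegrp_tester2)

# Categories that appear only in the first dataset.
only_in_1 <- setdiff(agegrp_tester1, agegrp_tester2)

print(shared)
print(only_in_1)
```

An empty intersection, as here, signals that the variable cannot be combined as-is.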
Option 1: Recode the agegrp variable into a common variable only for datasets with the same category responses
Recode the agegrp variable into a common variable only in datasets where the categories are the same. If the categories are different between datasets, separate columns will be created.
The categories in the agegrp
variable in tester1 are different than the categories of agegrp
in tester2. Therefore, it is not possible to have the same agegrp
categories across our example datasets.
1) Recode agegrp5
in tester1 and recode agegrp10
in tester2.
agegrp_1 <- rec_with_table(data = tester1, variables = "agegrp5", variable_details = variable_details, log = TRUE)
head(agegrp_1)
agegrp_2 <- rec_with_table(data = tester2, variables = "agegrp10", variable_details = variable_details, log = TRUE)
head(agegrp_2)
2) Combine the harmonized variable agegrp5
in tester1 with the harmonized agegrp10
in tester2.
agegrp_combined <- bind_rows(agegrp_1, agegrp_2)
head(agegrp_combined)
tail(agegrp_combined)
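The "separate columns" outcome comes from how row-binding treats columns that exist in only one input: bind_rows() fills the missing column with NA. The sketch below reproduces that behaviour in base R so it runs without extra packages; recoded_1, recoded_2, and pad_columns() are hypothetical names and the values are toy data.

```r
# Toy stand-ins for the two recoded datasets (hypothetical values).
recoded_1 <- data.frame(agegrp5 = c(1L, 2L))
recoded_2 <- data.frame(agegrp10 = c(1L, 1L))

# Add any missing columns as NA so both data frames share the same columns.
pad_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA_integer_
  }
  df[cols]
}

all_cols <- union(names(recoded_1), names(recoded_2))
combined <- rbind(pad_columns(recoded_1, all_cols),
                  pad_columns(recoded_2, all_cols))
print(combined)
```

Rows from the first dataset have NA in agegrp10 and rows from the second have NA in agegrp5, which is why this option keeps the datasets side by side rather than truly harmonized.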
Option 2: Recode the agegrp variable into a continuous age_cont variable
Recode the categorical variable agegrp into a single harmonized continuous variable age_cont.
age_cont
takes the midpoint age of each category for 'agegrp' across datasets. With this option, the categorical variable 'agegrp' from each dataset can be combined into a single dataset.
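As a rough illustration of the midpoint idea, the sketch below computes the mean of the two bounds printed in each category label. This convention is an assumption; the values recodeflow actually assigns are whatever variable_details specifies. midpoint_age() is a hypothetical helper.

```r
# Hypothetical helper: midpoint of a "lower-upper" age-group label,
# taken here as the mean of the two printed bounds (an assumption).
midpoint_age <- function(label) {
  bounds <- strsplit(label, "-", fixed = TRUE)
  vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
}

# 5-year groups, as in tester1.
mid_5 <- midpoint_age(c("20-24", "25-29"))

# 10-year groups, as in tester2.
mid_10 <- midpoint_age(c("20-29", "30-39"))

print(mid_5)
print(mid_10)
```

Both datasets end up on the same numeric scale, although the 10-year groups carry less precision than the 5-year groups.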
1) Recode variable agegrp
in tester1 and agegrp
in tester2 to the harmonized continuous variable age_cont
.
agegrp_1_cont <- rec_with_table(data = tester1, variables = "age_cont", variable_details = variable_details, log = TRUE)
head(agegrp_1_cont)
agegrp_2_cont <- rec_with_table(data = tester2, variables = "age_cont", variable_details = variable_details, log = TRUE)
head(agegrp_2_cont)
2) Combine the harmonized continuous variable age_cont from tester1 and tester2.
agegrp_cont_combined <- bind_rows(agegrp_1_cont, agegrp_2_cont)
head(agegrp_cont_combined)
tail(agegrp_cont_combined)
Option 3: Recode the agegrp variable into a harmonized categorical variable
Dataset tester1 has 5-year age groups (e.g., 30-34, 35-39), and tester2 has 10-year age groups (e.g., 30-39). Therefore, we can collapse the 5-year age groups in dataset tester1 into the same 10-year age groups used in dataset tester2.
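The collapsing rule can be sketched in base R as follows. This is only an illustration of the idea, assuming labels of the form "lower-upper" with 10-year groups starting on a multiple of ten; collapse_to_10() is a hypothetical helper, and the mapping recodeflow applies is the one defined in variable_details.

```r
# Hypothetical helper: map a 5-year label such as "35-39" to the
# 10-year group that contains it, e.g. "30-39".
collapse_to_10 <- function(label5) {
  lower <- as.numeric(sub("-.*$", "", label5))  # lower bound of the 5-year group
  decade <- floor(lower / 10) * 10              # start of the enclosing decade
  paste0(decade, "-", decade + 9)
}

collapsed <- collapse_to_10(c("30-34", "35-39", "40-44"))
print(collapsed)
```

Collapsing only works in this direction: 10-year groups cannot be split back into 5-year groups without additional information.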
1) Recode variable agegrp in tester1 into agegrp10. Recode variable agegrp in tester2 into agegrp10.
agegrp10_1 <- rec_with_table(data = tester1, variables = "agegrp10", variable_details = variable_details, log = TRUE)
head(agegrp10_1)
agegrp10_2 <- rec_with_table(data = tester2, variables = "agegrp10", variable_details = variable_details, log = TRUE)
head(agegrp10_2)
2) Combine the harmonized categorical variable agegrp10 from tester1 and tester2.
agegrp10_combined <- bind_rows(agegrp10_1, agegrp10_2)
head(agegrp10_combined)
tail(agegrp10_combined)
The variables argument in rec_with_table()
allows multiple variables to be
recoded from a dataset.
In this example, the age
and sex
variables from the tester1 and tester2 datasets will be recoded and labeled using rec_with_table()
.
We'll then combine the two recoded datasets into a single dataset and label it using set_data_labels().
1) Recode age
and sex
in dataset tester1 and tester2
age_sex_1 <- rec_with_table(data = tester1, variables = c("age", "sex"), variable_details = variable_details, log = TRUE, var_labels = c(age = "Age", sex = "Sex"))
head(age_sex_1)
age_sex_2 <- rec_with_table(data = tester2, variables = c("age", "sex"), variable_details = variable_details, log = TRUE, var_labels = c(age = "Age", sex = "Sex"))
head(age_sex_2)
2) Combine the harmonized variables age
and sex
from tester1 and tester2.
combined_age_sex <- bind_rows(age_sex_1, age_sex_2)
head(combined_age_sex)
3) Set labels
Use set_data_labels()
to label the variables in your final dataset. set_data_labels()
sets the labels with the original information in variables
and variable_details
.
var_labels can be used with all the variables in variables.csv or with a subset of variables.
labeled_combined_age_sex <- set_data_labels( data_to_label = combined_age_sex, variable_details = variable_details, variables_sheet = variables )
You can check if labels have been added to your recoded dataset by using get_label()
.
library(sjlabelled)
get_label(labeled_combined_age_sex)
For more information on get_label()
and other label helper functions, please
refer to the sjlabelled
package.
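Variable labels of this kind are conventionally stored as a "label" attribute on each column, which is what label helpers read. The base-R sketch below shows that mechanism on toy data without any labelling package; labeled and the values in it are hypothetical, and this is not a substitute for get_label().

```r
# Toy data frame with labels attached as column attributes (hypothetical values).
labeled <- data.frame(age = c(50, 61), sex = c(0L, 1L))
attr(labeled$age, "label") <- "Age"
attr(labeled$sex, "label") <- "Sex"

# Read the "label" attribute from every column; NA when a column has none.
labels <- vapply(labeled, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

print(labels)
```

A column that comes back as NA in a check like this has lost its label and would need set_data_labels() applied again.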
All the variables listed in the variables worksheet can be recoded with rec_with_table().
In this example, all variables specified in the variables
worksheet will be recoded and combined for the datasets tester1 and tester2.
options(htmlwidgets.TOJSON_ARGS = list(na = "string"))
1) Recode all variables listed in the variables worksheet, for dataset tester1 and dataset tester2
recoded1 <- rec_with_table(data = tester1, variables = variables, variable_details = variable_details, log = TRUE)
recoded2 <- rec_with_table(data = tester2, variables = variables, variable_details = variable_details, log = TRUE)
2) Combine recoded datasets
combined_dataset <- bind_rows(recoded1, recoded2)
3) Set labels for the combined recoded dataset
labeled_combined <- set_data_labels(data_to_label = combined_dataset, variable_details = variable_details, variables_sheet = variables )
To track the origin of each row of data, use the rec_with_table() argument attach_data_name. When attach_data_name is set to TRUE, a column is added with the name of the dataset each row comes from.
1) Recode variables age
and sex
and attach dataset name for tester1 and tester2.
age_sex_1 <- rec_with_table(data = tester1, variables = c("age", "sex"), variable_details = variable_details, var_labels = c(age = "Age", sex = "Sex"), log = TRUE, attach_data_name = TRUE)
age_sex_2 <- rec_with_table(data = tester2, variables = c("age", "sex"), variable_details = variable_details, var_labels = c(age = "Age", sex = "Sex"), log = TRUE, attach_data_name = TRUE)
2) Combine the harmonized datasets
combined_age_sex <- bind_rows(age_sex_1, age_sex_2)
head(combined_age_sex)
tail(combined_age_sex)
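The effect of tagging each row with its source can be sketched by hand in base R. The column name data_name used here is an assumption for illustration, as are part_1, part_2, and the toy values; check the output of rec_with_table() for the column it actually adds.

```r
# Toy stand-ins for the two recoded datasets (hypothetical values).
part_1 <- data.frame(age = c(50, 61))
part_2 <- data.frame(age = c(45, 70))

# Tag every row with the name of the dataset it came from
# (column name "data_name" is assumed for this sketch).
part_1$data_name <- "tester1"
part_2$data_name <- "tester2"

# After combining, the origin of each row is still recoverable.
combined <- rbind(part_1, part_2)
print(table(combined$data_name))
```

Keeping the source column makes it possible to stratify or filter by dataset after the rows have been pooled.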
Derived variables are variables that are not in the original dataset; rather they are created using variables from the original dataset.
Descriptions of derived functions are in the article derived functions.
To recode a derived variable, you must define it in variables and variable_details.
Our example derived variable example_der equals chol times bili.
1) Recode the underlying variables: chol
and bili
and the derived variable example_der
for tester1 and tester2.
derived1 <- rec_with_table(data = tester1, variables = c("chol", "bili", "example_der"), variable_details = variable_details, log = TRUE)
derived2 <- rec_with_table(data = tester2, variables = c("chol", "bili", "example_der"), variable_details = variable_details, log = TRUE)
2) Combine the harmonized variables chol, bili, and example_der.
combined_der <- bind_rows(derived1, derived2)
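The calculation behind the example derived variable can be sketched in base R. This shows only the arithmetic stated above (chol times bili); example_der_sketch() is a hypothetical name, the values are toy data, and a real derived function used with recodeflow may treat missing values differently.

```r
# Hypothetical sketch of the derived calculation: chol times bili.
example_der_sketch <- function(chol, bili) {
  chol * bili
}

# Toy data; the NA shows that a missing input gives a missing result here.
toy <- data.frame(chol = c(200, 250, NA), bili = c(1.5, 2, 0.8))
toy$example_der <- example_der_sketch(toy$chol, toy$bili)
print(toy)
```

Because the derived value depends on chol and bili, both underlying variables must be recoded alongside example_der, as in step 1 above.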