knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", fig.retina = TRUE ) library(dplyr) library(phewasHelper) # Simulate some random data associated with phecodes phewas_data <- sample_n(phecode_descriptions, 300) %>% left_join(mutate(category_colors(return_df = TRUE), bias = rnorm(n(), sd = 3)), by = 'category') %>% mutate(val = rnorm(n(), mean = bias), code = as.numeric(phecode)) %>% select(code, val)
The goal of phewasHelper is to provide a set of simple lightweight helper functions for working with Phecode and Phewas data.
And the development version from GitHub with:
# install.packages("devtools") devtools::install_github("nstrayer/phewas_helper")
We will use a sample set of simulated PheWas data with phecodes to demonstrate the functions.
library(phewasHelper) head(phewas_data)
Phecodes show up in about a million different formats. In our demo data the phecodes have been converted to a numeric value. This would be an issue if we tried to harmonize with data that stored the phecodes in a string. The function normalize_phecodes
is designed to fix this problem. It takes any phecode array and coerces it to a standard zero-padded string.
phewas_data %>% mutate(fixed_code = normalize_phecodes(code)) %>% head() # Update our original data with normalized phecodes phewas_data <- phewas_data %>% mutate(code = normalize_phecodes(code))
Another issue that is commonly encountered in PheWas results is wanting to know what exactly a code is. The functions get_phecode_info()
and join_phecode_info()
help with that.
get_phecode_info()
is the simpler of the two. It takes as input an array of phecodes and returns an array of the desired information, either description or category. This is useful for adding individual columns to a dataframe.
phewas_data %>% mutate(descript = get_phecode_info(code, 'description'), category = get_phecode_info(code, 'category')) %>% head()
For more a more complete labeling of phecode information the function join_phecode_info()
modifies a passsed dataframe by appending a desired subset of description, category, category number, and phecode index columns.
# We can append all info available phewas_data %>% join_phecode_info(phecode_column = code) %>% head() # Or we can just extract what we need phewas_data <- phewas_data %>% join_phecode_info(phecode_column = code, cols_to_join = c("description", "category", "phecode_index")) head(phewas_data)
Manhattan plots are commonly made of phewas results. Frequently the colors of the plots points are encoded by the categories. The default color palletes in ggplot2 and base-plot are not great and custom palletes like R color-brewer don't give you enough colors to work with all the categories. To deal with this category_colors()
returns a mapping of phecode category to colors that can be used easily in your plots.
library(ggplot2) phewas_data %>% ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) + geom_point() + scale_color_manual(values = category_colors()) + theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank())
If just the color pallete is needed for ggplot
then the function scale_color_phecode()
makes this even easier.
phewas_data %>% ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) + geom_point() + scale_color_phecode() + theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(), panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank())
One last helper is a theme object that lets you get rid of the tick marks for each phecode without the laboreous typing of theme(axis.ticks.x = ...)
repeatedly.
phewas_data %>% ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) + geom_point() + scale_color_manual(values = category_colors()) + theme_phewas()
It can also be used on phecode-on-y-axis plots.
phewas_data %>% ggplot(aes(y = reorder(code, phecode_index), x = val, color = category)) + geom_point() + scale_color_manual(values = category_colors()) + theme_phewas(phecode_on_x_axis = FALSE)
Sometimes phecodes are reported with only the leaf values filled in. These data can be converted to a full - or "rolled up" - format with the functions rollup_phecode_counts()
and rollup_phecode_pairs
. First let's look at the counts version.
# Patient data with non-rolled up codes patient_data <- dplyr::tribble( ~patient, ~code, ~counts, 1, "250.23", 7, 1, "250.25", 4, 1, "696.40", 1, 1, "555.21", 4, 2, "401.22", 6, 2, "204.00", 5, 2, "751.11", 2, 2, "008.00", 1, 2, "008.50", 2, 2, "008.51", 3, ) # Rollup the leaf codes to their parents patient_data %>% rollup_phecode_counts(phecode_col = code) %>% head() %>% knitr::kable()
We can also do the same with data that just represents binary yes or no for phecode occurances with rollup_phecode_pairs()
.
# Patient data with non-rolled up codes patient_data <- dplyr::tribble( ~patient, ~code, 1, "250.23", 1, "250.25", 1, "696.40", 1, "555.21", 2, "401.22", 2, "204.00", 2, "751.11", 2, "008.00", 2, "008.50", 2, "008.51", ) # Rollup the leaf codes to their parents patient_data %>% rollup_phecode_pairs(phecode_col = code) %>% arrange(code) %>% head() %>% knitr::kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.