In nstrayer/phewasHelper: Helpers For Working With PheWas And PheCode Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  fig.retina = TRUE
)

library(dplyr)
library(phewasHelper)

# Simulate some random data associated with phecodes
phewas_data <- sample_n(phecode_descriptions, 300) %>%
  left_join(mutate(category_colors(return_df = TRUE),
                   bias = rnorm(n(), sd = 3)),
            by = 'category') %>% 
  mutate(val = rnorm(n(), mean = bias),
         code = as.numeric(phecode)) %>% 
  select(code, val)

phewasHelper

The goal of phewasHelper is to provide a set of simple, lightweight helper functions for working with phecode and PheWAS data.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("nstrayer/phewasHelper")

Usage

We will use a sample set of simulated PheWAS data with phecodes to demonstrate the functions.

library(phewasHelper)

head(phewas_data)

Normalizing phecodes

Phecodes show up in many different formats. In our demo data the phecodes have been converted to numeric values, which would be a problem if we tried to harmonize with data that stores phecodes as strings. The function normalize_phecodes() is designed to fix this. It takes any phecode array and coerces it to a standard zero-padded string.

phewas_data %>% 
  mutate(fixed_code = normalize_phecodes(code)) %>% 
  head()

# Update our original data with normalized phecodes
phewas_data <- phewas_data %>% 
  mutate(code = normalize_phecodes(code))
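
To make "standard zero-padded string" concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: pad_phecode() is a hypothetical helper, and it assumes the target format is three integer digits plus two decimals, as in the "008.50"-style codes used elsewhere in this README.

```r
# Hypothetical helper, not part of phewasHelper: zero-pad a phecode to an
# assumed "XXX.XX" layout (three integer digits, two decimal digits).
pad_phecode <- function(codes) {
  # as.numeric() accepts both numeric input and strings such as "8.5";
  # "%06.2f" pads to a total width of 6 characters with leading zeros.
  sprintf("%06.2f", as.numeric(codes))
}

numeric_codes <- pad_phecode(c(8, 8.5, 250.23))
string_codes  <- pad_phecode(c("8", "8.50", "250.23"))

# Numeric and string inputs end up in the same representation,
# so they can be joined against each other safely.
identical(numeric_codes, string_codes)
```

In practice normalize_phecodes() should be preferred, since it is the function the rest of the package expects.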

Getting phecode information

Another common need when working with PheWAS results is finding out what exactly a code represents. The functions get_phecode_info() and join_phecode_info() help with that.

get_phecode_info() is the simpler of the two. It takes as input an array of phecodes and returns an array of the desired information, either description or category. This is useful for adding individual columns to a dataframe.

phewas_data %>% 
  mutate(descript = get_phecode_info(code, 'description'),
         category = get_phecode_info(code, 'category')) %>% 
  head()

For a more complete labeling of phecode information, the function join_phecode_info() modifies a passed dataframe by appending a desired subset of description, category, category number, and phecode index columns.

# We can append all info available
phewas_data %>% 
  join_phecode_info(phecode_column = code) %>% 
  head()


# Or we can just extract what we need
phewas_data <- phewas_data %>% 
  join_phecode_info(phecode_column = code,
                    cols_to_join = c("description", "category", "phecode_index"))

head(phewas_data)

Coloring PheWas plots

Manhattan plots are commonly made from PheWAS results, and the color of each point frequently encodes its phecode category. The default color palettes in ggplot2 and base R graphics are not great, and custom palettes like RColorBrewer don't provide enough colors to cover all the categories. To deal with this, category_colors() returns a mapping of phecode category to color that can be used easily in your plots.

library(ggplot2)

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

If just the color palette is needed for ggplot, then the function scale_color_phecode() makes this even easier.

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_phecode() +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

ggplot theme

One last helper is a theme object that gets rid of the tick marks for each phecode without the laborious typing of theme(axis.ticks.x = ...) every time.

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme_phewas()

It also works for plots that put the phecodes on the y-axis.

phewas_data %>% 
  ggplot(aes(y = reorder(code, phecode_index), x = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme_phewas(phecode_on_x_axis = FALSE)

Rolling up phecodes

Sometimes phecodes are reported with only the leaf values filled in. These data can be converted to a full, or "rolled up", format with the functions rollup_phecode_counts() and rollup_phecode_pairs(). First let's look at the counts version.

# Patient data with non-rolled up codes
patient_data <- dplyr::tribble(
  ~patient,    ~code, ~counts,
         1, "250.23",      7,
         1, "250.25",      4,
         1, "696.40",      1,
         1, "555.21",      4,
         2, "401.22",      6,
         2, "204.00",      5,
         2, "751.11",      2,
         2, "008.00",      1,
         2, "008.50",      2,
         2, "008.51",      3,
)

# Rollup the leaf codes to their parents
patient_data %>% 
  rollup_phecode_counts(phecode_col = code) %>% 
  head() %>% 
  knitr::kable()

We can do the same with data that just records a binary yes or no for phecode occurrences, using rollup_phecode_pairs().

# Patient data with non-rolled up codes
patient_data <- dplyr::tribble(
  ~patient,     ~code, 
         1,  "250.23",    
         1,  "250.25",    
         1,  "696.40",    
         1,  "555.21",    
         2,  "401.22",    
         2,  "204.00",    
         2,  "751.11",    
         2,  "008.00",    
         2,  "008.50",    
         2,  "008.51",    
)

# Rollup the leaf codes to their parents
patient_data %>% 
  rollup_phecode_pairs(phecode_col = code) %>% 
  arrange(code) %>% 
  head() %>% 
  knitr::kable()
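
For intuition about what "rolling up" means, the following base R sketch sums leaf counts into their parent codes. It illustrates the general idea only and is not how phewasHelper is implemented: phecode_with_ancestors() is a hypothetical helper, and it assumes the normalized "XXX.XX" format in which each decimal digit adds one level to the hierarchy (for example "008.51" sits under "008.50", which sits under "008.00"). The package functions may handle details differently.

```r
# Hypothetical helper, not part of phewasHelper: return a code together
# with its assumed ancestors in the "XXX.XX" hierarchy.
phecode_with_ancestors <- function(code) {
  unique(c(
    code,
    paste0(substr(code, 1, 5), "0"),    # e.g. "008.51" -> "008.50"
    paste0(substr(code, 1, 3), ".00")   # e.g. "008.51" -> "008.00"
  ))
}

# Counts for patient 2's "008" codes from the example above
leaf_counts <- data.frame(
  patient = c(2, 2, 2),
  code    = c("008.00", "008.50", "008.51"),
  counts  = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Repeat each row once for the code itself and once per ancestor
expanded <- do.call(rbind, lapply(seq_len(nrow(leaf_counts)), function(i) {
  data.frame(
    patient = leaf_counts$patient[i],
    code    = phecode_with_ancestors(leaf_counts$code[i]),
    counts  = leaf_counts$counts[i],
    stringsAsFactors = FALSE
  )
}))

# Sum the counts that land on the same patient/code pair
rolled_up <- aggregate(counts ~ patient + code, data = expanded, FUN = sum)
rolled_up
```

Under these assumptions "008.00" collects 1 + 2 + 3 = 6, "008.50" collects 2 + 3 = 5, and "008.51" keeps its own count of 3.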