README.md
In nstrayer/phewasHelper: Helpers For Working With PheWas And PheCode Data

phewasHelper

The goal of phewasHelper is to provide a set of simple lightweight helper functions for working with Phecode and Phewas data.

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("nstrayer/phewas_helper")

We will use a sample set of simulated PheWas data with phecodes to demonstrate the functions.

library(phewasHelper)

head(phewas_data)
#> # A tibble: 6 x 2
#>    code    val
#>   <dbl>  <dbl>
#> 1  420.  0.136
#> 2  401   1.04 
#> 3  741. -2.26 
#> 4  580. -6.53 
#> 5  204.  2.59 
#> 6  394.  2.67

Phecodes show up in about a million different formats. In our demo data the phecodes have been converted to a numeric value. This would be an issue if we tried to harmonize with data that stored the phecodes in a string. The function normalize_phecodes is designed to fix this problem. It takes any phecode array and coerces it to a standard zero-padded string.

phewas_data %>% 
  mutate(fixed_code = normalize_phecodes(code)) %>% 
  head()
#> # A tibble: 6 x 3
#>    code    val fixed_code
#>   <dbl>  <dbl> <chr>     
#> 1  420.  0.136 420.20    
#> 2  401   1.04  401.00    
#> 3  741. -2.26  741.20    
#> 4  580. -6.53  580.12    
#> 5  204.  2.59  204.10    
#> 6  394.  2.67  394.20

# Update our original data with normalized phecodes
phewas_data <- phewas_data %>% 
  mutate(code = normalize_phecodes(code))

Another issue that is commonly encountered in PheWas results is wanting to know what exactly a code is. The functions get_phecode_info() and join_phecode_info() help with that.

get_phecode_info() is the simpler of the two. It takes as input an array of phecodes and returns an array of the desired information, either description or category. This is useful for adding individual columns to a dataframe.

phewas_data %>% 
  mutate(descript = get_phecode_info(code, 'description'),
         category = get_phecode_info(code, 'category')) %>% 
  head()
#> # A tibble: 6 x 4
#>   code      val descript                             category          
#>   <chr>   <dbl> <chr>                                <chr>             
#> 1 420.20  0.136 Pericarditis                         circulatory system
#> 2 401.00  1.04  Hypertension                         circulatory system
#> 3 741.20 -2.26  Stiffness of joint                   musculoskeletal   
#> 4 580.12 -6.53  Non-proliferative glomerulonephritis genitourinary     
#> 5 204.10  2.59  Lymphoid leukemia                    neoplasms         
#> 6 394.20  2.67  Mitral valve disease                 circulatory system

For more a more complete labeling of phecode information the function join_phecode_info() modifies a passsed dataframe by appending a desired subset of description, category, category number, and phecode index columns.

# We can append all info available
phewas_data %>% 
  join_phecode_info(phecode_column = code) %>% 
  head()
#> # A tibble: 6 x 6
#>   code      val description           category     category_number phecode_index
#>   <chr>   <dbl> <chr>                 <chr>                  <int>         <int>
#> 1 420.20  0.136 Pericarditis          circulatory…               8           771
#> 2 401.00  1.04  Hypertension          circulatory…               8           744
#> 3 741.20 -2.26  Stiffness of joint    musculoskel…              14          1594
#> 4 580.12 -6.53  Non-proliferative gl… genitourina…              11          1157
#> 5 204.10  2.59  Lymphoid leukemia     neoplasms                  2           167
#> 6 394.20  2.67  Mitral valve disease  circulatory…               8           733


# Or we can just extract what we need
phewas_data <- phewas_data %>% 
  join_phecode_info(phecode_column = code,
                    cols_to_join = c("description", "category", "phecode_index"))

head(phewas_data)
#> # A tibble: 6 x 5
#>   code      val description                       category         phecode_index
#>   <chr>   <dbl> <chr>                             <chr>                    <int>
#> 1 420.20  0.136 Pericarditis                      circulatory sys…           771
#> 2 401.00  1.04  Hypertension                      circulatory sys…           744
#> 3 741.20 -2.26  Stiffness of joint                musculoskeletal           1594
#> 4 580.12 -6.53  Non-proliferative glomerulonephr… genitourinary             1157
#> 5 204.10  2.59  Lymphoid leukemia                 neoplasms                  167
#> 6 394.20  2.67  Mitral valve disease              circulatory sys…           733

Manhattan plots are commonly made of phewas results. Frequently the colors of the plots points are encoded by the categories. The default color palletes in ggplot2 and base-plot are not great and custom palletes like R color-brewer don’t give you enough colors to work with all the categories. To deal with this category_colors() returns a mapping of phecode category to colors that can be used easily in your plots.

library(ggplot2)

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

If just the color pallete is needed for ggplot then the function scale_color_phecode() makes this even easier.

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_phecode() +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

One last helper is a theme object that lets you get rid of the tick marks for each phecode without the laboreous typing of theme(axis.ticks.x = ...) repeatedly.

phewas_data %>% 
  ggplot(aes(x = reorder(code, phecode_index), y = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme_phewas()

It can also be used on phecode-on-y-axis plots.

phewas_data %>% 
  ggplot(aes(y = reorder(code, phecode_index), x = val, color = category)) +
  geom_point() +
  scale_color_manual(values = category_colors()) +
  theme_phewas(phecode_on_x_axis = FALSE)

Sometimes phecodes are reported with only the leaf values filled in. These data can be converted to a full - or “rolled up” - format with the functions rollup_phecode_counts() and rollup_phecode_pairs. First let’s look at the counts version.

# Patient data with non-rolled up codes
patient_data <- dplyr::tribble(
  ~patient,    ~code, ~counts,
         1, "250.23",      7,
         1, "250.25",      4,
         1, "696.40",      1,
         1, "555.21",      4,
         2, "401.22",      6,
         2, "204.00",      5,
         2, "751.11",      2,
         2, "008.00",      1,
         2, "008.50",      2,
         2, "008.51",      3,
)

# Rollup the leaf codes to their parents
patient_data %>% 
  rollup_phecode_counts(phecode_col = code) %>% 
  head() %>% 
  knitr::kable()

| code | patient | counts | | :----- | ------: | -----: | | 250.00 | 1 | 11 | | 250.20 | 1 | 11 | | 250.23 | 1 | 7 | | 250.25 | 1 | 4 | | 555.00 | 1 | 4 | | 555.20 | 1 | 4 |

We can also do the same with data that just represents binary yes or no for phecode occurances with rollup_phecode_pairs().

# Patient data with non-rolled up codes
patient_data <- dplyr::tribble(
  ~patient,     ~code, 
         1,  "250.23",    
         1,  "250.25",    
         1,  "696.40",    
         1,  "555.21",    
         2,  "401.22",    
         2,  "204.00",    
         2,  "751.11",    
         2,  "008.00",    
         2,  "008.50",    
         2,  "008.51",    
)

# Rollup the leaf codes to their parents
patient_data %>% 
  rollup_phecode_pairs(phecode_col = code) %>% 
  arrange(code) %>% 
  head() %>% 
  knitr::kable()

| patient | code | | ------: | :----- | | 2 | 008.00 | | 2 | 008.50 | | 2 | 008.51 | | 2 | 204.00 | | 1 | 250.00 | | 1 | 250.20 |

nstrayer/phewasHelper documentation built on April 2, 2020, 5:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com