knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

Set up

Load libraries:

library(knitr)
library(kableExtra)
library(dplyr)
library(tibble)
library(stringr)
library(KrasAlleleCna)
# devtools::load_all()

Establish paths:

pkg_dir <- system.file(package = "KrasAlleleCna")
extdata_dir <- system.file("extdata", package = "KrasAlleleCna")
data_dir <- system.file("data", package = "KrasAlleleCna")
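Note that system.file() returns an empty string, rather than an error, when the requested directory or package cannot be found. The sketch below illustrates this using the "stats" package and a deliberately non-existent directory name (both chosen only for illustration; they are not part of this analysis), and shows the mustWork argument as one way to fail early.

```r
# A directory that does not exist inside an installed package yields "".
missing_dir <- system.file("no-such-directory", package = "stats")
dir_found <- nzchar(missing_dir)

# With mustWork = TRUE the same lookup raises an error instead,
# which is caught here so the message can be inspected.
lookup_error <- tryCatch(
    system.file("no-such-directory", package = "stats", mustWork = TRUE),
    error = function(e) conditionMessage(e)
)
```

Checking nzchar() on the paths above before reading from them may make a missing installation easier to diagnose.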

Tidy data

ANNOVAR returned a separate file for each tissue sample (stored in "data-raw/gdc/annovar_output"). In "data-raw/handle_annovar.R", each file was read in as a tibble, and the resulting list of tibbles was saved to "inst/extdata/".

# list of all annovar files
annovar_data <- readRDS(file.path(extdata_dir, "annovar_output_list.rds"))

Each tibble was processed with parse_annovar_mutation(), and the column aa_mod was added as a simpler version of the amino acid mutation. The tibbles were then combined into one, and only the rows with exonic, non-synonymous KRAS mutations were kept.

annovar_tib <- lapply(annovar_data, parse_annovar_mutation) %>%
    bind_rows(.id = "file_id") %>%
    mutate(file_id = str_remove_all(file_id, "\\.hg38_multianno\\.txt")) %>%
    filter(str_detect(Func.refGene, "exonic") &
           Gene.refGene == "KRAS" &
           !(ExonicFunc.refGene %in% c("synonymous SNV")))
head(annovar_tib)
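The implementation of parse_annovar_mutation() is not shown in this vignette. As a rough illustration of what a "simpler version of the amino acid mutation" could look like, the sketch below assumes the input resembles ANNOVAR's AAChange.refGene column (for example "KRAS:NM_004985:exon2:c.G35A:p.G12D") and uses a hypothetical helper, simplify_aa_change(), which is not part of the package. The example strings are invented for illustration.

```r
library(stringr)

# Hypothetical helper, NOT the package's parse_annovar_mutation():
# extract the first protein-level change (the text after "p.") from an
# ANNOVAR-style AAChange string. Only simple substitutions of the form
# <letter><position><letter> are matched; anything else becomes NA.
simplify_aa_change <- function(aa_change) {
    str_extract(aa_change, "(?<=p\\.)[A-Z][0-9]+[A-Z]")
}

# Invented example values; "." stands in for a row with no annotation.
example_changes <- c(
    "KRAS:NM_033360:exon2:c.G35A:p.G12D,KRAS:NM_004985:exon2:c.G35A:p.G12D",
    "KRAS:NM_004985:exon2:c.G34T:p.G12C",
    "."
)
simplified <- simplify_aa_change(example_changes)
```

Under these assumptions, rows whose annotation does not match the pattern end up as NA, which is the kind of case a missing-value check on aa_mod would catch.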

A quick check for missing values in aa_mod (FALSE means every remaining row has a parsed mutation):

any(is.na(annovar_tib$aa_mod))
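If the check were to return TRUE, the affected rows could be pulled out for inspection. The sketch below uses a small toy tibble, toy_tib, with invented values standing in for annovar_tib; only the column names file_id and aa_mod are taken from the code above.

```r
library(dplyr)
library(tibble)

# Toy stand-in for annovar_tib; all values are invented for illustration.
toy_tib <- tibble(
    file_id = c("sample_a", "sample_b", "sample_c"),
    aa_mod = c("G12D", "G12C", NA_character_)
)

# Keep only the rows where the simplified mutation is missing.
missing_rows <- toy_tib %>%
    filter(is.na(aa_mod))
n_missing <- nrow(missing_rows)
```

Looking at file_id in missing_rows points back to the sample files whose annotations may need a closer look.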

Save tidy data

usethis::use_data(annovar_tib, overwrite = TRUE)

Show results

head(annovar_tib)
kable(annovar_tib) %>%
    kable_styling(bootstrap_options = c("striped", "hover")) %>%
    scroll_box(width = "500px", height = "200px")


jhrcook/KrasAlleleCna documentation built on May 28, 2019, 1:22 p.m.