knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
Load libraries
library(dplyr) library(tibble) library(stringr) library(KrasAlleleCna)
Establish path variables:
pkg_dir <- system.file(package = "KrasAlleleCna") extdata_dir <- system.file("extdata", package = "KrasAlleleCna") data_dir <- system.file("data", package = "KrasAlleleCna")
'mpileup' returned a separate file for each tissue sample (stored in "data-raw/gdc/mpileup_output"). In "data-raw/handle_mpileup.R", each file was read in as a tibble, and a list of tibbles was saved to "inst/extdata/".
This list of tibbles was read in and was combined into a single tibble, adding a new column with the file name. The filename
column was parsed to create file_id
.
# a list of all mpileup files mpileup_data <- readRDS(file.path(extdata_dir, "mpileup_output_list.rds")) %>% bind_rows(.id = "filename") %>% mutate(file_id = str_remove_all(filename, "_mpileup.vcf$"))
I only kept the VCF files for which the ANNOVAR data was successfully collected and contained a KRAS mutation.
# list of file IDs (after removing the "mpileup.vcf" suffix) mpileup_ids <- unique(unlist(mpileup_data$file_id)) # number of shared IDs sum(mpileup_ids %in% unlist(annovar_tib$file_id)) # number of VCF files removed in this step sum(!(mpileup_ids %in% unlist(annovar_tib$file_id)))
mpileup_data <- mpileup_data %>% filter(file_id %in% annovar_tib$file_id)
The downstream steps only required the total number of reads and the number of mutant reads. These values were isolated into columns total_num_reads
and num_mut_reads
, respectively.
mpileup_data <- mpileup_data %>% mutate(total_num_reads = as.numeric(num_high_qual_bases), num_mut_reads = str_split_fixed(allelic_depths, ",", 2)[, 2]) %>% select(file_id, start, total_num_reads, num_mut_reads) %>% filter(!(str_detect(num_mut_reads, ",") | num_mut_reads == "")) %>% mutate(num_mut_reads = as.numeric(num_mut_reads), total_num_reads = as.numeric(total_num_reads)) mpileup_data
The last step was to merge the read-depth with the ANNOVAR mutation calls.
allele_depth_tib <- left_join(annovar_tib, mpileup_data, by = c("file_id" = "file_id", "Start" = "start")) %>% left_join(tidy_tcga_sample_sheet, by = "file_id") usethis::use_data(allele_depth_tib, overwrite = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.