knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>")

happyR provides a simple framework for importing hap.py results into R. You can load results for a single sample by providing a hap.py prefix path, or bulk-load multiple datasets via a samplesheet. In both cases, helper functions make it easy to extract the relevant results.

Set up

library(happyR)
library(tidyverse)

Loading and querying individual hap.py results

Let's first explore how to load hap.py outputs from a single sample:

# define happy_prefix (the -o argument to hap.py, here: path/to/files/happy_demo)
extdata_dir <- system.file("extdata", package = "happyR")
happy_prefix <- sprintf("%s/happy_demo", extdata_dir)

# load hap.py results
hap_result <- read_happy(happy_prefix)
class(hap_result)
names(hap_result)

hap_result is now a happy_result object. Its fields, queried below, are summary, extended and pr_curve; pr_curve is a named list with one table per hap.py ROC file.
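The object is built from files that share the hap.py prefix. As a minimal sketch, assuming the demo data ships with the package as in the chunk above, base R can list them:

```r
# List the files that share the demo prefix (base R only).
extdata_dir <- system.file("extdata", package = "happyR")
happy_prefix <- file.path(extdata_dir, "happy_demo")

happy_files <- list.files(
  path = dirname(happy_prefix),
  # match "happy_demo." at the start of the file name
  pattern = paste0("^", basename(happy_prefix), "\\."),
  full.names = FALSE
)
happy_files
```

If the listing is empty, the prefix probably does not point at hap.py output.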

We can query each item in our happy_result object using standard R syntax:

hap_result$summary %>% head

hap_result$extended %>% head

names(hap_result$pr_curve)
# e.g. here pr_curve$INDEL_PASS maps to happy_demo.roc.Locations.INDEL.PASS.csv.gz
hap_result$pr_curve$INDEL_PASS %>% head
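Because these items are ordinary data frames, the usual dplyr verbs apply. The sketch below keeps only the PASS rows of the summary table and a few headline metrics. It assumes the table keeps hap.py's own column names (Type, Filter, METRIC.Recall, METRIC.Precision, METRIC.F1_Score); check names(hap_result$summary) if a column is not found.

```r
library(happyR)
library(dplyr)

happy_prefix <- file.path(system.file("extdata", package = "happyR"), "happy_demo")
hap_result <- read_happy(happy_prefix)

# Assumption: column names are the ones hap.py writes to summary.csv.
pass_summary <- hap_result$summary %>%
  filter(Filter == "PASS") %>%
  select(Type, Filter, METRIC.Recall, METRIC.Precision, METRIC.F1_Score)

pass_summary
```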

Alternatively, the pr_data helper function queries pr_curve with more specific filters:

# subtype = "*" keeps all INDEL subtypes, not only deletions
indel_pr <- pr_data(hap_result, var_type = "indel", filter = "PASS", subtype = "*")
indel_pr %>% head
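The table returned by pr_data can be passed straight to ggplot2. This is a rough sketch rather than a recommended plot; it assumes the output carries hap.py's METRIC.Recall and METRIC.Precision columns, and the variable name pr_tbl is only illustrative.

```r
library(happyR)
library(ggplot2)

happy_prefix <- file.path(system.file("extdata", package = "happyR"), "happy_demo")
hap_result <- read_happy(happy_prefix)
pr_tbl <- pr_data(hap_result, var_type = "indel", filter = "PASS", subtype = "*")

# Assumption: METRIC.Recall and METRIC.Precision are present in pr_tbl.
pr_plot <- ggplot(pr_tbl, aes(x = METRIC.Recall, y = METRIC.Precision)) +
  geom_point(alpha = 0.5) +
  labs(x = "Recall", y = "Precision", title = "INDEL PASS precision-recall")

pr_plot
```

Points are used instead of a line so that the plot does not depend on the row order of the table.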

Aggregating results from multiple samples

Often, we will want to inspect results from multiple samples, which we can achieve by passing a samplesheet to happyR:

# define happyR samplesheet
extdata_dir <- system.file("extdata", package = "happyR")
samplesheet <- readr::read_csv("group_id,replicate_id,happy_prefix
PCR-Free,NA12878-I30,NA12878-I30_S1
PCR-Free,NA12878-I33,NA12878-I33_S1
Nano,NA12878-R1,NA12878-R1_S1
Nano,NA12878-R2,NA12878-R2_S1
") %>%
  # turn each bare prefix into a full path under extdata_dir
  mutate(happy_prefix = sprintf("%s/%s", extdata_dir, happy_prefix))

samplesheet

# load hap.py results from the samplesheet data.frame
hap_samplesheet <- read_samplesheet_(samplesheet)
# or load directly from a samplesheet CSV file
# hap_samplesheet <- read_samplesheet(samplesheet_path = "/path/to/happyr_samplesheet.csv")
class(hap_samplesheet)
names(hap_samplesheet)
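A prefix that does not resolve to hap.py output is an easy mistake to make in a samplesheet, so a quick check before loading can save time. The base R sketch below uses a toy two-row samplesheet and a placeholder directory (both hypothetical) and relies only on the fact that hap.py writes a file named with the prefix followed by .summary.csv.

```r
# "path/to/results" is a placeholder directory, not a real location.
results_dir <- file.path("path", "to", "results")
toy_samplesheet <- data.frame(
  group_id = c("PCR-Free", "Nano"),
  replicate_id = c("NA12878-I30", "NA12878-R1"),
  happy_prefix = file.path(results_dir, c("NA12878-I30_S1", "NA12878-R1_S1")),
  stringsAsFactors = FALSE
)

# A prefix is usable only if <prefix>.summary.csv exists.
has_summary <- file.exists(paste0(toy_samplesheet$happy_prefix, ".summary.csv"))

# Rows whose prefix does not point at hap.py output.
toy_samplesheet[!has_summary, ]
```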

hap_samplesheet is a happy_samplesheet object. Its fields, used below, are samplesheet, ids and results.

We can query the samplesheet and ids fields by accessing the list items directly:

hap_samplesheet$samplesheet %>% head
hap_samplesheet$ids %>% head

We can access the aggregated hap.py results with the extract_results function:

# named summary_tbl to avoid masking base::summary
summary_tbl <- extract_results(hap_samplesheet$results, table = "summary")
summary_tbl %>% head
# see the extract_results documentation for a list of possible values for the table argument
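The aggregated table can be narrowed in the same way as a single-sample table. The sketch below keeps the SNP PASS rows for each sample. It assumes the combined table keeps hap.py's column names; happy_prefix is selected with any_of() because a source column of that name is an assumption here and may be absent.

```r
library(happyR)
library(dplyr)

extdata_dir <- system.file("extdata", package = "happyR")
samplesheet <- tibble(
  group_id = c("PCR-Free", "PCR-Free", "Nano", "Nano"),
  replicate_id = c("NA12878-I30", "NA12878-I33", "NA12878-R1", "NA12878-R2"),
  happy_prefix = file.path(
    extdata_dir,
    c("NA12878-I30_S1", "NA12878-I33_S1", "NA12878-R1_S1", "NA12878-R2_S1")
  )
)
hap_samplesheet <- read_samplesheet_(samplesheet)
summary_tbl <- extract_results(hap_samplesheet$results, table = "summary")

# any_of() keeps happy_prefix when present and skips it otherwise.
snp_pass <- summary_tbl %>%
  filter(Type == "SNP", Filter == "PASS") %>%
  select(any_of("happy_prefix"), Type, Filter, METRIC.Recall, METRIC.Precision)

snp_pass
```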


Illumina/happyR documentation built on July 12, 2019, 7:57 p.m.