knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
happyR provides a simple framework for importing hap.py results into R. You can either load results from a single sample, by providing a hap.py output prefix, or bulk-load multiple datasets via a samplesheet. In both cases, helper functions make it easy to extract the relevant tables.
library(happyR)
library(tidyverse)
Let's first explore how to load hap.py outputs from a single sample:
# define happy_prefix (the -o argument to hap.py, here: path/to/files/happy_demo)
extdata_dir <- system.file("extdata", package = "happyR")
happy_prefix <- sprintf("%s/happy_demo", extdata_dir)

# load hap.py results
hap_result <- read_happy(happy_prefix)
class(hap_result)
names(hap_result)
hap_result is now a happy_result object with the following fields:

- summary (from summary.csv): a data.frame with high-level ALL / PASS numbers
- extended (from extended.csv): a data.frame with region / subtype stratified metrics
- pr_curve (from roc.*.csv.gz): a happy_roc object (a list of data.frames) containing precision-recall over quality score

We can query each item in our happy_result object using standard R syntax:
hap_result$summary %>% head()
hap_result$extended %>% head()
names(hap_result$pr_curve)

# e.g. here pr_curve$INDEL_PASS maps to happy_demo.roc.Locations.INDEL.PASS.csv.gz
hap_result$pr_curve$INDEL_PASS %>% head()
Or we can use the pr_data helper function to query pr_curve with more specific filters:
del_pr <- pr_data(hap_result, var_type = "indel", filter = "PASS", subtype = "*")
del_pr %>% head()
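A precision-recall table like the one returned by pr_data is often used to pick a quality-score cutoff. The sketch below shows one way to do that, by locating the threshold with the highest F1 score. It runs on a small made-up data.frame rather than real hap.py output: the column names QQ, METRIC.Recall and METRIC.Precision are assumed to follow hap.py's ROC table naming, and the numbers are purely illustrative.

```r
# Toy stand-in for a precision-recall table (values are invented for illustration).
# Column names are assumed to match hap.py's ROC output.
pr <- data.frame(
  QQ               = c(10, 20, 30, 40),
  METRIC.Recall    = c(0.99, 0.97, 0.93, 0.85),
  METRIC.Precision = c(0.90, 0.95, 0.98, 0.99)
)

# F1 is the harmonic mean of precision and recall
f1_score <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

pr$F1 <- f1_score(pr$METRIC.Precision, pr$METRIC.Recall)

# Row with the highest F1, i.e. the candidate quality-score cutoff
best <- pr[which.max(pr$F1), ]
print(best)
```

With real data, the same two steps (compute F1, take the maximum) could be applied to the table returned by pr_data, provided its columns carry these names.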
Often, we will want to inspect results from multiple samples, which we can achieve by passing a samplesheet to happyR:
# define happyR samplesheet
extdata_dir <- system.file("extdata", package = "happyR")
samplesheet <- readr::read_csv(
"group_id,replicate_id,happy_prefix
PCR-Free,NA12878-I30,NA12878-I30_S1
PCR-Free,NA12878-I33,NA12878-I33_S1
Nano,NA12878-R1,NA12878-R1_S1
Nano,NA12878-R2,NA12878-R2_S1
") %>%
  mutate(happy_prefix = sprintf("%s/%s", extdata_dir, happy_prefix))
samplesheet

# load hap.py results
hap_samplesheet <- read_samplesheet_(samplesheet)

# or directly from a samplesheet.csv
# hap_samplesheet <- read_samplesheet(samplesheet_path = "/path/to/happyr_samplesheet.csv")

class(hap_samplesheet)
names(hap_samplesheet)
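If the samplesheet should live on disk instead, it can be written as a plain CSV with the same three columns. The following is a minimal base-R sketch: results_dir is a hypothetical placeholder for wherever your hap.py outputs are stored, and the temporary file stands in for a real samplesheet path.

```r
# Hypothetical directory holding the hap.py outputs; replace with your own
results_dir <- "/path/to/happy/outputs"

# Same columns as the inline samplesheet: group_id, replicate_id, happy_prefix
samplesheet <- data.frame(
  group_id     = c("PCR-Free", "PCR-Free", "Nano", "Nano"),
  replicate_id = c("NA12878-I30", "NA12878-I33", "NA12878-R1", "NA12878-R2"),
  happy_prefix = file.path(
    results_dir,
    c("NA12878-I30_S1", "NA12878-I33_S1", "NA12878-R1_S1", "NA12878-R2_S1")
  ),
  stringsAsFactors = FALSE
)

# Write without row names so the file contains only the three columns
samplesheet_path <- tempfile(fileext = ".csv")
write.csv(samplesheet, samplesheet_path, row.names = FALSE)

# Read it back to confirm the layout survived the round trip
round_trip <- read.csv(samplesheet_path, stringsAsFactors = FALSE)
print(round_trip)

# The file could then be passed on, e.g.:
# hap_samplesheet <- happyR::read_samplesheet(samplesheet_path = samplesheet_path)
```

Storing full prefixes in happy_prefix mirrors the inline example, where the bare prefixes are combined with the data directory before loading.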
hap_samplesheet is a happy_samplesheet object that contains the following fields:

- samplesheet: a data.frame with the original samplesheet
- results: a happy_result_list that contains individual happy_result objects
- ids: a vector of result ids

We can query the samplesheet and ids fields by directly accessing the relevant list items:
hap_samplesheet$samplesheet %>% head()
hap_samplesheet$ids %>% head()
And access aggregated hap.py results with the extract_results function:
# see the extract_results documentation for a list of possible values for the table argument
summary <- extract_results(hap_samplesheet$results, table = "summary")
summary %>% head()
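Once results from several samples sit in one table, a common next step is to compare groups. The sketch below illustrates the idea on toy data in base R. It assumes, without confirmation from this text, that the combined table carries a happy_prefix column identifying each source result, so that it can be joined back to the samplesheet; the METRIC.Recall column name is assumed to follow hap.py's summary naming, and the values are invented.

```r
# Samplesheet with the same columns as above (bare prefixes for brevity)
samplesheet <- data.frame(
  group_id     = c("PCR-Free", "PCR-Free", "Nano", "Nano"),
  replicate_id = c("NA12878-I30", "NA12878-I33", "NA12878-R1", "NA12878-R2"),
  happy_prefix = c("NA12878-I30_S1", "NA12878-I33_S1",
                   "NA12878-R1_S1", "NA12878-R2_S1"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a combined summary table (one PASS INDEL row per sample).
# The happy_prefix key and the metric values are assumptions for illustration.
toy_summary <- data.frame(
  happy_prefix  = samplesheet$happy_prefix,
  Type          = "INDEL",
  Filter        = "PASS",
  METRIC.Recall = c(0.98, 0.96, 0.90, 0.92),
  stringsAsFactors = FALSE
)

# Attach group and replicate labels to each result row
annotated <- merge(toy_summary, samplesheet, by = "happy_prefix")

# Mean recall per group
group_recall <- aggregate(METRIC.Recall ~ group_id, data = annotated, FUN = mean)
print(group_recall)
```

If the real table uses a different key column, only the by argument of merge needs to change.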