library(magrittr) knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = TRUE)
Aim: Check the speed of different tools for loading coverage.
There are multiple tools that are able to obtain coverage from BigWig/BAM files. Specifically, here we test the tools megadepth, rtracklayer and pyBigWig. Some operations requiring coverage can be very computationally intensive - for example, the dasper R package requires loading coverage across 3 regions per junction per sample. Here, we aim to find the fastest tool for loading coverage from BigWig files by comparing the speed of the megadepth vs rtracklayer vs pyBigwig.
First off, we obtain paths to the annotation ranges and BigWig files to load coverage from. For these, we have used the examples hosted on https://github.com/ChristopherWilks/megadepth.
dasper package. These are example junctions that# extract only the required ranges annotation <- dasper::junctions_example %>% SummarizedExperiment::rowRanges() # keep only ranges on chr 21 GenomeInfoDb::seqlevels(annotation, pruning.mode = "coarse") <- "21" # add chr to ranges to match bw GenomeInfoDb::seqlevels(annotation) <- paste0("chr", GenomeInfoDb::seqlevels(annotation)) # take the first 1000 ranges for testing annotation <- annotation[1:5000] annotation_path <- file.path(tempdir(), "annotation_example.txt") rtracklayer::export(annotation, con = annotation_path, format = "BED")
url <- recount::download_study(project = "SRP012682", type = "samples", download = FALSE) set.seed(32) # select 1 random sample bw_remote <- sample(url, 1) bw_local <- file.path(tempdir(), "example_gtex_bw.bw") download.file(bw_remote, bw_local)
rtracklayer match those obtained through megadepth. # get base-wise coverage as numeric list for summarisation after rt_cov <- rtracklayer::import.bw(con = bw_local, which = annotation, as = "NumericList") md_cov <- megadepth::get_coverage(bigwig_file = bw_local, annotation = annotation_path, op = "mean") # round here to match 2 digits returned by megadepth try(testthat::expect_equivalent(rt_cov %>% BiocGenerics::mean() %>% round(2), md_cov$cov))
rtracklayer, megadepth and pyBigwig when obtaining coverage from a BigWig file. To make sure the results are comparable, we will use the identical set of ranges and BigWigs initialised above. rtracklayer and megadepth can be easily called in R therefore for testing the runtime of these tools I will use the R package microbenchmark.n_iters <- 10 benchmark_rt_md <- function(bw, times = 10){ runtime <- microbenchmark::microbenchmark( "rt"= { rtracklayer::import.bw(con = bw, which = annotation, as = "NumericList") }, "md" = { megadepth::get_coverage(bigwig_file = bw, annotation = annotation_path, op = "mean") }, times = times) runtime <- runtime %>% dplyr::mutate(time = time / 1e9) %>% # convert time to secs dplyr::rename(tool = expr, run_time = time) return(runtime) } runtime_rt_md <- rbind(benchmark_rt_md(bw = bw_local, times = n_iters) %>% dplyr::mutate(local_remote = "local"), benchmark_rt_md(bw = bw_remote, times = n_iters) %>% dplyr::mutate(local_remote = "remote"))
pyBigWig is a python package, to ensure the paths to the BigWig and annotation ranges remain the same, I will call the script found r here::here("analysis", "test_pyBigWig.py"). This creates a function that calls pyBigWig for a set number of iterations and saves the runtime for each.pyBigWig, pandas and numpy installed. benchmark_pybw <- function(bw, times = 10){ out_path <- paste0(tempdir(), "/pyBigWig_run_time.txt") system2("python3", args = c(here::here("analysis", "test_pyBigWig.py"), bw, annotation_path, out_path, times) ) runtime <- readr::read_delim(out_path, delim = "\t") return(runtime) } runtime_pybw <- rbind(benchmark_pybw(bw = bw_local, times = n_iters) %>% dplyr::mutate(local_remote = "local"), benchmark_pybw(bw = bw_remote, times = n_iters) %>% dplyr::mutate(local_remote = "remote"))
comp_time <- runtime_rt_md %>% as.data.frame() %>% rbind(runtime_pybw) %>% dplyr::mutate(tool = tool %>% factor(c("md", "rt", "pyBigwig"))) comp_time_plot <- ggplot2::ggplot(comp_time, ggplot2::aes(x = tool, y = run_time, colour = local_remote)) + ggplot2::geom_boxplot() + ggplot2::geom_jitter() + ggplot2::scale_x_discrete(name = "Tool", labels = c("megadepth", "rtracklayer", "pyBigWig")) + ggplot2::scale_y_continuous(name = paste0("Run time for ", length(annotation), " regions (seconds)")) + ggplot2::scale_colour_manual(values = ggpubr::get_palette("npg", 2), guide = "none") + ggplot2::facet_wrap(~ local_remote, scales = "free") + ggplot2::theme_bw(base_size = 20) comp_time_plot %>% ggplot2::ggsave(filename = "md_rt_pybw_runtime.png", plot = ., path = here::here("analysis"), dpi = 300, width = 10, height = 5)
## Session info library("sessioninfo") options(width = 120) session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.