In neurogenomics/PeakyFinders: Mining, Calling, and Importing Epigenomic Peaks in R

pkg <- read.dcf("../DESCRIPTION", fields = "Package")[1]
library(pkg, character.only = TRUE)

library(`r pkg`)

Import peaks

Pre-computed peaks

Some peaks files are already made available on GEO. import_peaks will import these as a named list GenomicRanges, where the names are the GEO sample ids.

grl <- PeakyFinders::import_peaks(
    ids = "GSM945244",
    searches = construct_searches(keys = c("narrowpeak",
                                           "broadpeak")))

Pre-computed peak: subset

By default, the genome-wide files are imported. However, you can also query specific subset of these peak files defined by query_granges.

Here, we'll just construct query for all of chromosome 22 using the helper function get_genome

query_granges <- PeakyFinders::get_genome(genome = "hg19",
                                          keep.chr = 22)

grl2 <- PeakyFinders::import_peaks(
    ids = "GSM945244",
    searches = construct_searches(keys = c("narrowpeak",
                                           "broadpeak")),
    builds = "hg19",
    query_granges = query_granges,
    query_granges_build = "hg19") 

knitr::kable(
    head(data.frame(grl2$GSM945244))
)

Call peaks

When no pre-computed peaks are available, you can instead use import_peaks to compute peaks from bigWig or bedGraph files. The default peak calling method is currently MACS3 via the MACSr package.

Note:

By default, peaks are called using whole-chromosomes, from which a subset specified by query_granges is returned. This ensures that an appropriate background is used when computing peaks. You can also call peaks using the whole genome at once, but this is usally less computationally efficient than splitting the task by chromosome (i.e. setting split_chromosomes=TRUE when query_granges is not provided).

By default, import_peaks (and the subfunction call_peaks) automatically infer a reasonable cutoff threshold based on an iterative testing procedure. See ?PeakyFinders::call_peaks for more details.

Warning:

Peak calling with MACS3/MACSr is not currently compatible with Windows OS.

### rtracklayer::import.bw() is currently broken on Windows.
if(.Platform$OS.type!="windows"){

    grl3 <- PeakyFinders::import_peaks(
    ids = "GSM945244",
    searches = construct_searches(keys = "bigwig"),
    builds = "hg19",
    query_granges = query_granges,
    query_granges_build = "hg19"
    ) 

    knitr::kable(
        head(data.frame(grl3$GSM945244))
    )
}

Parallelization

To speed up the process of importing and/or calling peaks, you can take advantage of the parallelisation features in import_peaks.

For example, if you want to call peaks across the entire genome from a given bigWig file, you could speed up this process by dividing the task by chromosome (split_chromosomes=TRUE) and distributing it across multiple cores or compute nodes (nThread=2 or greater).

Internally, PeakyFinders takes a number of steps to optimise these parallel processes, so you don't have to worry about it as much. This helps to avoid common issues stemming from excessive numbers of queries, both to the servers where the peak data is hosted, and to the C libraries that R relies on to makes these queries.

Note:

It's usually a good idea to leave several cores free when running parallel processes, so that you can still have enough resources to use your computer while the function is running. In the example below, I'm using 10/12 cores on my local laptop.

Warning:

Parallelisation is not currently compatible with Windows OS. If Windows OS is detected, import_peaks will automatically switch to serial processing with nThread=1.

grl4 <- PeakyFinders::import_peaks(
    ids = "GSM945244",
    searches = construct_searches(keys = "bigwig"),
    split_chromosomes = TRUE,
    nThread = 10
    )