import_peaks: Import peaks
In neurogenomics/PeakyFinders: Mining, Calling, and Importing Epigenomic Peaks in R

import_peaks

R Documentation

Description

Import pre-computed peak files, or call new peaks from bedGraph/bigWig files. Peaks can be imported for a subset of ranges specified by query_granges, or across the whole genome by setting query_granges=NULL.
Currently recognizes IDs from:

GEO :
ENCODE : See peaks_metadata_encode for example metadata.
ROADMAP : See peaks_metadata_roadmap for example metadata.
AnnotationHub : See peaks_metadata_annotationhub for example metadata.

Notable features:

Automatically infers which database each accession ID is from and organizes the outputs accordingly.
Automatically infers which function is needed to import which file types.
Automatically calls peaks from any bedGraph/bigWig files.
query_granges can be in a different genome build than the files being imported; it will be lifted over to the correct genome build with liftover_grlist.
When nThread>1, accelerates file importing and peak calling using multi-core parallelisation.
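
To make the idea of inferring a database from an accession ID concrete, here is a minimal base-R sketch. It is not the package's actual implementation: the helper `guess_database` and its regular expressions are hypothetical, and the Roadmap and AnnotationHub patterns in particular are assumptions about typical ID shapes ("AH32002" is only an illustrative ID).

```r
# Hypothetical helper: guess the source database from the shape of an ID.
# The patterns below are illustrative assumptions, not PeakyFinders internals.
guess_database <- function(ids) {
  patterns <- c(
    GEO           = "^GS[EM][0-9]+$",                  # GEO series/sample IDs
    ENCODE        = "^ENC[A-Z]{2}[0-9]{3}[A-Z]{3}$",   # e.g. ENCSR000AHD
    ROADMAP       = "^E[0-9]{3}$",                     # assumed epigenome IDs
    AnnotationHub = "^AH[0-9]+$"                       # assumed AH record IDs
  )
  db <- rep(NA_character_, length(ids))
  for (nm in names(patterns)) {
    # Only assign IDs that no earlier pattern has claimed.
    hit <- is.na(db) & grepl(patterns[[nm]], ids)
    db[hit] <- nm
  }
  names(db) <- ids
  db
}

ids <- c("GSM945244", "ENCSR000AHD", "ENCFF048VDO", "AH32002")
db <- guess_database(ids)

# Group the IDs by inferred database, mirroring how outputs are organized.
ids_by_database <- split(names(db), db)
print(ids_by_database)
```

IDs that match no pattern stay `NA` and are dropped by `split`, so in a real workflow they would need to be reported separately.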

Usage

import_peaks(
  ids,
  builds = "hg19",
  query_granges = NULL,
  query_granges_build = NULL,
  split_chromosomes = FALSE,
  condense_queries = TRUE,
  force_new = FALSE,
  method = "MACSr",
  cutoff = NULL,
  searches = construct_searches(),
  peaks_dir = tempdir(),
  save_path = tempfile(fileext = "_PeakyFinders_grl.rds"),
  nThread = 1,
  verbose = TRUE
)

Arguments

`ids`	IDs from one of the supported databases. IDs can be at any level: file, sample, or experiment.
`builds`	Genome build that each sample in `ids` is aligned to. This determines whether the `query_granges` data need to be lifted over to a different genome build before querying. Can be a single character string applied to all `ids` (e.g. "hg19"), or a vector of the same length as `ids` named using the `ids` (e.g. c("GSM4271282"="hg19", "ENCFF048VDO"="hg38")).
`query_granges`	[Optional] GRanges object indicating which genomic regions to extract from each sample.
`query_granges_build`	[Optional] Genome build that `query_granges` is aligned to.
`split_chromosomes`	Split single-threaded query into multi-threaded query across chromosomes. This can be especially helpful when calling peaks from large bigWig/bedGraph files. The number of threads used is set by the `nThread` argument.
`condense_queries`	Condense `query_granges` by taking the min/max position per chromosome (default: `TRUE`). This reduces the total number of queries; a large number of queries can cause memory allocation problems due to repeated calls to the underlying C libraries.
`force_new`	By default, saved results at the same `save_path` are imported instead of running queries. Set `force_new=TRUE` to run new queries regardless and overwrite the existing `save_path` file.
`method`	Method to call peaks with: "MACSr" : Uses MACS3 via bdgpeakcall. "SEACR" : Uses SEACR via find_packages.
`cutoff`	When `method="MACSr"`: passed to the `cutoff` argument. The appropriate cutoff depends on the method used to generate the score track; if the file contains p-value scores from MACS3, a score of 5 corresponds to a p-value of 1e-5. If `NULL`, a reasonable `cutoff` value is inferred through a `cutoff_analysis`. When `method="SEACR"`: passed to the `control` argument, which is either a control (IgG) bedGraph file used to generate an empirical threshold for peak calling, or a numeric threshold n between 0 and 1 that returns the top n fraction of peaks based on total signal within peaks (default: `0.05`).
`searches`	Named list of regex queries.
`peaks_dir`	Directory to save peaks to (only used when calling peaks from bedGraph files).
`save_path`	Path to save query results to in .rds format.
`nThread`	When `nThread>1`, accelerates file importing and peak calling using multi-core parallelisation.
`verbose`	Print messages.
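
The `builds` and `condense_queries` arguments can be illustrated with a small base-R sketch. It uses a plain data frame as a stand-in for a GRanges object, and the helpers `expand_builds` and `condense` are hypothetical names written for this example; the package's own logic may differ.

```r
ids <- c("GSM4271282", "ENCFF048VDO")

# Hypothetical helper: recycle a single build string into a per-id named vector.
expand_builds <- function(builds, ids) {
  if (length(builds) == 1) {
    builds <- stats::setNames(rep(builds, length(ids)), ids)
  }
  builds[ids]
}

builds <- expand_builds(c("GSM4271282" = "hg19", "ENCFF048VDO" = "hg38"), ids)
query_granges_build <- "hg19"

# Samples whose build differs from the query build would need a liftover.
needs_liftover <- builds != query_granges_build

# Stand-in for query_granges: one row per query region.
query <- data.frame(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  start    = c(1000, 50000, 200, 9000, 4000),
  end      = c(2000, 51000, 800, 9500, 4500)
)

# Hypothetical helper: collapse to one range per chromosome (min start, max end).
condense <- function(df) {
  starts <- tapply(df$start, df$seqnames, min)
  ends   <- tapply(df$end, df$seqnames, max)
  data.frame(
    seqnames = names(starts),
    start    = as.numeric(starts),
    end      = as.numeric(ends[names(starts)])
  )
}

condensed <- condense(query)
print(needs_liftover)
print(condensed)
```

Condensing trades precision for fewer queries: five regions become two, but each condensed range also spans the gaps between the original regions.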

Value

A nested named list of peak files in GRanges format. The nesting structure is: database -> id -> GRanges object. Each GRanges object contains all the peak data found for that particular id, merged into one. The source file types can be distinguished by the "peaktype" column. If peaks could not be recovered for a sample, that element is set to NULL.
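
The sketch below shows one way to walk a result with this database -> id structure. It builds a mock list by hand, with data frames standing in for GRanges objects; the coordinates and the "peaktype" values are made up for illustration and are not output from the package.

```r
# Mock return value: database -> id -> peaks (data frame stand-in for GRanges).
peaks <- list(
  GEO = list(
    GSM945244 = data.frame(
      seqnames = c("chr1", "chr1", "chr2"),
      start    = c(100, 900, 50),
      end      = c(200, 1200, 400),
      peaktype = c("narrowPeak", "narrowPeak", "calledPeak")  # made-up labels
    )
  ),
  ENCODE = list(ENCSR000AHD = NULL)  # peaks could not be recovered
)

# Drop ids whose peaks are NULL, keeping the database level intact.
recovered <- lapply(peaks, function(db) Filter(Negate(is.null), db))
n_recovered <- lengths(recovered)

# List the ids that failed, per database.
failed <- lapply(peaks, function(db) names(Filter(is.null, db)))

# Separate one sample's merged peaks by their source file type.
sample_peaks <- peaks$GEO$GSM945244
by_type <- split(sample_peaks, sample_peaks$peaktype)
peaks_per_type <- vapply(by_type, nrow, integer(1))

print(n_recovered)
print(failed$ENCODE)
print(peaks_per_type)
```

With real output, each element is a GRanges object, so the metadata column would typically be read with `GenomicRanges::mcols()` or the `$` accessor rather than from a data frame column.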

Examples

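# Import narrowPeak files for one GEO sample.
# An ENCODE experiment ID such as "ENCSR000AHD" can be supplied the same way.
out_list <- PeakyFinders::import_peaks(
    ids = c("GSM945244"),
    searches = PeakyFinders::construct_searches(keys = "narrowpeak"))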

neurogenomics/PeakyFinders documentation built on Oct. 14, 2024, 3:09 p.m.