peakreference: combine and merge multiple BED files

View source: R/peakreference.R

peakreference    R Documentation

combine and merge multiple BED files

Description

This function merges overlapping genomic regions into a single feature. The merged single feature represents the widest genomic interval that covers all overlapping regions.
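The merging behaviour described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `merge_intervals` is a hypothetical helper, it handles a single chromosome only, and it assumes that the overlap between two regions is counted as the difference between the smaller end and the larger start (BED-style half-open coordinates).

```r
# Hypothetical helper, not part of TCseq: merge overlapping intervals that
# lie on one chromosome. Assumption: two intervals are merged when they
# share at least `overlap` bases, counted as (smaller end - larger start).
merge_intervals <- function(start, end, overlap = 1) {
  # Sort intervals by start (then end) so a single left-to-right pass works
  ord <- order(start, end)
  start <- start[ord]
  end <- end[ord]

  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out_start)
    # Bases shared between the current merged interval and interval i
    shared <- min(out_end[k], end[i]) - start[i]
    if (shared >= overlap) {
      # Extend the merged interval to the widest end seen so far
      out_end[k] <- max(out_end[k], end[i])
    } else {
      # No sufficient overlap: start a new merged interval
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

merged <- merge_intervals(start = c(100, 148, 230, 300),
                          end = c(150, 220, 500, 450))
print(merged)
```

Under these assumptions the four input intervals collapse into two, 100 to 220 and 230 to 500, each being the widest interval covering its group. Note that the sketch compares each interval against the growing merged interval rather than against every original region, which may differ from what peakreference does internally.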

Usage

peakreference(
  data = NULL,
  dir = NULL,
  pattern = NULL,
  merge = TRUE,
  overlap = 1,
  ratio = NULL
)

Arguments

data

a data frame containing the coordinates of the peaks to be merged. The columns of the data frame should follow the BED format: the first column contains the chromosome, the second column the start position, and the third column the end position.

dir

a character string giving the directory where the BED files are stored. If data is not given, the function reads in the BED files under dir.

pattern

a regular expression; only files whose names match the regular expression are read in.

merge

logical indicating whether to merge overlapping regions. If FALSE, the regions are simply combined.

overlap

a numeric value giving the minimum number of bases by which two regions must overlap to be merged.

ratio

a numeric value giving the threshold of the overlapping ratio between two regions for merging them. See 'Details' below for the definition of the overlapping ratio.
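To make the dir and pattern arguments more concrete, the following base R sketch shows one way BED files in a directory could be selected by regular expression and combined into a single data frame. It is an illustration only, not the function's own code: the directory `bed_demo`, the file names, and the column names are made up for the example.

```r
# Sketch only: collect BED files from a directory by regular expression.
# The directory and file names below are hypothetical demo data.
bed_dir <- file.path(tempdir(), "bed_demo")
dir.create(bed_dir, showWarnings = FALSE)
writeLines(c("chr1\t100\t150", "chr1\t148\t220"),
           file.path(bed_dir, "rep1.bed"))
writeLines("chr2\t330\t600", file.path(bed_dir, "rep2.bed"))
writeLines("not a peak file", file.path(bed_dir, "notes.txt"))

# Keep only files whose names end in ".bed"
files <- list.files(bed_dir, pattern = "\\.bed$", full.names = TRUE)

# Read each file (chromosome, start, end) and stack the rows
peaks <- do.call(rbind, lapply(files, function(f) {
  read.table(f, header = FALSE, sep = "\t", stringsAsFactors = FALSE,
             col.names = c("chr", "start", "end"))
}))
print(peaks)
```

A data frame shaped like `peaks` matches the layout described for the data argument; alternatively, the same directory and regular expression could presumably be supplied through dir and pattern.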

Details

The overlapping ratio (OR) is defined as:

OR = \frac{n}{\min(length(a), length(b))}

where a and b are two genomic regions and n is the number of bases shared by region a and region b.
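The definition above can be checked by hand with a small sketch. `overlap_ratio` is a hypothetical helper, not a TCseq function, and it assumes BED-style half-open coordinates so that the length of a region is end minus start; the package may count bases differently.

```r
# Hypothetical helper, not part of TCseq.
# Assumption: half-open BED-style coordinates, so length = end - start.
overlap_ratio <- function(start_a, end_a, start_b, end_b) {
  # n: number of bases shared by the two regions (0 if they are disjoint)
  n <- max(0, min(end_a, end_b) - max(start_a, start_b))
  # Divide by the length of the shorter region
  n / min(end_a - start_a, end_b - start_b)
}

overlap_ratio(100, 150, 148, 220)  # 2 shared bases, shorter length 50
overlap_ratio(300, 450, 230, 500)  # region a lies entirely inside region b
```

Because the denominator is the length of the shorter region, a region fully contained in another always has a ratio of 1, whereas two long regions that barely touch have a ratio close to 0.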

Value

a data frame with four columns: chr, start, stop, id

Author(s)

Mengjun Wu, Lei Gu

Examples

peaks <- data.frame(chr = c(rep('chr1',4),rep('chr2', 3), rep('chr3',2)),
                    start = c(100,148,230,300,330,480,1000,700,801),
                    end = c(150,220,500,450,600,900,1050,760,900))

merged_peaks <- peakreference(data = peaks, merge = TRUE, overlap = 1)


MengjunWu/TCseq documentation built on May 15, 2023, 9:47 p.m.