peakreference: combine and merge multiple BED files



Description

This function takes the genomic coordinates in a given data frame, or reads in the BED files (e.g. generated by a peak caller) stored in a given directory, and merges genomic regions with overlapping intervals into a single feature. Each merged feature spans the widest genomic interval that covers all of the regions merged into it.
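The merging rule can be illustrated with a minimal base-R sketch. `merge_regions()` is a hypothetical helper, not part of TCseq. It assumes half-open BED coordinates (so two regions share `min(end) - max(start)` bases) and a greedy sweep that compares each region, in order of start position, with the running merged interval. `peakreference()` may handle coordinates or chains of overlapping regions differently.

```r
# Hypothetical helper, not part of TCseq: merge regions on each chromosome
# whose overlap is at least `min_overlap` bases. Assumes half-open BED
# coordinates and a greedy sweep over regions sorted by start position.
merge_regions <- function(df, min_overlap = 1) {
  df$chr <- as.character(df$chr)
  df <- df[order(df$chr, df$start, df$end), ]
  chr_out <- character(0)
  start_out <- numeric(0)
  end_out <- numeric(0)
  for (chrom in unique(df$chr)) {
    sub <- df[df$chr == chrom, ]
    cur_start <- sub$start[1]
    cur_end <- sub$end[1]
    for (i in seq_len(nrow(sub))[-1]) {
      # Bases shared by the running merged region and region i; because the
      # rows are sorted, region i never starts before cur_start
      n <- min(cur_end, sub$end[i]) - sub$start[i]
      if (n >= min_overlap) {
        # Widen the running region so it also covers region i
        cur_end <- max(cur_end, sub$end[i])
      } else {
        # Overlap too small: emit the running region and start a new one
        chr_out <- c(chr_out, chrom)
        start_out <- c(start_out, cur_start)
        end_out <- c(end_out, cur_end)
        cur_start <- sub$start[i]
        cur_end <- sub$end[i]
      }
    }
    # Emit the last running region of this chromosome
    chr_out <- c(chr_out, chrom)
    start_out <- c(start_out, cur_start)
    end_out <- c(end_out, cur_end)
  }
  data.frame(chr = chr_out, start = start_out, end = end_out,
             stringsAsFactors = FALSE)
}

peaks <- data.frame(chr = c(rep('chr1', 4), rep('chr2', 3), rep('chr3', 2)),
                    start = c(100, 148, 230, 300, 330, 480, 1000, 700, 801),
                    end = c(150, 220, 500, 450, 600, 900, 1050, 760, 900))
merged <- merge_regions(peaks, min_overlap = 1)
```

Under these assumptions the nine peaks collapse to six regions: for example, chr1:100-150 and chr1:148-220 share 2 bases and become chr1:100-220, while chr3:700-760 and chr3:801-900 stay separate.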

Usage

peakreference(data = NULL, dir = NULL, pattern = NULL, merge = TRUE,
  overlap = 1, ratio = NULL)

Arguments

data

a data frame containing the coordinates of the peaks to be merged. Its columns should follow the BED format: the first column is the chromosome name, the second column is the start position and the third column is the end position.

dir

character string giving the directory where the BED files are stored. If data is not given, the function reads in the BED files under dir.

pattern

a regular expression; only files whose names match the regular expression are read in.

merge

logical indicating whether to merge overlapping regions. If FALSE, regions are simply combined.

overlap

a numeric value giving the minimum number of bases by which two regions must overlap to be merged.

ratio

a numeric value giving the threshold on the overlapping ratio between two regions for merging them. See 'Details' below for the definition of the overlapping ratio.
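Reading the BED files under dir and filtering them by pattern can be sketched in base R. `read_bed_dir()` is a hypothetical helper, not TCseq's implementation; it assumes tab-separated files without a header line and keeps only the first three columns. Stacking the files with `rbind()`, without any merging, corresponds loosely to the "simply combined" behavior described for merge = FALSE.

```r
# Hypothetical helper, not part of TCseq: read every file in `bed_dir` whose
# name matches the regular expression `pattern` and stack the first three
# BED columns (chromosome, start, end) into one data frame.
read_bed_dir <- function(bed_dir, pattern = NULL) {
  files <- list.files(bed_dir, pattern = pattern, full.names = TRUE)
  beds <- lapply(files, function(f) {
    x <- read.table(f, header = FALSE, sep = "\t",
                    stringsAsFactors = FALSE)[, 1:3]
    names(x) <- c("chr", "start", "end")
    x
  })
  do.call(rbind, beds)
}

# Build a throwaway directory with two BED files and one non-BED file
bed_dir <- tempfile("beds")
dir.create(bed_dir)
write_bed <- function(df, name, dir = bed_dir) {
  write.table(df, file.path(dir, name), sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}
write_bed(data.frame("chr1", c(100, 148), c(150, 220)), "sample1.bed")
write_bed(data.frame("chr2", 330, 600), "sample2.bed")
write_bed(data.frame("chr3", 700, 760), "notes.txt")

# Only the two .bed files match the pattern, so notes.txt is skipped
combined <- read_bed_dir(bed_dir, pattern = "\\.bed$")
```

The pattern is passed to `list.files()`, which treats it as a regular expression, so `"\\.bed$"` selects file names ending in `.bed`.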

Details

The overlapping ratio (OR) is defined as:

OR = \frac{n}{\min(\mathrm{length}(a), \mathrm{length}(b))}

where a and b are two genomic regions and n is the number of bases shared by region a and region b.
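As a worked example of the formula, the hypothetical helper `overlap_ratio()` (not part of TCseq) computes OR for two regions on the same chromosome. It assumes half-open BED coordinates, where a region's length is end - start; if peakreference() counts bases inclusively, the numbers shift by one.

```r
# Hypothetical helper, not part of TCseq: overlapping ratio of two regions
# on the same chromosome, assuming half-open [start, end) BED coordinates.
overlap_ratio <- function(a_start, a_end, b_start, b_end) {
  # n: number of shared bases, floored at 0 for disjoint regions
  n <- max(0, min(a_end, b_end) - max(a_start, b_start))
  len_a <- a_end - a_start
  len_b <- b_end - b_start
  n / min(len_a, len_b)
}

or_12 <- overlap_ratio(100, 150, 148, 220)  # 2 / min(50, 72) = 0.04
or_13 <- overlap_ratio(100, 150, 230, 500)  # disjoint regions give 0
```

Dividing by the shorter region means a small peak lying entirely inside a large one gets OR = 1, regardless of how large the enclosing peak is.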

Value

a data frame with four columns: chr, start, stop, id

Author(s)

Mengjun Wu, Lei Gu

Examples

library(TCseq)

# Nine example peaks on three chromosomes
peaks <- data.frame(chr = c(rep('chr1',4),rep('chr2', 3), rep('chr3',2)),
                    start = c(100,148,230,300,330,480,1000,700,801),
                    end = c(150,220,500,450,600,900,1050,760,900))

# Merge regions that overlap by at least 1 base
merged_peaks <- peakreference(data = peaks, merge = TRUE, overlap = 1)

TCseq documentation built on Nov. 8, 2020, 5:46 p.m.