getOverlaps: Identifies overlaps between two sets of genomic coordinates



Description

This function identifies which of a set of genomic segments overlap with another set of coordinates, either by partial overlap or by the segments being completely contained within the coordinates. It is used within the ‘segmentSeq’ package by various methods for constructing a segmentation map, but may also be useful in downstream analysis (e.g. annotation analyses).

Usage

getOverlaps(coordinates, segments, overlapType = "overlapping",
            whichOverlaps = TRUE, ignoreStrand = FALSE, cl)

Arguments

coordinates

A GRanges object defining the set of coordinates with which the segments may overlap.

segments

A GRanges object defining the set of segments which may overlap with the coordinates.

overlapType

Which kind of overlaps are being sought? Can be one of ‘overlapping’, ‘contains’ or ‘within’. See Details.

whichOverlaps

If TRUE, returns, for each region in ‘coordinates’, the indices of the ‘segments’ that overlap with it. If FALSE, returns a logical vector specifying which of the ‘coordinates’ overlap with the ‘segments’. See Value.

ignoreStrand

If TRUE, a segment may overlap a set of coordinates regardless of the strand the two are on. If FALSE, overlaps will only be identified if both are on the same strand (or if either has no strand specified). Defaults to FALSE.

cl

A SNOW cluster object, or NULL. See Details.

Details

If overlapType = "overlapping" then any overlap between the ‘coordinates’ and the ‘segments’ is sufficient. If overlapType = "contains" then a region defined in ‘coordinates’ must completely contain at least one of the ‘segments’ to count as an overlap. If overlapType = "within" then a region defined in ‘coordinates’ must be completely contained by at least one of the ‘segments’ to count as an overlap.
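The three overlap types can be illustrated on plain integer intervals. The following is a minimal base-R sketch, not the package's implementation: `has_overlap` is a hypothetical helper, and it ignores chromosome and strand entirely. It only shows which interval comparison each `overlapType` value corresponds to, as described above.

```r
# Illustrative only: 'has_overlap' is a hypothetical helper, not part of segmentSeq.
# It works on plain integer intervals and ignores chromosome and strand.
has_overlap <- function(c_start, c_end, s_start, s_end,
                        overlapType = c("overlapping", "contains", "within")) {
  overlapType <- match.arg(overlapType)
  # Each matrix has one row per coordinate region and one column per segment.
  hits <- switch(overlapType,
    # Any shared position between the coordinate region and the segment.
    overlapping = outer(c_start, s_end, "<=") & outer(c_end, s_start, ">="),
    # The coordinate region completely contains the segment.
    contains    = outer(c_start, s_start, "<=") & outer(c_end, s_end, ">="),
    # The coordinate region is completely contained by the segment.
    within      = outer(c_start, s_start, ">=") & outer(c_end, s_end, "<=")
  )
  # A region counts as overlapping if at least one segment satisfies the test.
  rowSums(hits) > 0
}

# Three coordinate regions and two segments (made-up values).
c_start <- c(100, 150, 500); c_end <- c(200, 160, 600)
s_start <- c(120, 140);      s_end <- c(180, 300)

has_overlap(c_start, c_end, s_start, s_end, "overlapping")  # TRUE TRUE FALSE
has_overlap(c_start, c_end, s_start, s_end, "contains")     # TRUE FALSE FALSE
has_overlap(c_start, c_end, s_start, s_end, "within")       # FALSE TRUE FALSE
```

Note that "contains" and "within" are read from the point of view of the ‘coordinates’: the first region contains a segment, while the second lies within one.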

A ‘cluster’ object (package: snow) can be used to parallelise this function when examining large data sets. Passing NULL to the ‘cl’ argument causes the function to run in non-parallel mode.
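One way to manage the cluster's lifetime is to create it immediately before the call and stop it on exit. The sketch below assumes that a cluster created with `parallel::makeCluster` is accepted wherever a snow cluster is expected; that assumption is not confirmed by this page. `overlaps_in_parallel` and `n_workers` are hypothetical names introduced for illustration.

```r
library(parallel)

# Hypothetical wrapper, not part of segmentSeq.
# Assumption: a cluster from parallel::makeCluster() is accepted as 'cl'.
overlaps_in_parallel <- function(coordinates, segments, n_workers = 2, ...) {
  cl <- makeCluster(n_workers)
  # Always release the worker processes, even if getOverlaps() fails.
  on.exit(stopCluster(cl), add = TRUE)
  segmentSeq::getOverlaps(coordinates = coordinates, segments = segments,
                          cl = cl, ...)
}
```

Remaining arguments such as `overlapType` or `whichOverlaps` can be passed through `...`. For small inputs the cost of starting workers may outweigh any gain, in which case `cl = NULL` is the simpler choice.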

Value

If whichOverlaps = TRUE, then the function returns a list object with length equal to the number of rows of the ‘coordinates’ argument. The ‘i’th member of the list will be a numeric vector giving the row numbers of the ‘segments’ object which overlap with the ‘i’th row of the ‘coordinates’ object, or NA if no segments overlap with this coordinate region.

If whichOverlaps = FALSE, then the function returns a logical vector with length equal to the number of rows of the ‘coordinates’ argument, indicating which of the regions defined in ‘coordinates’ have the requested type of overlap with the ‘segments’.
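Because regions without overlaps are reported as NA rather than as an empty vector, the list form needs a little care when summarising. The following base-R sketch uses a hand-written list shaped like the return value described above; `overlap_list` is a stand-in, not real output from the package.

```r
# Hand-written stand-in for a whichOverlaps = TRUE result (not real output):
# region 1 overlaps segments 1, 2 and 5; region 2 overlaps none; region 3 overlaps segment 3.
overlap_list <- list(c(1, 2, 5), NA, 3)

# Logical summary: does each coordinate region have at least one overlapping segment?
has_hit <- !vapply(overlap_list, function(x) all(is.na(x)), logical(1))

# Number of overlapping segments per region, counting NA as zero.
n_hits <- vapply(overlap_list, function(x) sum(!is.na(x)), integer(1))

has_hit  # TRUE FALSE TRUE
n_hits   # 3 0 1
```

Given the description above, `has_hit` should correspond to what whichOverlaps = FALSE returns for the same inputs, although calling the function directly in that mode avoids building the list at all.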

Author(s)

Thomas J. Hardcastle

Examples

# Load the package that provides readGeneric() and getOverlaps().

library(segmentSeq)

# Define the chromosome lengths for the genome of interest.

chrlens <- c(2e6, 1e6)

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)

# Process the files to produce an 'alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens,
                         gap = 100)

# Find which tags overlap with an arbitrary set of coordinates.

getOverlaps(coordinates = GRanges(seqnames = c(">Chr1"),
          IRanges(start = c(1,100,2000), end = c(40,3000,5000))),
          segments = alignData@alignments, overlapType = "overlapping",
          whichOverlaps = TRUE, cl = NULL)

segmentSeq documentation built on Nov. 8, 2020, 5:18 p.m.