IPos-class: IPos objects

Description

The IPos class is a container for storing a set of integer positions. It exists in 2 flavors: UnstitchedIPos and StitchedIPos. Each flavor uses a particular internal representation:

  • In an UnstitchedIPos instance the positions are stored as an integer vector.

  • In a StitchedIPos instance the positions are stored as an IRanges object where each range represents a run of consecutive positions (i.e. a run of positions that are adjacent and in ascending order). This storage is particularly memory-efficient when the vector of positions contains long runs of consecutive positions.

Because integer positions can be seen as integer ranges of width 1, the IPos class extends the IntegerRanges virtual class.
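
As a rough illustration of the two flavors, the sketch below stores the same four positions in each representation and checks that both behave as IntegerRanges derivatives of width 1. The variable names u and s are arbitrary, and the code assumes the IRanges package is installed.

```r
library(IRanges)

# The same four positions in each flavor (variable names are illustrative).
u <- IPos(c(3L, 4L, 5L, 10L))               # integer vector input: unstitched
s <- IPos(c(3L, 4L, 5L, 10L), stitch=TRUE)  # same positions, stored as runs

is(u, "UnstitchedIPos")
is(s, "StitchedIPos")

# Both are IntegerRanges derivatives in which every range has width 1.
is(u, "IntegerRanges")
all(width(s) == 1L)

# The two objects hold the same positions; only the storage differs.
all(pos(u) == pos(s))
```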

Usage

IPos(pos=integer(0), names=NULL, ..., stitch=NA)  # constructor function

Arguments

pos

An integer or numeric vector, or an IRanges object (or other IntegerRanges derivative). If pos is anything else, IPos() will first try to coerce it to an IRanges object with as(pos, "IRanges").

When pos is an IRanges object (or other IntegerRanges derivative), each range in it is interpreted as a run of consecutive positions.

names

A character vector or NULL.

...

Metadata columns to set on the IPos object. All the metadata columns must be vector-like objects of the same length as the object to construct.

stitch

TRUE, FALSE, or NA (the default).

Controls which internal representation should be used: StitchedIPos (when stitch is TRUE) or UnstitchedIPos (when stitch is FALSE).

When stitch is NA (the default), which internal representation will be used depends on the type of pos: UnstitchedIPos if pos is an integer or numeric vector, and StitchedIPos otherwise.
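
The arguments above can be exercised together as in the following sketch. The metadata column name score and the values used are made up for illustration; the code assumes the IRanges package is installed.

```r
library(IRanges)

# 'names' plus one metadata column passed via '...'.
# The column name 'score' is hypothetical.
x <- IPos(c(7L, 8L, 9L), names=c("a", "b", "c"), score=c(0.5, 1.2, 3.1))
names(x)
mcols(x)$score

# stitch=NA (the default): the representation follows the type of 'pos'.
is(IPos(1:5), "UnstitchedIPos")          # integer vector input
is(IPos(IRanges(1, 5)), "StitchedIPos")  # IRanges input

# An explicit 'stitch' overrides the default.
is(IPos(IRanges(1, 5), stitch=FALSE), "UnstitchedIPos")

# A range passed as 'pos' is a run of consecutive positions, not one position.
length(IPos(IRanges(1, 5)))
```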

Details

Even though an IRanges object can be used for storing integer positions, using an IPos object is more efficient. In particular the memory footprint of an UnstitchedIPos object is half that of an IRanges object.

By contrast, the memory footprint of a StitchedIPos object varies with the data but is never larger than that of an IRanges object. It shrinks dramatically when the vector of positions contains long runs of consecutive positions. In the worst case (i.e. when the object contains no consecutive positions) its memory footprint is the same as that of an IRanges object.

As with any Vector derivative, the length of an IPos object cannot exceed .Machine$integer.max (i.e. 2^31 - 1 on most platforms). IPos() will raise an error if pos contains too many positions.
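
The memory comparison can be checked informally with object.size(). The sketch below uses a single long run of consecutive positions, which is the most favorable case for StitchedIPos. Exact sizes depend on the platform and package version, so none are shown.

```r
library(IRanges)

p  <- 1:100000                     # one long run of consecutive positions
ir <- IRanges(start=p, width=1L)   # stores a start and a width per position
u  <- IPos(p)                      # UnstitchedIPos: one integer per position
s  <- IPos(p, stitch=TRUE)         # StitchedIPos: a single run

object.size(ir)
object.size(u)   # expected to be roughly half the size of 'ir'
object.size(s)   # expected to be much smaller than both

# The upper bound on the length of any Vector derivative:
.Machine$integer.max
```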

Value

An UnstitchedIPos or StitchedIPos object. If the input object pos is itself an IPos derivative, its metadata columns are propagated.

Accessors

Getters

IPos objects support the same set of getters as other IntegerRanges derivatives (i.e. length(), start(), end(), names(), mcols(), etc.), plus the pos() getter, which is equivalent to start() and end(). See ?IntegerRanges for the list of getters supported by IntegerRanges derivatives.

Setters

IPos derivatives support the names(), mcols() and metadata() setters only.

In particular there is no pos() setter for IPos derivatives at the moment (although one might be added in the future).
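
A short sketch of the getters and the three supported setters follows. The metadata column name depth and the metadata entry source are invented for illustration.

```r
library(IRanges)

x <- IPos(c(12L, 13L, 40L))

# Getters: pos(), start() and end() all return the positions.
pos(x)
identical(pos(x), start(x))
identical(start(x), end(x))

# Supported setters: names(), mcols() and metadata().
names(x) <- c("p1", "p2", "p3")
mcols(x)$depth <- c(4L, 9L, 2L)            # 'depth' is a hypothetical column
metadata(x) <- list(source="toy example")  # 'source' is a hypothetical entry
x
```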

Coercion

From UnstitchedIPos to StitchedIPos and vice-versa: coercion back and forth between UnstitchedIPos and StitchedIPos is supported via as(x, "StitchedIPos") and as(x, "UnstitchedIPos"). This is the most efficient and recommended way to switch between the 2 internal representations. Note that this switch can have dramatic consequences on memory usage, so it is intended for advanced users only. End users should almost never need to make this switch in a typical workflow.

From IntegerRanges to UnstitchedIPos, StitchedIPos, or IPos: An IntegerRanges derivative x in which all the ranges have a width of 1 can be coerced to an UnstitchedIPos or StitchedIPos object with as(x, "UnstitchedIPos") or as(x, "StitchedIPos"), respectively. For convenience as(x, "IPos") is supported and is equivalent to as(x, "UnstitchedIPos").

From IPos to IRanges: An IPos derivative x can be coerced to an IRanges object with as(x, "IRanges"). However, be aware that if x is a StitchedIPos instance, the memory footprint of the resulting object can be thousands of times (or more) larger than that of x! See "MEMORY USAGE" in the Examples section below.

From IPos to ordinary R objects: Like with any other IntegerRanges derivative, as.character(), as.factor(), and as.data.frame() work on an IPos derivative x. Note however that as.data.frame(x) returns a data frame with a pos column (containing pos(x)) instead of the start, end, and width columns that one gets with other IntegerRanges derivatives.
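
The coercions described above can be tried on a small object, as in this sketch. All variable names are arbitrary.

```r
library(IRanges)

u <- IPos(c(1:4, 10L))           # UnstitchedIPos

# Switch between the two internal representations.
s  <- as(u, "StitchedIPos")
u2 <- as(s, "UnstitchedIPos")

# An IntegerRanges derivative where all ranges have width 1.
ir <- IRanges(start=c(5, 6, 9), width=1)
x  <- as(ir, "IPos")             # documented as equivalent to "UnstitchedIPos"

# Back to IRanges, and to ordinary R objects.
as(s, "IRanges")
as.character(u)
colnames(as.data.frame(u))       # includes a 'pos' column
```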

Subsetting

An IPos derivative can be subsetted exactly like an IRanges object.

Concatenation

IPos derivatives can be concatenated with c() or append(). See ?c in the S4Vectors package for more information about concatenating Vector derivatives.

Splitting and Relisting

Like with an IRanges object, split() and relist() work on an IPos derivative.
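
Subsetting, concatenation and splitting can be sketched on a small object as follows. The grouping factor grp is made up for illustration.

```r
library(IRanges)

x <- IPos(c(10L, 11L, 12L, 20L))

# Subsetting works as for an IRanges object.
x[c(1, 4)]             # by index
x[pos(x) >= 12L]       # by logical vector

# Concatenation with c().
y <- c(x, IPos(30:31))
length(y)

# Splitting by a grouping factor with one value per position.
# 'grp' is a hypothetical grouping.
grp <- factor(c("a", "a", "b", "b"))
parts <- split(x, grp)
length(parts)
```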

Author(s)

Hervé Pagès; based on ideas borrowed from Georg Stricker <georg.stricker@in.tum.de> and Julien Gagneur <gagneur@in.tum.de>

See Also

  • The GPos class in the GenomicRanges package for representing a set of genomic positions (i.e. genomic ranges of width 1, a.k.a. genomic loci).

  • The IRanges class for storing a set of integer ranges of arbitrary width.

  • IPosRanges-comparison for comparing and ordering integer ranges and/or positions.

  • findOverlaps-methods for finding overlapping integer ranges and/or positions.

  • intra-range-methods and inter-range-methods for intra range and inter range transformations.

  • coverage-methods for computing the coverage of a set of ranges and/or positions.

  • nearest-methods for finding the nearest integer range/position neighbor.

Examples

showClass("IPos")  # shows the known subclasses

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

## Example 1:
ipos1a <- IPos(c(44:53, 5:10, 2:5))
ipos1a  # unstitched

length(ipos1a)
pos(ipos1a)  # same as 'start(ipos1a)' and 'end(ipos1a)'
as.character(ipos1a)
as.data.frame(ipos1a)
as(ipos1a, "IRanges")
as.data.frame(as(ipos1a, "IRanges"))
ipos1a[9:17]

ipos1b <- IPos(c(44:53, 5:10, 2:5), stitch=TRUE)
ipos1b  # stitched

## 'ipos1a' and 'ipos1b' are semantically equivalent, only their
## internal representations differ:
all(ipos1a == ipos1b)

ipos1c <- IPos(c("44-53", "5-10", "2-5"))
ipos1c  # stitched

identical(ipos1b, ipos1c)

## Example 2:
my_pos <- IRanges(c(1, 6, 12, 17), c(5, 10, 16, 20))
ipos2 <- IPos(my_pos)
ipos2  # stitched

## Example 3:
ipos3A <- ipos3B <- IPos(c("1-15000", "15400-88700"))
npos <- length(ipos3A)

mcols(ipos3A)$sample <- Rle("sA")
sA_counts <- sample(10, npos, replace=TRUE)
mcols(ipos3A)$counts <- sA_counts

mcols(ipos3B)$sample <- Rle("sB")
sB_counts <- sample(10, npos, replace=TRUE)
mcols(ipos3B)$counts <- sB_counts

ipos3 <- c(ipos3A, ipos3B)
ipos3

## ---------------------------------------------------------------------
## MEMORY USAGE
## ---------------------------------------------------------------------

## Coercion to IRanges works on a StitchedIPos object...
ipos4 <- IPos(c("1-125000", "135000-575000"))
ir4 <- as(ipos4, "IRanges")
ir4
## ... but is generally not a good idea:
object.size(ipos4)
object.size(ir4)  # 1652 times bigger than the StitchedIPos object!

## Shuffling the order of the positions impacts memory usage:
ipos4r <- rev(ipos4)
object.size(ipos4r)
ipos4s <- sample(ipos4)
object.size(ipos4s)

## If one anticipates a lot of shuffling of the positions,
## then an UnstitchedIPos object should be used instead:
ipos4b <- as(ipos4, "UnstitchedIPos")
object.size(ipos4b)  # initial size is bigger than stitched version
object.size(rev(ipos4b))  # size didn't change
object.size(sample(ipos4b))  # size didn't change

## AN IMPORTANT NOTE: Even in the worst case, an IPos object uses no
## more memory than the equivalent IRanges object.
object.size(as(ipos4r, "IRanges"))  # same size as 'ipos4r'
object.size(as(ipos4s, "IRanges"))  # same size as 'ipos4s'

## Best case scenario is when the object is strictly sorted (i.e.
## positions are in strict ascending order).
## This can be checked with:
is.unsorted(ipos4, strict=TRUE)  # 'ipos4' is strictly sorted

## ---------------------------------------------------------------------
## USING MEMORY-EFFICIENT METADATA COLUMNS
## ---------------------------------------------------------------------
## In order to keep memory usage as low as possible, it is recommended
## to use a memory-efficient representation of the metadata columns that
## we want to set on the object. Rle's are particularly well suited for
## this, especially if the metadata columns contain long runs of
## identical values. This is the case for example if we want to use an
## IPos object to represent the coverage of sequencing reads along a
## chromosome.

## Example 5:
library(pasillaBamSubset)
library(Rsamtools)  # for the BamFile() constructor function
bamfile1 <- BamFile(untreated1_chr4())
bamfile2 <- BamFile(untreated3_chr4())
ipos5 <- IPos(IRanges(1, seqlengths(bamfile1)[["chr4"]]))
library(GenomicAlignments)  # for "coverage" method for BamFile objects
cvg1 <- coverage(bamfile1)$chr4
cvg2 <- coverage(bamfile2)$chr4
mcols(ipos5) <- DataFrame(cvg1, cvg2)
ipos5

object.size(ipos5)  # lightweight

## Keep only the positions where coverage is at least 10 in one of the
## 2 samples:
ipos5[mcols(ipos5)$cvg1 >= 10 | mcols(ipos5)$cvg2 >= 10]

Bioconductor/IRanges documentation built on Nov. 17, 2024, 6:54 p.m.