extractListFragments: Extract list fragments from a list-like object

View source: R/extractListFragments.R

Description

Utilities for extracting list fragments from a list-like object.

Usage

extractListFragments(x, aranges, use.mcols=FALSE,
                     msg.if.incompatible=INCOMPATIBLE_ARANGES_MSG)

equisplit(x, nchunk, chunksize, use.mcols=FALSE)

Arguments

x

The list-like object from which to extract the list fragments.

Can be any List derivative for extractListFragments. Can also be an ordinary list if extractListFragments is called with use.mcols=TRUE.

Can be any List derivative that supports relist() for equisplit.

aranges

An IntegerRanges derivative containing the absolute ranges (i.e. the ranges along unlist(x)) of the list fragments to extract.

The ranges in aranges must be compatible with the cumulated length of all the list elements in x, that is, start(aranges) and end(aranges) must be >= 1 and <= sum(elementNROWS(x)), respectively.

Also please note that only IntegerRanges objects that are disjoint and sorted are supported at the moment.
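The bounds constraint above can be checked before calling extractListFragments. The following is a minimal sketch: within_bounds is a hypothetical helper written for this illustration (it is not part of IRanges), and the tryCatch wrapper only assumes that an out-of-bounds range raises an error, as the msg.if.incompatible argument suggests.

```r
library(IRanges)

x <- IntegerList(a=101:109, b=5:-5)
aranges <- IRanges(start=c(2, 4, 8), end=c(3, 6, 14))

## Hypothetical helper (not part of IRanges): TRUE if every range in
## 'aranges' lies within 1..sum(elementNROWS(x)).
within_bounds <- function(x, aranges) {
  total <- sum(elementNROWS(x))   # cumulated length of 'x' (9 + 11 here)
  all(start(aranges) >= 1L) && all(end(aranges) <= total)
}

ok  <- within_bounds(x, aranges)            # all ranges end at or before 20
bad <- within_bounds(x, IRanges(15, 25))    # 25 exceeds the cumulated length

## The ranges must also be disjoint and sorted
disjoint <- isDisjoint(aranges)
sorted   <- !is.unsorted(start(aranges))

## An out-of-bounds range is expected to make the call fail
res <- tryCatch(extractListFragments(x, IRanges(15, 25)),
                error=function(e) "incompatible")
```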

use.mcols

Whether to propagate the metadata columns on x (if any) or not.

Must be TRUE or FALSE (the default). If FALSE, the metadata columns on x are not propagated. Instead, the object returned by extractListFragments has metadata columns revmap and revmap2, and the object returned by equisplit has metadata column revmap.
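As a short illustration of the two settings, the sketch below compares the metadata columns returned in each case. The reading of revmap as an index into x (and of revmap2 as an index into aranges) is an assumption drawn from the example output on this page, not a documented guarantee.

```r
library(IRanges)

x2 <- IRanges(c(1, 101, 1001, 10001), width=c(10, 5, 0, 12),
              names=letters[1:4])
mcols(x2)$label <- LETTERS[1:4]
aranges <- IRanges(start=13, end=20)

## Default (use.mcols=FALSE): 'label' is dropped, revmap/revmap2 are added
frags <- extractListFragments(x2, aranges)
cols_default <- colnames(mcols(frags))

## Assumption: 'revmap' holds the index in 'x2' of each fragment's parent,
## so it can be used to look the parents up again
parents <- x2[mcols(frags)$revmap]
same_names <- identical(names(parents), names(frags))

## use.mcols=TRUE: the metadata columns of 'x2' are propagated instead
frags2 <- extractListFragments(x2, aranges, use.mcols=TRUE)
cols_propagated <- colnames(mcols(frags2))
```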

msg.if.incompatible

The error message to use if aranges is not compatible with the cumulated length of all the list elements in x.

nchunk

The number of chunks. Must be a single positive integer.

chunksize

The size of the chunks (last chunk might be smaller). Must be a single positive integer.
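To make the difference between nchunk and chunksize concrete, here is a small sketch that looks at the total width of each chunk. It reuses the x2 object from the Examples section, whose cumulated length is 27.

```r
library(IRanges)

x2 <- IRanges(c(1, 101, 1001, 10001), width=c(10, 5, 0, 12),
              names=letters[1:4])

## Fixed number of chunks: 27 positions split into 2 chunks
by_n <- equisplit(x2, nchunk=2)
n_widths <- sum(width(by_n))        # total width of each chunk

## Fixed chunk size: chunks of 5 positions, the last one holds the remainder
by_size <- equisplit(x2, chunksize=5)
size_widths <- sum(width(by_size))
```

With chunksize=5, the 27 positions give six chunks, the last of which has a total width of 2.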

Details

A list fragment of list-like object x is a window in one of its list elements.

extractListFragments is a low-level utility that extracts list fragments from list-like object x according to the absolute ranges in aranges.
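The idea of mapping absolute ranges to windows in individual list elements can be sketched in base R. The helper below, list_fragments, is hypothetical and written only for illustration: it is not how IRanges implements extractListFragments, it assumes a named ordinary list, and it ignores empty ranges and empty list elements.

```r
## Hypothetical base-R helper, for illustration only. Not the IRanges
## implementation; empty ranges and empty list elements are skipped.
list_fragments <- function(x, abs_start, abs_end) {
  n <- lengths(x, use.names=FALSE)
  el_end <- cumsum(n)               # absolute end of each list element
  el_start <- el_end - n + 1L       # absolute start of each list element
  out <- list()
  out_names <- character(0)
  for (i in seq_along(abs_start)) {
    for (j in seq_along(x)) {
      lo <- max(abs_start[i], el_start[j])
      hi <- min(abs_end[i], el_end[j])
      if (lo > hi) next             # range i does not touch element j
      ## Translate absolute positions into a window within element j
      idx <- seq(lo, hi) - el_start[j] + 1L
      out[[length(out) + 1L]] <- x[[j]][idx]
      out_names <- c(out_names, names(x)[j])
    }
  }
  names(out) <- out_names
  out
}

x <- list(a=101:109, b=5:-5)
frags <- list_fragments(x, abs_start=c(2, 4, 8, 17), abs_end=c(3, 6, 14, 19))
str(frags)
```

The range 8-14 straddles the boundary between a (absolute positions 1-9) and b (positions 10-20), so it yields two fragments: one window at the end of a and one at the start of b.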

equisplit fragments and splits list-like object x into a specified number of partitions with equal (total) width. This is useful for instance to ensure balanced loading of workers in parallel evaluation. For example, if x is a GRanges object, each partition is also a GRanges object and the set of all partitions is returned as a GRangesList object.
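The Examples section describes equisplit as a combination of breakInChunks and extractListFragments followed by a regrouping step. A rough manual sketch of that pipeline is given below. Regrouping with split() on revmap2 is an assumption made for this illustration and may differ from what equisplit does internally, so only the per-chunk total widths are compared.

```r
library(IRanges)

x2 <- IRanges(c(1, 101, 1001, 10001), width=c(10, 5, 0, 12),
              names=letters[1:4])

## Step 1: absolute ranges of 2 chunks along the cumulated length of 'x2'
## (each range in 'x2' counts as a list element of length width)
total <- sum(width(x2))
chunks <- breakInChunks(total, nchunk=2)

## Step 2: cut 'x2' at the chunk boundaries
frags <- extractListFragments(x2, chunks)

## Step 3 (assumption): regroup the fragments by the chunk they came from,
## using 'revmap2' as the index into 'chunks'
manual <- split(frags, mcols(frags)$revmap2)

manual_widths <- sum(width(manual))
equisplit_widths <- sum(width(equisplit(x2, nchunk=2)))
```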

Value

An object of the same class as x for extractListFragments.

An object of class relistToClass(x) for equisplit.
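The two return types can be checked directly. This is a minimal sketch using a small IRanges object made up for the illustration; it only verifies the class relationships stated above.

```r
library(IRanges)

x2 <- IRanges(c(1, 101, 1001), width=c(10, 5, 12))

## extractListFragments() returns an object of the same class as 'x2'
frags <- extractListFragments(x2, IRanges(start=13, end=20))
same_class <- is(frags, class(x2))

## equisplit() returns an object of class relistToClass(x2)
chunks <- equisplit(x2, nchunk=2)
relist_class <- is(chunks, relistToClass(x2))
```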

Author(s)

Hervé Pagès

See Also

Examples

## ---------------------------------------------------------------------
## A. extractListFragments()
## ---------------------------------------------------------------------

x <- IntegerList(a=101:109, b=5:-5)
x

aranges <- IRanges(start=c(2, 4, 8, 17, 17), end=c(3, 6, 14, 16, 19))
aranges
extractListFragments(x, aranges)

x2 <- IRanges(c(1, 101, 1001, 10001), width=c(10, 5, 0, 12),
              names=letters[1:4])
mcols(x2)$label <- LETTERS[1:4]
x2

aranges <- IRanges(start=13, end=20)
extractListFragments(x2, aranges)
extractListFragments(x2, aranges, use.mcols=TRUE)

aranges2 <- PartitioningByWidth(c(3, 9, 13, 0, 2))
extractListFragments(x2, aranges2)
extractListFragments(x2, aranges2, use.mcols=TRUE)

x2b <- as(x2, "IntegerList")
extractListFragments(x2b, aranges2)

x2c <- as.list(x2b)
extractListFragments(x2c, aranges2, use.mcols=TRUE)

## ---------------------------------------------------------------------
## B. equisplit()
## ---------------------------------------------------------------------

## equisplit() first calls breakInChunks() internally to create a
## PartitioningByWidth object that contains the absolute ranges of the
## chunks, then calls extractListFragments() on 'x' to extract the
## fragments of 'x' that correspond to these absolute ranges. Finally
## the IRanges object returned by extractListFragments() is split into
## an IRangesList object where each list element corresponds to a chunk.
equisplit(x2, nchunk=2)
equisplit(x2, nchunk=2, use.mcols=TRUE)

equisplit(x2, chunksize=5)

library(GenomicRanges)
gr <- GRanges(c("chr1", "chr2"), IRanges(1, c(100, 1e5)))
equisplit(gr, nchunk=2)
equisplit(gr, nchunk=1000)

## ---------------------------------------------------------------------
## C. ADVANCED extractListFragments() EXAMPLES
## ---------------------------------------------------------------------

## === C1. Fragment list-like object into length 1 fragments ===

## First we construct a Partitioning object where all the partitions
## have a width of 1:
x2_cumlen <- nobj(PartitioningByWidth(x2))  # Equivalent to
                                            # length(unlist(x2)) except
                                            # that it doesn't unlist 'x2'
                                            # so is much more efficient.
aranges1 <- PartitioningByEnd(seq_len(x2_cumlen))
aranges1

## Then we use it to fragment 'x2':
extractListFragments(x2, aranges1)
extractListFragments(x2b, aranges1)
extractListFragments(x2c, aranges1, use.mcols=TRUE)

## === C2. Fragment a Partitioning object ===

partitioning2 <- PartitioningByEnd(x2b)  # same as PartitioningByEnd(x2)
extractListFragments(partitioning2, aranges2)

## Note that when the 1st argument is a Partitioning derivative,
## swapping the 1st and 2nd arguments in the call to extractListFragments()
## doesn't change the returned partitioning:
extractListFragments(aranges2, partitioning2)

## ---------------------------------------------------------------------
## D. SANITY CHECKS
## ---------------------------------------------------------------------

## If 'aranges' is 'PartitioningByEnd(x)' or 'PartitioningByWidth(x)'
## and 'x' has no zero-length list elements, then
## 'extractListFragments(x, aranges, use.mcols=TRUE)' is a no-op.
check_no_ops <- function(x) {
  aranges <- PartitioningByEnd(x)
  stopifnot(identical(
    extractListFragments(x, aranges, use.mcols=TRUE), x
  ))
  aranges <- PartitioningByWidth(x)
  stopifnot(identical(
    extractListFragments(x, aranges, use.mcols=TRUE), x
  ))
}

check_no_ops(x2[lengths(x2) != 0])
check_no_ops(x2b[lengths(x2b) != 0])
check_no_ops(x2c[lengths(x2c) != 0])
check_no_ops(gr)

Example output

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

IntegerList of length 2
[["a"]] 101 102 103 104 105 106 107 108 109
[["b"]] 5 4 3 2 1 0 -1 -2 -3 -4 -5
IRanges object with 5 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         2         3         2
  [2]         4         6         3
  [3]         8        14         7
  [4]        17        16         0
  [5]        17        19         3
IntegerList of length 6
[["a"]] 102 103
[["a"]] 104 105 106
[["a"]] 108 109
[["b"]] 5 4 3 2 1
[["b"]] integer(0)
[["b"]] -2 -3 -4
IRanges object with 4 ranges and 1 metadata column:
        start       end     width |       label
    <integer> <integer> <integer> | <character>
  a         1        10        10 |           A
  b       101       105         5 |           B
  c      1001      1000         0 |           C
  d     10001     10012        12 |           D
IRanges object with 3 ranges and 2 metadata columns:
        start       end     width |    revmap   revmap2
    <integer> <integer> <integer> | <integer> <integer>
  b       103       105         3 |         2         1
  c      1001      1000         0 |         3         1
  d     10001     10005         5 |         4         1
IRanges object with 3 ranges and 1 metadata column:
        start       end     width |       label
    <integer> <integer> <integer> | <character>
  b       103       105         3 |           B
  c      1001      1000         0 |           C
  d     10001     10005         5 |           D
IRanges object with 8 ranges and 2 metadata columns:
        start       end     width |    revmap   revmap2
    <integer> <integer> <integer> | <integer> <integer>
  a         1         3         3 |         1         1
  a         4        10         7 |         1         2
  b       101       102         2 |         2         2
  b       103       105         3 |         2         3
  c      1001      1000         0 |         3         3
  d     10001     10010        10 |         4         3
  d     10011     10010         0 |         4         4
  d     10011     10012         2 |         4         5
IRanges object with 8 ranges and 1 metadata column:
        start       end     width |       label
    <integer> <integer> <integer> | <character>
  a         1         3         3 |           A
  a         4        10         7 |           A
  b       101       102         2 |           B
  b       103       105         3 |           B
  c      1001      1000         0 |           C
  d     10001     10010        10 |           D
  d     10011     10010         0 |           D
  d     10011     10012         2 |           D
IntegerList of length 8
[["a"]] 1 2 3
[["a"]] 4 5 6 7 8 9 10
[["b"]] 101 102
[["b"]] 103 104 105
[["c"]] integer(0)
[["d"]] 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010
[["d"]] integer(0)
[["d"]] 10011 10012
$a
[1] 1 2 3

$a
[1]  4  5  6  7  8  9 10

$b
[1] 101 102

$b
[1] 103 104 105

$c
integer(0)

$d
 [1] 10001 10002 10003 10004 10005 10006 10007 10008 10009 10010

$d
integer(0)

$d
[1] 10011 10012

IRangesList of length 2
[[1]]
IRanges object with 2 ranges and 1 metadata column:
        start       end     width |    revmap
    <integer> <integer> <integer> | <integer>
  a         1        10        10 |         1
  b       101       103         3 |         2

[[2]]
IRanges object with 3 ranges and 1 metadata column:
        start       end     width |    revmap
    <integer> <integer> <integer> | <integer>
  b       104       105         2 |         2
  c      1001      1000         0 |         3
  d     10001     10012        12 |         4

IRangesList of length 2
[[1]]
IRanges object with 2 ranges and 1 metadata column:
        start       end     width |       label
    <integer> <integer> <integer> | <character>
  a         1        10        10 |           A
  b       101       103         3 |           B

[[2]]
IRanges object with 3 ranges and 1 metadata column:
        start       end     width |       label
    <integer> <integer> <integer> | <character>
  b       104       105         2 |           B
  c      1001      1000         0 |           C
  d     10001     10012        12 |           D

IRangesList of length 6
[[1]]
IRanges object with 1 range and 1 metadata column:
        start       end     width |    revmap
    <integer> <integer> <integer> | <integer>
  a         1         5         5 |         1

[[2]]
IRanges object with 1 range and 1 metadata column:
        start       end     width |    revmap
    <integer> <integer> <integer> | <integer>
  a         6        10         5 |         1

[[3]]
IRanges object with 1 range and 1 metadata column:
        start       end     width |    revmap
    <integer> <integer> <integer> | <integer>
  b       101       105         5 |         2

...
<3 more elements>
Loading required package: GenomeInfoDb
GRangesList object of length 2:
[[1]] 
GRanges object with 2 ranges and 1 metadata column:
      seqnames    ranges strand |    revmap
         <Rle> <IRanges>  <Rle> | <integer>
  [1]     chr1     1-100      * |         1
  [2]     chr2   1-49950      * |         2

[[2]] 
GRanges object with 1 range and 1 metadata column:
      seqnames       ranges strand | revmap
  [1]     chr2 49951-100000      * |      2

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
GRangesList object of length 1000:
[[1]] 
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |    revmap
         <Rle> <IRanges>  <Rle> | <integer>
  [1]     chr1     1-100      * |         1

[[2]] 
GRanges object with 1 range and 1 metadata column:
      seqnames ranges strand | revmap
  [1]     chr2  1-100      * |      2

[[3]] 
GRanges object with 1 range and 1 metadata column:
      seqnames  ranges strand | revmap
  [1]     chr2 101-200      * |      2

...
<997 more elements>
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
PartitioningByEnd object with 27 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         1         1
   [2]         2         2         1
   [3]         3         3         1
   [4]         4         4         1
   [5]         5         5         1
   ...       ...       ...       ...
  [23]        23        23         1
  [24]        24        24         1
  [25]        25        25         1
  [26]        26        26         1
  [27]        27        27         1
IRanges object with 27 ranges and 2 metadata columns:
        start       end     width |    revmap   revmap2
    <integer> <integer> <integer> | <integer> <integer>
  a         1         1         1 |         1         1
  a         2         2         1 |         1         2
  a         3         3         1 |         1         3
  a         4         4         1 |         1         4
  a         5         5         1 |         1         5
  .       ...       ...       ... .       ...       ...
  d     10008     10008         1 |         4        23
  d     10009     10009         1 |         4        24
  d     10010     10010         1 |         4        25
  d     10011     10011         1 |         4        26
  d     10012     10012         1 |         4        27
IntegerList of length 27
[["a"]] 1
[["a"]] 2
[["a"]] 3
[["a"]] 4
[["a"]] 5
[["a"]] 6
[["a"]] 7
[["a"]] 8
[["a"]] 9
[["a"]] 10
...
<17 more elements>
$a
[1] 1

$a
[1] 2

$a
[1] 3

$a
[1] 4

$a
[1] 5

$a
[1] 6

$a
[1] 7

$a
[1] 8

$a
[1] 9

$a
[1] 10

$b
[1] 101

$b
[1] 102

$b
[1] 103

$b
[1] 104

$b
[1] 105

$d
[1] 10001

$d
[1] 10002

$d
[1] 10003

$d
[1] 10004

$d
[1] 10005

$d
[1] 10006

$d
[1] 10007

$d
[1] 10008

$d
[1] 10009

$d
[1] 10010

$d
[1] 10011

$d
[1] 10012

PartitioningByEnd object with 8 ranges and 2 metadata columns:
        start       end     width |    revmap   revmap2
    <integer> <integer> <integer> | <integer> <integer>
  a         1         3         3 |         1         1
  a         4        10         7 |         1         2
  b        11        12         2 |         2         2
  b        13        15         3 |         2         3
  c        16        15         0 |         3         3
  d        16        25        10 |         4         3
  d        26        25         0 |         4         4
  d        26        27         2 |         4         5
PartitioningByWidth object with 8 ranges and 2 metadata columns:
          start       end     width |    revmap   revmap2
      <integer> <integer> <integer> | <integer> <integer>
  [1]         1         3         3 |         1         1
  [2]         4        10         7 |         2         1
  [3]        11        12         2 |         2         2
  [4]        13        15         3 |         3         2
  [5]        16        15         0 |         3         3
  [6]        16        25        10 |         3         4
  [7]        26        25         0 |         4         4
  [8]        26        27         2 |         5         4

IRanges documentation built on Dec. 14, 2020, 2 a.m.