BigWigFile: BigWig Import and Export

Description Usage Arguments Value BigWigFile objects BigWigFileList objects Author(s) See Also Examples

View source: R/bigWig.R

Description

These functions support the import and export of the UCSC BigWig format, a compressed, binary form of WIG/BEDGraph with a spatial index and precomputed summaries. These functions do not work on Windows.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
## S4 method for signature 'BigWigFile,ANY,ANY'
import(con, format, text,
                   selection = BigWigSelection(which, ...),
                   which = con,
                   as = c("GRanges", "RleList", "NumericList"), ...)
import.bw(con, ...)

## S4 method for signature 'ANY,BigWigFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,BigWigFile,ANY'
export(object, con, format,
                   dataFormat = c("auto", "variableStep", "fixedStep",
                     "bedGraph"), compress = TRUE, fixedSummaries = FALSE)
export.bw(object, con, ...)

Arguments

con

A path, URL or BigWigFile object. Connections are not supported. For the functions ending in .bw, the file format is indicated by the function name. For the export and import methods, the format must be indicated another way. If con is a path, or URL, either the file extension or the format argument needs to be “bigWig” or “bw”.

object

The object to export, should be an RleList, IntegerList, NumericList, GRanges or something coercible to a GRanges.

format

If not missing, should be “bigWig” or “bw” (case insensitive).

text

Not supported.

as

Specifies the class of the return object. Default is GRanges, which has one range per range in the file, and a score column holding the value for each range. For NumericList, one numeric vector is returned for each range in the selection argument. For RleList, there is one Rle per sequence, and that Rle spans the entire sequence.

selection

A BigWigSelection object indicating the ranges to load.

which

A range data structure coercible to IntegerRangesList, like a GRanges, or a BigWigFile. Only the intervals in the file overlapping the given ranges are returned. By default, the value is the BigWigFile itself. Its Seqinfo object is extracted and coerced to a IntegerRangesList that represents the entirety of the file.

dataFormat

Probably best left to “auto”. Exists only for historical reasons.

compress

If TRUE, compress the data. No reason to change this.

fixedSummaries

If TRUE, compute summaries at fixed resolutions corresponding to the default zoom levels in the Ensembl genome browser (with some extrapolation): 30X, 65X, 130X, 260X, 450X, 648X, 950X, 1296X, 4800X, 19200X. Otherwise, the resolutions are dynamically determined by an algorithm that computes an initial summary size by initializing to 10X the size of the smallest feature and doubling the size as needed until the size of the summary is less than half that of the data (or there are no further gains). It then computes up to 10 more levels of summary, quadrupling the size each time, until the summaries start to exceed the sequence size.

...

Arguments to pass down to methods to other methods. For import, the flow eventually reaches the BigWigFile method on import.

Value

A GRanges (default), RleList or NumericList. GRanges return ranges with non-zero score values in a score metadata column. The length of the NumericList is the same length as the selection argument (one list element per range). The return order in the NumericList matches the order of the BigWigSelection object.

BigWigFile objects

A BigWigFile object, an extension of RTLFile is a reference to a BigWig file. To cast a path, URL or connection to a BigWigFile, pass it to the BigWigFile constructor.

BigWig files are more complex than most track files, and there are a number of methods on BigWigFile for accessing the additional information:

seqinfo(x): Gets the Seqinfo object indicating the lengths of the sequences for the intervals in the file. No circularity or genome information is available.

summary(ranges = as(seqinfo(object), "GenomicRanges"), size = 1L, type = c("mean", "min", "max", "coverage", "sd"), defaultValue = NA_real_), as = c("GRangesList", "RleList", "matrix"), ...: Aggregates the intervals in the file that fall into ranges, which should be something coercible to GRanges. The aggregation essentially compresses each sequence to a length of size. The algorithm is specified by type; available algorithms include the mean, min, max, coverage (percent sequence covered by at least one feature), and standard deviation. When a window contains no features, defaultValue is assumed. The result type depends on as, and can be a GRangesList, RleList or matrix, where the number elements (or rows) is equal to the length of ranges. For as="matrix", there must be one unique value of size, which is equal to the number of columns in the result. The as="matrix" case is the only one that supports a size greater than the width of the corresponding element in ranges, where values are interpolated to yield the matrix result. The driving use case for this is visualization of coverage when the screen space is small compared to the viewed portion of the sequence. The operation is very fast, as it leverages cached multi-level summaries present in every BigWig file.

If a summary statistic is not available / cannot be computed for a given range a warning is thrown and the defaultValue NA_real_ is returned.

When accessing remote data, the UCSC library caches data in the ‘/tmp/udcCache’ directory. To clean the cache, call cleanBigWigCache(maxDays), where any files older than maxDays days old will be deleted.

BigWigFileList objects

A BigWigFileList() provides a convenient way of managing a list of BigWigFile instances.

Author(s)

Michael Lawrence

See Also

wigToBigWig for converting a WIG file to BigWig.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
if (.Platform$OS.type != "windows") {
  test_path <- system.file("tests", package = "rtracklayer")
  test_bw <- file.path(test_path, "test.bw")

  ## GRanges
  ## Returns ranges with non-zero scores.
  gr <- import(test_bw)
  gr 

  which <- GRanges(c("chr2", "chr2"), IRanges(c(1, 300), c(400, 1000)))
  import(test_bw, which = which)

  ## RleList
  ## Scores returned as an RleList is equivalent to the coverage.
  ## Best option when 'which' or 'selection' contain many small ranges.
  mini <- narrow(unlist(tile(which, 50)), 2)
  rle <- import(test_bw, which = mini, as = "RleList")
  rle 

  ## NumericList
  ## The 'which' is stored as metadata:
  track <- import(test_bw, which = which, as = "NumericList")
  metadata(track)

## Not run: 
  test_bw_out <- file.path(tempdir(), "test_out.bw")
  export(test, test_bw_out)

## End(Not run)

  bwf <- BigWigFile(test_bw)
  track <- import(bwf)

  seqinfo(bwf)  

  summary(bwf) # for each sequence, average all values into one
  summary(bwf, range(head(track))) # just average the first few features
  summary(bwf, size = seqlengths(bwf) / 10) # 10X reduction
  summary(bwf, type = "min") # min instead of mean
  summary(bwf, track, size = 10, as = "matrix") # each feature 10 windows
}

rtracklayer documentation built on Sept. 8, 2019, 2 a.m.