TwoBitFile-class: 2bit Files


Description

These functions support the import and export of the UCSC 2bit compressed sequence format. The main advantage of 2bit is fast subsequence retrieval, since only the sequence in the requested intervals is loaded. Compared to the FA format supported by Rsamtools, 2bit additionally supports masking and is better supported in Java (and thus in most genome browsers). The supporting TwoBitFile class is a reference to a TwoBit file.

Usage

## S4 method for signature 'TwoBitFile,ANY,ANY'
import(con, format, text,
           which = as(seqinfo(con), "GenomicRanges"), ...)
## S4 method for signature 'TwoBitFile'
getSeq(x, which = as(seqinfo(x), "GenomicRanges"))
import.2bit(con, ...)

## S4 method for signature 'ANY,TwoBitFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'DNAStringSet,TwoBitFile,ANY'
export(object, con, format)
## S4 method for signature 'DNAStringSet,character,ANY'
export(object, con, format, ...)
export.2bit(object, con, ...)

Arguments

con

A path, URL or TwoBitFile object. Connections are not supported. For the functions ending in .2bit, the file format is indicated by the function name. For the export and import methods, the format must be indicated another way. If con is a path or URL, either the file extension or the format argument needs to be “twoBit” or “2bit”.

object,x

The object to export, either a DNAStringSet or something coercible to a DNAStringSet, like a character vector.

format

If not missing, should be “twoBit” or “2bit” (case insensitive).

text

Not supported.

which

A range data structure coercible to IntegerRangesList, like a GRanges, or a TwoBitFile. Only the intervals in the file overlapping the given ranges are returned. By default, the value is the TwoBitFile itself: its Seqinfo object is extracted and coerced to an IntegerRangesList that represents the entirety of the file.

...

Arguments to pass down to other methods. For import, the flow eventually reaches the TwoBitFile method on import. For export, the TwoBitFile methods on export are the sink.
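
The ways of indicating the format described under con and format can be compared side by side. The following is a minimal sketch, assuming the test.2bit file that ships with rtracklayer (the same file used in the Examples section) is installed; the variable names are illustrative only.

```r
library(rtracklayer)

# Test file shipped with rtracklayer (same file as in the Examples section).
path <- file.path(system.file("tests", package = "rtracklayer"), "test.2bit")

# 1. Format implied by the function name.
by_name <- import.2bit(path)

# 2. Format inferred from the ".2bit" file extension.
by_ext <- import(path)

# 3. Format given explicitly through the format argument.
by_arg <- import(path, format = "2bit")

# 4. Format implied by wrapping the path in a TwoBitFile.
by_class <- import(TwoBitFile(path))

# Each call is expected to return the same DNAStringSet.
results <- list(by_name, by_ext, by_arg, by_class)
total_widths <- vapply(results, function(s) sum(width(s)), numeric(1))
total_widths
```

Wrapping the path in a TwoBitFile is the most explicit option when the file does not carry a recognizable extension.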

Value

For import, a DNAStringSet.

TwoBitFile objects

A TwoBitFile object, an extension of RTLFile, is a reference to a TwoBit file. To cast a path, URL or connection to a TwoBitFile, pass it to the TwoBitFile constructor.

A TwoBit file embeds the sequence information, which can be retrieved with the following:

seqinfo(x): Gets the Seqinfo object indicating the lengths of the sequences for the intervals in the file. No circularity or genome information is available.
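
The embedded sequence information can be used to build a which argument without importing the whole file first. The sketch below assumes the test.2bit file shipped with rtracklayer; the interval 10 to 30 is an arbitrary choice for illustration.

```r
library(rtracklayer)

path <- file.path(system.file("tests", package = "rtracklayer"), "test.2bit")
tb <- TwoBitFile(path)

# Sequence names and lengths, taken from the file's embedded sequence information.
si <- seqinfo(tb)
seqlengths(si)

# Query positions 10 to 30 (1-based, inclusive) on the first sequence in the file.
query <- GRanges(seqnames(si)[1], IRanges(start = 10, end = 30))

# Only the requested interval is returned.
sub <- import(tb, which = query)
width(sub)
```

Taking the sequence name from seqinfo(tb) rather than from an imported DNAStringSet keeps the query independent of a full import.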

Note

The 2bit format only supports A, C, G, T and N (via an internal mask). To export sequences with additional IUPAC ambiguity codes, first pass the object through replaceAmbiguities from the Biostrings package.
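
As a minimal sketch of that preprocessing step, the following replaces ambiguity codes in a small made-up sequence before it would be handed to export. The sequence and its name (chrDemo) are hypothetical and serve only as illustration.

```r
library(Biostrings)

# A made-up sequence containing the ambiguity codes R and Y in addition to N.
seqs <- DNAStringSet(c(chrDemo = "ACGTRYNACGT"))

# Replace every IUPAC ambiguity code with N, which 2bit can represent.
clean <- replaceAmbiguities(seqs)
as.character(clean)
```

The cleaned object contains only A, C, G, T and N and could then be passed to export or export.2bit as described above.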

Author(s)

Michael Lawrence

See Also

export-methods in the BSgenome package for exporting a BSgenome object as a twoBit file.

Examples

  test_path <- system.file("tests", package = "rtracklayer")
  test_2bit <- file.path(test_path, "test.2bit")

  test <- import(test_2bit)
  test

  test_2bit_file <- TwoBitFile(test_2bit)
  import(test_2bit_file) # the whole file
  
  which_range <- IRanges(c(10, 40), c(30, 42))
  which <- GRanges(names(test), which_range)
  import(test_2bit, which = which)

  seqinfo(test_2bit_file)

## Not run: 
  test_2bit_out <- file.path(tempdir(), "test_out.2bit")
  export(test, test_2bit_out)

  ## just a character vector
  test_char <- as.character(test)
  export(test_char, test_2bit_out)

## End(Not run)

Example output

  A DNAStringSet instance of length 1
    width seq                                               names               
[1]   100 TGATGGAAGAATTATTTGAAAGC...ATAGTCCAGAGACTACAACTTCA gi|157704452|ref|...
  A DNAStringSet instance of length 1
    width seq                                               names               
[1]   100 TGATGGAAGAATTATTTGAAAGC...ATAGTCCAGAGACTACAACTTCA gi|157704452|ref|...
  A DNAStringSet instance of length 2
    width seq
[1]    21 AATTATTTGAAAGCCATATAG
[2]     3 ACT
Seqinfo object with 1 sequence from an unspecified genome:
  seqnames                      seqlengths isCircular genome
  gi|157704452|ref|AC_000143.1|        100         NA   <NA>

rtracklayer documentation built on Nov. 8, 2020, 6:50 p.m.