Manipulate tabix-indexed tab-delimited files.

Description

Use TabixFile() to create a reference to a Tabix file (and its index). Once opened, the reference remains open across calls to methods, avoiding costly index re-loading.

TabixFileList() provides a convenient way of managing a list of TabixFile instances.
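
A minimal sketch of building a TabixFileList from a character vector of paths, assuming Rsamtools is installed. It uses the example GTF file distributed with the package, so the list has a single element. Supplying more paths would produce one TabixFile per path, each created with the given yieldSize.

```r
library(Rsamtools)

## tabix-indexed GTF file distributed with Rsamtools
fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)

## a character vector of paths gives one TabixFile per path
tfl <- TabixFileList(fl, yieldSize=50)

length(tfl)               # number of TabixFile instances in the list
class(tfl[[1]])           # each element is a TabixFile
yieldSize(tfl[[1]])       # yield size taken from the constructor call
```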

Usage

## Constructors

TabixFile(file, index = paste(file, "tbi", sep="."), ...,
          yieldSize=NA_integer_)
TabixFileList(..., yieldSize=NA_integer_)

## Opening / closing

## S3 method for class 'TabixFile'
open(con, ...)
## S3 method for class 'TabixFile'
close(con, ...)

## accessors; also path(), index(), yieldSize()

## S4 method for signature 'TabixFile'
isOpen(con, rw="")

## actions

## S4 method for signature 'TabixFile'
seqnamesTabix(file, ...)
## S4 method for signature 'TabixFile'
headerTabix(file, ...)
## S4 method for signature 'TabixFile,GRanges'
scanTabix(file, ..., param)
## S4 method for signature 'TabixFile,RangesList'
scanTabix(file, ..., param)
## S4 method for signature 'TabixFile,missing'
scanTabix(file, ..., param)
## S4 method for signature 'character,ANY'
scanTabix(file, ..., param)
## S4 method for signature 'character,missing'
scanTabix(file, ..., param)

countTabix(file, ...)

Arguments

con

An instance of TabixFile.

file

For TabixFile(), a character(1) vector giving the tabix file path; the file can be remote (http://, ftp://). For countTabix, a character(1) vector or a TabixFile instance. For the other functions, a TabixFile instance.

index

A character(1) vector giving the path of the tabix index file.

yieldSize

Number of records to yield each time scanTabix reads from the file. Only valid when param is unspecified. When a TabixFileList is created from existing TabixFile instances, yieldSize does not alter their yield sizes, including NA.

param

An instance of GRanges or RangesList, used to select which records to scan.

...

Additional arguments. For TabixFileList, either a single character vector of paths to tabix files, or several TabixFile instances.

rw

character() indicating mode of file; not used for TabixFile.
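
The interaction between yieldSize and existing TabixFile instances, as described under yieldSize, can be sketched as follows. This assumes Rsamtools is installed and relies only on the documented behaviour that the list-level yieldSize leaves the instances' own yield sizes, including NA, untouched.

```r
library(Rsamtools)

fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)

## two instances with different yield sizes
tf100 <- TabixFile(fl, yieldSize=100)
tfNA  <- TabixFile(fl)                   # yieldSize defaults to NA

## the list-level yieldSize=50 is not applied to existing instances
tfl <- TabixFileList(tf100, tfNA, yieldSize=50)
ys <- c(yieldSize(tfl[[1]]), yieldSize(tfl[[2]]))
ys                                       # per the text above: 100 and NA
```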

Objects from the Class

Objects are created by calls of the form TabixFile().

Fields

The TabixFile class inherits fields from the RsamtoolsFile class.

Functions and methods

TabixFileList inherits methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.TabixFile

Opens the (local or remote) path and index. Returns a TabixFile instance. yieldSize determines the number of records parsed during each call to scanTabix; NA indicates that all records are to be parsed.

close.TabixFile

Closes the TabixFile con, returning (invisibly) the updated TabixFile. The instance may be re-opened with open.TabixFile.
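
A minimal sketch of the open-once, query-many pattern, assuming Rsamtools is installed. The path and index are loaded by open(), both scanTabix calls reuse them, and close() returns the updated instance so it can be reassigned.

```r
library(Rsamtools)

fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)

tbx <- open(TabixFile(fl))      # load the path and index once
was_open <- isOpen(tbx)

## both queries reuse the index loaded by open()
chr1 <- scanTabix(tbx, param=GRanges("chr1", IRanges(1, 100000)))
chr2 <- scanTabix(tbx, param=GRanges("chr2", IRanges(1, 100000)))

tbx <- close(tbx)               # close() returns the TabixFile invisibly
isOpen(tbx)
```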

Accessors:

path

Returns a character(1) vector of the tabix path name.

index

Returns a character(1) vector of the tabix index name.

yieldSize, yieldSize<-

Return or set an integer(1) vector indicating yield size.
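
The accessors can be exercised as below, assuming Rsamtools is installed. The index shown is the default one, built by appending ".tbi" to the file path. Because TabixFile instances behave as references, the replacement form yieldSize<- updates tbx in place.

```r
library(Rsamtools)

fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)
tbx <- TabixFile(fl)

path(tbx)                 # path to example.gtf.gz
index(tbx)                # default index: the path with ".tbi" appended
yieldSize(tbx)            # NA: all records are parsed per scanTabix call

yieldSize(tbx) <- 100L    # parse at most 100 records per call from now on
yieldSize(tbx)
```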

Methods:

seqnamesTabix

Visit the path in path(file), returning the sequence names present in the file.

headerTabix

Visit the path in path(file), returning the sequence names, the column indices used to sort the file, the number of lines skipped while indexing, the comment character used while indexing, and the header lines (those at the start of the file preceded by the comment character).

countTabix

Return the number of records in each range of param, or the count of all records in the file (when param is missing).

scanTabix

For signature(file="TabixFile"), visit the path in path(file), returning the matching records as a list of character vectors, one element per range in param (or a single element when param is missing). For signature(file="character"), call the corresponding method after coercing file to TabixFile.

indexTabix

This method operates on file paths, rather than TabixFile objects, to index tab-separated files. See indexTabix.

show

Compactly display the object.
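
The query methods can be combined as in this sketch, assuming Rsamtools is installed. countTabix is called on a plain path to show the character(1) form. Its result is summed with unlist() so that the comparison with scanTabix does not depend on how per-range counts are packaged.

```r
library(Rsamtools)

fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)
tbx <- TabixFile(fl)

sn <- seqnamesTabix(tbx)        # sequence names present in the file
sn

hdr <- headerTabix(tbx)         # list of the components described above
str(hdr)

## total record count, from a path and from a full scan of the TabixFile
n_count <- sum(unlist(countTabix(fl)))
n_scan  <- length(scanTabix(tbx)[[1]])
n_count == n_scan
```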

Author(s)

Martin Morgan

Examples

library(Rsamtools)

fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
                  mustWork=TRUE)
tbx <- TabixFile(fl)

param <- GRanges(c("chr1", "chr2"), IRanges(c(1, 1), width=100000))
countTabix(tbx)
countTabix(tbx, param=param)
res <- scanTabix(tbx, param=param)
sapply(res, length)
res[["chr1:1-100000"]][1:2]

## parse to a list of data.frames, one per range
dff <- Map(function(elt) {
    read.csv(textConnection(elt), sep="\t", header=FALSE)
}, res)
dff[["chr1:1-100000"]][1:5,1:8]

## parse 100 records at a time
length(scanTabix(tbx)[[1]]) # total number of records
tbx <- open(TabixFile(fl, yieldSize=100))
while(length(res <- scanTabix(tbx)[[1]]))
   cat("records read:", length(res), "\n")
close(tbx)
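
The examples above use a file that is already compressed and indexed. A sketch of preparing your own file first, assuming Rsamtools is installed: a small hypothetical BED file is written to a temporary location, compressed with bgzip, and indexed with indexTabix using the "bed" preset, which treats starts as zero-based. The records and the query range are invented for illustration.

```r
library(Rsamtools)

## hypothetical BED records, sorted by sequence name and start
bed <- tempfile(fileext=".bed")
writeLines(c("chr1\t0\t100\tfirst",
             "chr1\t200\t300\tsecond",
             "chr2\t50\t150\tthird"), bed)

## block-gzip the file, then write the .tbi index next to it
bgz <- bgzip(bed, overwrite=TRUE)
indexTabix(bgz, format="bed")

tbx <- TabixFile(bgz)           # default index: paste(bgz, "tbi", sep=".")
seqnamesTabix(tbx)

## 1-based query chr1:1-150 overlaps only the first BED record
res <- scanTabix(tbx, param=GRanges("chr1", IRanges(1, 150)))
res[[1]]
sum(unlist(countTabix(tbx)))    # all three records
```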