Description Usage Arguments Value Error Author(s) References Examples
Scan compressed, sorted, tabix-indexed, tab-delimited files.
1 2 3 4 5 |
file |
The character() file name(s) of the tabix file be
processed, or more flexibly an instance of class
|
param |
A instance of |
... |
Additional arguments, currently ignored. |
scanTabix
returns a list, with one element per region. Each element
of the list is a character vector representing records in the region.
scanTabix
signals errors using signalCondition
. The
following errors are signaled:
scanTabix_param
yieldSize(file)
must be NA when
more than one range is specified.
scanTabix_io
A read error occured while inputing the
tabix file. This might be because the file is corrupt, or of
incorrect format (e.g., when path
points to a plain text
file but index
is present, implying that path
should
be a bgzip
ed file. The error message may include an error
code representing the logical OR of these cryptic signals: 1,
BGZF_ERR_ZLIB; 2, BGZF_ERR_HEADER; 4, BGZF_ERR_IO; 8,
BGZF_ERR_MISUSE.
Martin Morgan <mtmorgan@fhcrc.org>.
http://samtools.sourceforge.net/tabix.shtml
1 |
Loading required package: GenomeInfoDb
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colMeans, colSums, colnames, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
setdiff, sort, table, tapply, union, unique, unsplit, which,
which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following object is masked from 'package:base':
expand.grid
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Attaching package: 'Biostrings'
The following object is masked from 'package:base':
strsplit
TabxFl> fl <- system.file("extdata", "example.gtf.gz", package="Rsamtools",
TabxFl+ mustWork=TRUE)
TabxFl> tbx <- TabixFile(fl)
TabxFl> param <- GRanges(c("chr1", "chr2"), IRanges(c(1, 1), width=100000))
TabxFl> countTabix(tbx)
$<NA>
[1] 237
TabxFl> countTabix(tbx, param=param)
$`chr1:1-100000`
[1] 157
$`chr2:1-100000`
[1] 15
TabxFl> res <- scanTabix(tbx, param=param)
TabxFl> sapply(res, length)
chr1:1-100000 chr2:1-100000
157 15
TabxFl> res[["chr1:1-100000"]][1:2]
[1] "chr1\tENSEMBL\tUTR\t1737\t2090\t.\t+\t.\tgene_id \"ENSG00000223972\"; transcript_id \"ENST00000456328\"; gene_type \"protein_coding\"; gene_status \"KNOWN\"; gene_name \"RP11-34P13.1\"; transcript_type \"protein_coding\"; transcript_status \"KNOWN\"; transcript_name \"RP11-34P13.1-201\"; level 3; havana_gene \"OTTHUMG00000000961\";"
[2] "chr1\tENSEMBL\texon\t1737\t2090\t.\t+\t.\tgene_id \"ENSG00000223972\"; transcript_id \"ENST00000456328\"; gene_type \"protein_coding\"; gene_status \"KNOWN\"; gene_name \"RP11-34P13.1\"; transcript_type \"protein_coding\"; transcript_status \"KNOWN\"; transcript_name \"RP11-34P13.1-201\"; level 3; havana_gene \"OTTHUMG00000000961\";"
TabxFl> ## parse to list of data.frame's
TabxFl> dff <- Map(function(elt) {
TabxFl+ read.csv(textConnection(elt), sep="\t", header=FALSE)
TabxFl+ }, res)
TabxFl> dff[["chr1:1-100000"]][1:5,1:8]
V1 V2 V3 V4 V5 V6 V7 V8
1 chr1 ENSEMBL UTR 1737 2090 . + .
2 chr1 ENSEMBL exon 1737 2090 . + .
3 chr1 ENSEMBL transcript 1737 4275 . + .
4 chr1 HAVANA gene 1737 4275 . + .
5 chr1 HAVANA exon 1873 1920 . + .
TabxFl> ## parse 100 records at a time
TabxFl> length(scanTabix(tbx)[[1]]) # total number of records
[1] 237
TabxFl> tbx <- open(TabixFile(fl, yieldSize=100))
TabxFl> while(length(res <- scanTabix(tbx)[[1]]))
TabxFl+ cat("records read:", length(res), "\n")
records read: 100
records read: 100
records read: 37
TabxFl> close(tbx)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.