tabix_build: Build a tabix index file for fast access to...


View source: R/tabix.R

Description

Given a pre-sorted and compressed file in a compatible tab-separated-columns format, create a Tabix index file to perform fast queries on regions of data.

Usage

tabix_build( filename, sc, bc, ec, meta, lineskip )

Arguments

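filename

Name of the sorted, compressed file to create an index for

sc

Number of the column holding the sequence (group) names, e.g. chromosome

bc

Number of the column holding the start position

ec

Number of the column holding the end position

meta

Character that begins comment/meta-information lines, e.g. "#"

lineskip

Number of lines to skip at the top of the file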

Details

Tabix is a tool for quickly retrieving data on an arbitrary chromosomal region from files that store their data in tab-separated columns, such as VCF, BED, GFF and SAM. It can be used for other data as well, as long as there is one column of named groups (e.g. chromosomes) and another column giving a numerical order (e.g. chromosomal position). Before indexing, the file must be sorted by group name (e.g. chromosome) and location, and then gzip/bgzf-compressed; tabix_build then creates the index file as a required preprocessing step. After sorting, compressing and indexing, specific portions of such a file can be retrieved very efficiently, e.g. using the other tabix_XXX functions.
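The sorting requirement described above can be sketched in base R. This is only an illustration under stated assumptions: the records are made-up toy data (not taken from ex.gff3), the column names seqid, start and end are hypothetical, and real files are often sorted with external tools instead.

```r
# Hypothetical toy records: sequence name, start and end position.
records <- data.frame(
  seqid = c("chr2", "chr1", "chr1", "chr2"),
  start = c(500L, 900L, 100L, 50L),
  end   = c(650L, 950L, 200L, 75L),
  stringsAsFactors = FALSE
)

# Order by sequence name first, then by numeric start position,
# so that all records of one sequence are contiguous and ascending.
sorted <- records[order(records$seqid, records$start), ]

# Write a tab-separated file without header, quotes or row names.
sortedfile <- tempfile(fileext = ".tsv")
write.table(sorted, file = sortedfile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(sortedfile)
```

A file prepared this way would presumably still need to be compressed (e.g. with bgzf_compress, as in the Examples section) before tabix_build is called, with sc, bc and ec set to the matching column numbers.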

Value

TRUE if the index file was built (as in the example below), FALSE otherwise.

Author(s)

Ulrich Wittelsbuerger

See Also

tabix_open, tabix_setregion, tabix_read

Examples

##
##	Example :
##

gfffile  <- system.file("extdata", "ex.gff3", package = "WhopGenome" )
gfffile

gffbasename <- tempfile()
file.copy( from=gfffile, to=gffbasename )
gffgzfile <- paste0( gffbasename, ".gz" )
gffgzfile

###
###	name of the index file that tabix_build will create
###
gffindexfile <- paste0( gffgzfile, ".tbi" )
gffindexfile
stopifnot( ! file.exists( gffindexfile ) )
print( "Index file does not exist yet!" )

###
###	compress GFF file
###
bgzf_compress( gffbasename , gffgzfile )
stopifnot( file.exists( gffgzfile ) )
###
###	build index
###
tabix_build( filename = gffgzfile,
             sc = 1L,
             bc = 2L,
             ec = 3L,
             meta = "#",
             lineskip = 0L
           )
# [1] TRUE
stopifnot( file.exists( gffindexfile ) )
print( "Index file has been built" )
###
###	open the compressed, indexed file
###
gffh <- tabix_open( gffgzfile )
gffh

WhopGenome documentation built on May 1, 2019, 10:12 p.m.