read_bed: Load a BED-format file

read_bed R Documentation

Load a BED-format file

Description

This function loads a BED file as a GenomicRanges object (the default) or, when use_gr = FALSE, as a data.table object. The file can be local or remote, and either plain text or gzip-compressed. The function also supports range loading: supply a genomic range in the syntax "chr1:1-100" to load only that region.
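To make the range syntax concrete, here is a minimal base R sketch that splits a single range string into chromosome, start, and end. parse_range is a hypothetical helper written for illustration; it is not part of bedtorch and may not match how read_bed validates its range argument.

```r
# Hypothetical helper (not part of bedtorch): split one "chrom:start-end"
# string into its three components.
parse_range <- function(range) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  if (!grepl(pattern, range)) {
    stop("Range must look like 'chr1:1-100', got: ", range)
  }
  # regexec() returns the full match followed by the three capture groups
  parts <- regmatches(range, regexec(pattern, range))[[1]]
  list(
    chrom = parts[2],
    start = as.integer(parts[3]),
    end   = as.integer(parts[4])
  )
}

rng <- parse_range("chr1:1-100")
str(rng)
```

The chromosome name is kept as free text, so both "chr1:1-100" and "1:3001-4000" are accepted by this sketch.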

Usage

read_bed(
  input = NULL,
  file_path = NULL,
  cmd = NULL,
  range = NULL,
  genome = NULL,
  use_gr = TRUE,
  ...
)

Arguments

file_path

Path to the data file. It can be either a local file, or a remote URL.

range

A genomic range given as a character vector. It must follow standard genomic range notation, e.g. chr1:1001-2000.

genome

The reference genome for the BED file. genome can be a valid genome name accepted by GenomeInfoDb::Seqinfo() (e.g. GRCh37), hs37-1kg (a genome shipped with this package), or a custom chromosome sizes file (local or remote). A good source of such files is https://github.com/igvteam/igv/tree/master/genomes/sizes.

use_gr

If TRUE, the data are read as a GenomicRanges object; otherwise, as a data.table object. GenomicRanges is generally recommended.

...

Other arguments to be passed to data.table::fread().

compression

The compression type. If set to detect, this function will try to guess it from file_path.

tabix_index

A character value indicating the location of the tabix index file. Can be either local or remote. If NULL, it will be derived from file_path.

download_index

Whether to download (cache) the tabix index in the current directory.

sep

The separator between columns. By default, BED files are tab-delimited, and sep should be \t. However, sometimes you will encounter non-standard table files. In such cases, you need to specify the separator. If auto, read_bed will try to guess the separator. For more details, refer to data.table::fread().
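The compression and tabix_index arguments above both fall back to information taken from file_path. The base R sketch below shows one way such defaults could be derived. Both helpers are hypothetical and are not bedtorch internals: the extension rule and the returned labels ("gzip", "text") are assumptions, and the ".tbi" suffix follows the naming convention used by tabix itself rather than anything this page confirms about read_bed.

```r
# Hypothetical helper (not a bedtorch internal): guess compression from the
# file extension. The labels "gzip" and "text" are illustrative only.
guess_compression <- function(file_path) {
  if (grepl("\\.(gz|bgz)$", file_path, ignore.case = TRUE)) "gzip" else "text"
}

# Hypothetical helper: tabix conventionally stores the index next to the
# data file, with ".tbi" appended to the data file name.
default_tabix_index <- function(file_path) {
  paste0(file_path, ".tbi")
}

guess_compression("example2.bed.gz")
default_tabix_index("example2.bed.gz")
```

An extension check cannot tell plain gzip from BGZIP, since both commonly use .gz. Range loading through tabix needs a BGZIP-compressed file with an index.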

Details

Note: for loading remote data files, this function currently depends on tabix.c 0.2.5, which does not support the HTTPS protocol. As a next step, I plan to switch to htslib, after which this function will be able to load remote data files through HTTPS.
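If an HTTPS URL cannot be read directly in a given installation, one possible workaround is to download the file first and pass the local copy to read_bed. The base R sketch below assumes that approach. is_https and localize are hypothetical helpers and not part of bedtorch, and the commented URL is a placeholder.

```r
# Hypothetical helper: TRUE when the path is an HTTPS URL
is_https <- function(path) {
  grepl("^https://", path, ignore.case = TRUE)
}

# Hypothetical helper: return a local path, downloading HTTPS URLs first.
# Non-HTTPS paths are returned unchanged.
localize <- function(path, dest = tempfile(fileext = ".bed.gz")) {
  if (!is_https(path)) {
    return(path)
  }
  # mode = "wb" keeps compressed files intact on all platforms
  utils::download.file(path, destfile = dest, mode = "wb", quiet = TRUE)
  dest
}

# Usage (placeholder URL, requires network access and bedtorch):
# bedtbl <- read_bed(localize("https://example.org/data.bed.gz"))
```

Range loading from the local copy would still need the matching tabix index next to it, so the index file would have to be downloaded the same way.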

See Also

data.table::fread()

Examples

# Basic usage
bedtbl <- read_bed(system.file("extdata", "example_merge.bed", package = "bedtorch"))
head(bedtbl)

# Load a genomic range
bedtbl <- read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
                  range = "1:3001-4000")
head(bedtbl)

# Specify the reference genome
head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "hs37-1kg"))

head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "GRCh37"))

head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "https://raw.githubusercontent.com/igvteam/igv/master/genomes/sizes/1kg_v37.chrom.sizes"))

# Load remote BGZIP files with tabix index specified
head(read_bed("https://git.io/JYATB",
              range = "22:20000001-30000001",
              tabix_index = "https://git.io/JYAkT"))

haizi-zh/bedtorch documentation built on July 1, 2022, 10:40 a.m.