BEDFile-class: BEDFile objects
In rtracklayer: R interface to genome annotation files and the UCSC genome browser


These functions support the import and export of the UCSC BED format and its variants, including BEDGraph.

## S4 method for signature 'BEDFile,ANY,ANY'
import(con, format, text, trackLine = TRUE,
                   genome = NA, colnames = NULL,
                   which = NULL, seqinfo = NULL, extraCols = character(),
                   sep = c("\t", ""), na.strings=character(0L))
import.bed(con, ...)
import.bed15(con, ...)
import.bedGraph(con, ...)

## S4 method for signature 'ANY,BEDFile,ANY'
export(object, con, format, ...)
## S4 method for signature 'GenomicRanges,BEDFile,ANY'
export(object, con, format,
                  append = FALSE, index = FALSE,
                  ignore.strand = FALSE, trackLine = NULL)
## S4 method for signature 'UCSCData,BEDFile,ANY'
export(object, con, format,
                   trackLine = TRUE, ...)
export.bed(object, con, ...)
export.bed15(object, con, ...)
## S4 method for signature 'GenomicRanges,BED15File,ANY'
export(object, con, format,
                  expNames = NULL, trackLine = NULL, ...)
export.bedGraph(object, con, ...)

`con`	A path, URL, connection or `BEDFile` object. For the functions ending in `.bed`, `.bedGraph` and `.bed15`, the file format is indicated by the function name. For the base `export` and `import` functions, the format must be indicated another way. If `con` is a path, URL or connection, either the file extension or the `format` argument needs to be one of “bed”, “bed15”, “bedGraph”, “bedpe”, “narrowPeak”, or “broadPeak”. Compressed files (“gz”, “bz2” and “xz”) are handled transparently.
`object`	The object to export; it should be a `GRanges` or something coercible to a `GRanges`. If targeting the BEDPE format, this should be something coercible to `Pairs`. If the object has a method for `asBED` (like `GRangesList`), it is called prior to coercion. This makes it possible to export a `GRangesList` or `TxDb` in a way that preserves the hierarchical structure. To export multiple tracks in the UCSC track line metaformat, pass a `GenomicRangesList`, or something coercible to one.
`format`	If not missing, should be one of “bed”, “bed15”, “bedGraph”, “bedpe”, “narrowPeak” or “broadPeak”.
`text`	If `con` is missing, a character vector to use as the input.
`trackLine`	For import, an imported track line will be stored in a `TrackLine` object, as part of the returned `UCSCData`. For the `UCSCData` method on export, whether to output the UCSC track line stored on the object; for the other export methods, the actual `TrackLine` object to export.
`genome`	The identifier of a genome, or a `Seqinfo`, or `NA` if unknown. Typically, this is a UCSC identifier like “hg19”. An attempt will be made to derive the `seqinfo` on the return value using either an installed BSgenome package or UCSC, if network access is available.
`colnames`	A character vector naming the columns to parse. These should name columns in the result, not those in the BED spec, so e.g. specify “thick”, instead of “thickStart”.
`which`	A `GRanges` or other range-based object supported by `findOverlaps`. Only the intervals in the file overlapping the given ranges are returned. This is much more efficient when the file is indexed with the tabix utility.
`index`	If `TRUE`, automatically compress and index the output file with bgzf and tabix. Note that tabix indexing will sort the data by chromosome and start. Tabix supports a single track in a file.
`ignore.strand`	Whether to output the strand when not required (by the existence of later fields).
`seqinfo`	If not `NULL`, the `Seqinfo` object to set on the result. Ignored if `genome` is a `Seqinfo` object. If the `genome` argument is not `NA`, it must agree with `genome(seqinfo)`.
`extraCols`	A character vector in the same form as `colClasses` from `read.table`. It should indicate the name and class of each extra/special column to read from the BED file. As BED does not encode column names, these are assumed to be the last columns in the file. This enables parsing of the various BEDX+Y formats.
`sep`	A character vector with a single character indicating the field separator, like `read.table`. This defaults to `"\t"`, as BEDtools requires, but BED files are also allowed to be whitespace separated (`""`) according to the UCSC spec.
`na.strings`	Character vector with strings, appended to the standard `"."`, that represent an `NA` value.
`append`	If `TRUE`, and `con` points to a file path, the data is appended to the file. Obviously, if `con` is a connection, the data is always appended.
`expNames`	A character vector naming columns in `mcols(object)` to export as data columns in the BED15 file. These correspond to the sample names in the experiment. If `NULL` (the default), there is an attempt to extract these from `trackLine`. If the attempt fails, no scores are exported.
`...`	Arguments to pass down to other methods. For import, the flow eventually reaches the `BEDFile` method on `import`. For export, when `trackLine` is `TRUE` or the target format is BED15, the arguments are passed through `export.ucsc`, so track line parameters are supported.

The BED format is a tab-separated table of intervals, with annotations like name, score and even sub-intervals for representing alignments and gene models. Official (UCSC) child formats currently include BED15 (adding a number matrix for e.g. expression data across multiple samples) and BEDGraph (a compressed means of storing a single score variable, e.g. coverage; overlapping features are not allowed). Many tools and organizations have extended the BED format with additional columns for particular use cases. The advantage of BED is its balance between simplicity and expressiveness. It is also relatively scalable, because only the first three columns (chrom, start and end) are required. Thus, BED is best suited for representing simple features. For specialized cases, one is usually better off with another format. For example, genome-scale vectors belong in BigWig, alignments from high-throughput sequencing belong in BAM, and gene models are more richly expressed in GFF.

The following is the mapping of BED elements to a GRanges object. NA values are allowed only where indicated. These appear as a “.” in the file. Only the first three columns (chrom, start and end) are required. The other columns can only be included if all previous columns (to the left) are included. Upon export, default values are used to automatically pad the table, if necessary.

chrom, start, end: the ranges component.
name: character vector (NA's allowed) in the name column; defaults to NA on export.
score: numeric vector in the score column, accessible via the score accessor. Defaults to 0 on export. This is the only column present in BEDGraph (besides chrom, start and end), and it is required.
strand: strand factor (NA's allowed) in the strand column, accessible via the strand accessor; defaults to NA on export.
thickStart, thickEnd: IntegerRanges object in a column named thick; defaults to the ranges of the feature on export.
itemRgb: an integer matrix of color codes, as returned by col2rgb, or any valid input to col2rgb, in the itemRgb column; default is NA on export, which translates to black.
blockSizes, blockStarts, blockCounts: IntegerRangesList object in a column named blocks; defaults to empty upon BED15 export.
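
The mapping above can be illustrated with a small round trip. This is a minimal sketch, not part of the package's own examples: the object `gr`, the feature name "featureA" and the temporary file path are made up for illustration, and it assumes rtracklayer (which attaches GenomicRanges) is installed.

```r
library(rtracklayer)

# A single illustrative feature carrying the name, score and strand
# elements described in the mapping above (values are made up).
gr <- GRanges("chr1", IRanges(start = 101, end = 200), strand = "+",
              name = "featureA", score = 5)

# Export to a temporary BED file; the ".bed" extension selects the format.
bed_path <- tempfile(fileext = ".bed")
export(gr, bed_path)

# Inspect the raw text written to disk. Note that BED stores 0-based
# start coordinates, whereas GRanges uses 1-based starts.
readLines(bed_path)

# Import the file again; the ranges and metadata columns should match
# what was exported.
roundtrip <- import(bed_path)
roundtrip
```

Because name, score and strand are all supplied here, no padding is needed; leaving out a column to the left of one that is present is the case where export falls back to the defaults listed above.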

For BED15 files, there should be a column of scores in mcols(object) for each sample in the experiment. The columns are named according to the expNames (found in the file, or passed as an argument during export). NA scores are stored as “-10000” in the file.

For a “bedpe” file, the return value is a Pairs object combining two GRanges. The name and score are carried over to the metadata columns.

Otherwise, the return value is a GRanges with the metadata columns described in the details.

To import one of the multitude of BEDX+Y formats, such as those used to distribute ENCODE data through UCSC (narrowPeaks, etc), specify the extraCols argument to indicate the expected names and classes of the special columns. We assume that the last length(extraCols) columns are special, and that the preceding columns adhere to the BED format. “narrowPeak” and “broadPeak” types are handled explicitly by specifying these types as the format argument, rather than by using extraCols.
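
As a hedged sketch of the extraCols mechanism described above, the following reads a BED6+4 record from a temporary file. The record itself, the file path and the column names chosen in `extra_cols` are assumptions made for illustration (the names mirror those commonly used for narrowPeak-style data); for real narrowPeak or broadPeak files, passing the type as the format argument is the documented route.

```r
library(rtracklayer)

# One made-up record: six standard BED columns followed by four extra columns.
bed_line <- "chr1\t100\t200\tpeak1\t500\t.\t5.5\t10.2\t8.1\t50"
peak_path <- tempfile(fileext = ".bed")
writeLines(bed_line, peak_path)

# Names and classes of the trailing columns, in the same form as
# colClasses from read.table (names here are illustrative).
extra_cols <- c(signalValue = "numeric", pValue = "numeric",
                qValue = "numeric", peak = "integer")

# The last length(extra_cols) columns are parsed as extras; the
# preceding columns are parsed as ordinary BED.
peaks <- import(peak_path, format = "bed", extraCols = extra_cols)
mcols(peaks)
```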

The BEDFile class extends RTLFile and is a formal representation of a resource in the BED format. To cast a path, URL or connection to a BEDFile, pass it to the BEDFile constructor. Classes and constructors also exist for the subclasses BED15File, BEDGraphFile and BEDPEFile.

Michael Lawrence

http://genome.ucsc.edu/goldenPath/help/customTrack.html
http://bedtools.readthedocs.org/en/latest/content/general-usage.html

  test_path <- system.file("tests", package = "rtracklayer")
  test_bed <- file.path(test_path, "test.bed")

  test <- import(test_bed)
  test

  test_bed_file <- BEDFile(test_bed)
  import(test_bed_file)

  test_bed_con <- file(test_bed)
  import(test_bed_con, format = "bed")

  import(test_bed, trackLine = FALSE)
  import(test_bed, genome = "hg19")
  import(test_bed, colnames = c("name", "strand", "thick"))

  which <- GRanges("chr7:1-127473000")
  import(test_bed, which = which)

  bed15_file <- file.path(test_path, "test.bed15")
  bed15 <- import(bed15_file)

## Not run: 
  test_bed_out <- file.path(tempdir(), "test.bed")
  export(test, test_bed_out)

  test_bed_out_file <- BEDFile(test_bed_out)
  export(test, test_bed_out_file)

  export(test, test_bed_out, name = "Alternative name")

  test_bed_gz <- paste(test_bed_out, ".gz", sep = "")
  export(test, test_bed_gz)

  export(test, test_bed_out, index = TRUE)
  export(test, test_bed_out, index = TRUE, trackLine = FALSE)

  bed_text <- export(test, format = "bed")
  test <- import(format = "bed", text = bed_text)

  test_bed15_out <- file.path(tempdir(), "test.bed15")
  export(bed15, test_bed15_out) # UCSCData knows the expNames
  export(as(bed15, "GRanges"), test_bed15_out, # have to specify expNames
         expNames=paste0("breast_", c("A", "B", "C")))

## End(Not run)