GeneRegionTrack-class: GeneRegionTrack class and methods
In ivanek/Gviz: Plotting data and annotation information along genomic coordinates

Description

A class to hold gene model data for a genomic region.

Usage

## S4 method for signature 'GeneRegionTrack'
initialize(.Object, start, end, ...)

## S4 method for signature 'ReferenceGeneRegionTrack'
initialize(
  .Object,
  stream,
  reference,
  mapping = list(),
  args = list(),
  defaults = list(),
  ...
)

GeneRegionTrack(
  range = NULL,
  rstarts = NULL,
  rends = NULL,
  rwidths = NULL,
  strand,
  feature,
  exon,
  transcript,
  gene,
  symbol,
  chromosome,
  genome,
  stacking = "squish",
  name = "GeneRegionTrack",
  start = NULL,
  end = NULL,
  importFunction,
  stream = FALSE,
  ...
)

## S4 method for signature 'GeneRegionTrack'
gene(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
gene(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
symbol(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
symbol(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
transcript(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
transcript(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
exon(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
exon(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
group(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
group(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
identifier(
  GdObject,
  type = .dpOrDefault(GdObject, "transcriptAnnotation", "symbol")
)

## S4 replacement method for signature 'GeneRegionTrack,character'
identifier(GdObject) <- value

## S4 method for signature 'ReferenceGeneRegionTrack'
subset(x, ...)

## S4 method for signature 'GeneRegionTrack'
drawGD(GdObject, ...)

## S4 method for signature 'GeneRegionTrack'
show(object)

## S4 method for signature 'ReferenceGeneRegionTrack'
show(object)

Arguments

`.Object`	The object being initialized.
`start`, `end`	An integer scalar with the genomic start or end coordinate for the gene model range. If missing, the default is the smallest value in `rstarts` or the largest value in `rends`, respectively, for the currently active chromosome. When building a `GeneRegionTrack` from a `TxDb` object, these arguments can be used to subset the desired annotation data by genomic coordinates. Please note that in this case the `chromosome` parameter must also be set.
`...`	Additional items which will all be interpreted as further display parameters. See `settings` and the "Display Parameters" section below for details.
`stream`	A logical flag indicating that the user-provided import function can deal with indexed files and knows how to process the additional `selection` argument when accessing the data on disk. This causes the constructor to return a `ReferenceGeneRegionTrack` object which will grab the necessary data on the fly during each plotting operation.
`reference`	reference file
`mapping`	mapping
`args`	args
`defaults`	`logical`
`range`	An optional meta argument to handle the different input types. If the `range` argument is missing, all the relevant information to create the object has to be provided as individual function arguments (see below). The different input options for `range` are:
A `TxDb` object: all the necessary gene model information including exon locations, transcript groupings and associated gene ids are contained in `TxDb` objects, and the coercion between the two is almost completely automated. If desired, the data to be fetched from the `TxDb` object can be restricted using the constructor's `chromosome`, `start` and `end` arguments. See below for details. A direct coercion method `as(obj, "GeneRegionTrack")` is also available. An added benefit of this input option is that the UTR and coding region information that is part of the original `TxDb` object is retained in the `GeneRegionTrack`.
A `GRanges` object: the genomic ranges for the `GeneRegionTrack` as well as the optional additional metadata columns `feature`, `transcript`, `gene`, `exon` and `symbol` (see description of the individual function parameters below for details). Calling the constructor on a `GRanges` object without further arguments, e.g. `GeneRegionTrack(range=obj)`, is equivalent to calling the coerce method `as(obj, "GeneRegionTrack")`.
A `GRangesList` object: this is very similar to the previous case, except that the grouping information that is part of the list structure is preserved in the `GeneRegionTrack`, i.e., all the elements within one list item receive the same group id. For consistency, there is also a coercion method from `GRangesList`, `as(obj, "GeneRegionTrack")`. Please note that unless the necessary information about gene ids, symbols, etc. is present in the individual `GRanges` metadata slots, the object will not be particularly useful, because all the identifiers will be set to a common default value.
An `IRanges` object: almost identical to the `GRanges` case, except that the chromosome and strand information as well as all additional data has to be provided in the separate `chromosome`, `strand`, `feature`, `transcript`, `symbol`, `exon` or `gene` arguments, because it cannot be directly encoded in an `IRanges` object. Note that only the former two are mandatory (if not provided explicitly, the more or less reasonable default values `chromosome=NA` and `strand="*"` are used), but not providing information about the gene-to-transcript relationship or the human-readable symbols renders a lot of the class' functionality useless.
A `data.frame` object: the `data.frame` needs to contain at least the two mandatory columns `start` and `end` with the range coordinates. It may also contain a `chromosome` and a `strand` column with the chromosome and strand information for each range. If missing, this information will be drawn from the constructor's `chromosome` or `strand` arguments. In addition, the `feature`, `exon`, `transcript`, `gene` and `symbol` data can be provided as columns in the `data.frame`. The above comments about potential default values also apply here.
A `character` scalar: in this case the value of the `range` argument is considered to be a file path to an annotation file on disk. A range of file types are supported by the `Gviz` package as identified by the file extension. See the `importFunction` documentation below for further details.
`rstarts`	An integer vector of the start coordinates for the actual gene model items, i.e., for the individual exons. The relationship between exons is handled via the `gene` and `transcript` factors. Alternatively, this can be a vector of comma-separated lists of integer coordinates, one vector item for each transcript, and each comma-separated element being the start location of a single exon within that transcript. Those lists will be exploded upon object instantiation and all other annotation arguments will be recycled accordingly to regenerate the exon/transcript/gene relationship structure. This implies the appropriate number of items in all annotation and coordinate arguments.
`rends`	An integer vector of the end coordinates for the actual gene model items. Both `rstarts` and `rends` have to be of equal length.
`rwidths`	An integer vector of widths for the actual gene model items. This can be used instead of either `rstarts` or `rends` to specify the range coordinates.
`strand`	Character vector, the strand information for the individual track exons. It may be provided in the form `+` for the Watson strand, `-` for the Crick strand or `*` for either one of the two. Please note that all items within a single gene or transcript model need to be on the same strand, and erroneous entries will raise an error.
`feature`	Factor (or other vector that can be coerced into one), giving the feature types for the individual track exons. When plotting the track to the device, if a display parameter with the same name as the value of `feature` is set, this will be used as the track item's fill color. Additionally, the feature type defines whether an element in the `GeneRegionTrack` is considered to be coding or non-coding. The details section as well as the section about the `thinBoxFeature` display parameter further below has more information on this. See also `grouping` for details.
`exon`	Character vector of exon identifiers. Its values will be used as the identifier tag when plotting to the device if the display parameter `showExonId=TRUE`.
`transcript`	Factor (or other vector that can be coerced into one), giving the transcript memberships for the individual track exons. All items with the same transcript identifier will be visually connected when plotting to the device. See `grouping` for details. Will be used as labels when `showId=TRUE`, and `geneSymbol=FALSE`.
`gene`	Factor (or other vector that can be coerced into one), giving the gene memberships for the individual track exons.
`symbol`	A factor with human-readable gene name aliases which will be used as labels when `showId=TRUE`, and `geneSymbol=TRUE`.
`chromosome`	The chromosome on which the track's genomic ranges are defined. A valid UCSC chromosome identifier if `options(ucscChromosomeNames=TRUE)`. Please note that in this case only syntactic checking takes place, i.e., the argument value needs to be an integer, numeric character or a character of the form `chrx`, where `x` may be any possible string. The user has to make sure that the respective chromosome is indeed defined for the track's genome. If not provided here, the constructor will try to build the chromosome information based on the available inputs, and as a last resort will fall back to the value `chrNA`. Please note that by definition all objects in the `Gviz` package can only have a single active chromosome at a time (although internally the information for more than one chromosome may be present), and the user has to call the `chromosome<-` replacement method in order to change to a different active chromosome. When creating a `GeneRegionTrack` from a `TxDb` object, the value of this parameter can be used to subset the data to fetch only transcripts from a single chromosome.
`genome`	The genome on which the track's ranges are defined. Usually this is a valid UCSC genome identifier, however this is not being formally checked at this point. If not provided here the constructor will try to extract this information from the provided inputs, and eventually will fall back to the default value of `NA`.
`stacking`	The stacking type for overlapping items of the track. One of `c("hide", "dense", "squish", "pack", "full")`. Currently, only `hide` (don't show the track items), `squish` (make best use of the available space) and `dense` (no stacking at all) are implemented.
`name`	Character scalar of the track's name used in the title panel when plotting.
`importFunction`	A user-defined function to be used to import the data from a file. This only applies when the `range` argument is a character string with the path to the input data file. The function needs to accept an argument `x` containing the file path and has to return a proper `GRanges` object with all the necessary metadata columns set. A set of default import functions is already implemented in the package for a number of different file types, and one of these defaults will be picked automatically based on the extension of the input file name. If the extension cannot be mapped to any of the existing import functions, an error is raised asking for a user-defined import function via this argument. Currently the following file types can be imported with the default functions: `gff`, `gff1`, `gff2`, `gff3`, `gtf`.
`GdObject`	Object of `GdObject-class`.
`value`	Value to be set.
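`type`	The type of identifier to return. Defaults to the value of the `transcriptAnnotation` display parameter, or `"symbol"` if that parameter is not set (see Usage).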
`x`	A valid track object class name, or the object itself, in which case the class is derived directly from it.
`object`	The object to show.
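
The comma-separated form of `rstarts` and `rends` and the relationship to `rwidths` can be illustrated in base R. This is only a sketch of what the expansion amounts to, not the package's internal code. All coordinates and identifiers are made up, and the width formula assumes the inclusive coordinate convention used by `IRanges`.

```r
## One element per transcript; each element lists that transcript's exon
## starts or ends. All values are hypothetical.
rstarts    <- c("100,300,500", "1000,1400")
rends      <- c("200,400,600", "1200,1500")
transcript <- c("tx1", "tx2")

## Split the comma-separated strings into one integer vector per transcript
startList <- lapply(strsplit(rstarts, ","), as.integer)
endList   <- lapply(strsplit(rends, ","), as.integer)

## Recycle the per-transcript annotation once per exon
nExons <- lengths(startList)
exons <- data.frame(
    start      = unlist(startList),
    end        = unlist(endList),
    transcript = rep(transcript, times = nExons)
)

## Assumed inclusive convention: width = end - start + 1, so any two of
## start, end and width determine the third
exons$width <- exons$end - exons$start + 1L
exons
```

Each row of `exons` corresponds to one exon, which is the per-item layout the constructor expects when plain integer vectors are passed.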

Details

A track containing all gene models in a particular region. The data are usually fetched dynamically from an online data store, but it is also possible to manually construct objects from local data. Connections to particular online data sources should be implemented as sub-classes, and GeneRegionTrack is just the common denominator that is being used for plotting later on. There are several levels of data associated with a GeneRegionTrack:

exon level: identifiers are stored in the exon column of the GRanges object in the range slot. Data may be extracted using the exon method.
transcript level: identifiers are stored in the transcript column of the GRanges object. Data may be extracted using the transcript method.
gene level: identifiers are stored in the gene column of the GRanges object, more human-readable versions in the symbol column. Data may be extracted using the gene or the symbol methods.
transcript-type level: information is stored in the feature column of the GRanges object. If a display parameter of the same name is specified, the software will use its value for the coloring.

GeneRegionTrack objects also know about coding regions and non-coding regions (e.g., UTRs) in a transcript, and will indicate those by using different shapes (wide boxes for all coding regions, thinner boxes for non-coding regions). This is achieved by setting the feature values of the object for non-coding elements to one of the options that are provided in the thinBoxFeature display parameter. All other elements are considered to be coding elements.
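
The coding versus non-coding rule described above boils down to a set-membership test on the feature values. A minimal base R sketch follows. The `thinBoxFeature` value used here is a hypothetical setting chosen for illustration, and the package's actual default list may differ.

```r
## Hypothetical setting of the thinBoxFeature display parameter
thinBoxFeature <- c("utr5", "utr3")

## Feature values of four exons-level items in one made-up transcript
feature <- c("utr5", "CDS", "CDS", "utr3")

## Items whose feature is listed in thinBoxFeature are treated as non-coding
## (thin boxes); everything else is treated as coding (wide boxes)
shape <- ifelse(feature %in% thinBoxFeature, "thin", "wide")
data.frame(feature, shape)
```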

Value

The return value of the constructor function is a new object of class GeneRegionTrack.

Functions

initialize(GeneRegionTrack): Initialize.
ReferenceGeneRegionTrack-class: The file-based version of the GeneRegionTrack-class.
initialize(ReferenceGeneRegionTrack): Initialize.
GeneRegionTrack(): Constructor function for GeneRegionTrack-class.
gene(GeneRegionTrack): Extract the gene identifiers for all gene models.
gene(GdObject = GeneRegionTrack) <- value: Replace the gene identifiers for all gene models. The replacement value must be a character of appropriate length or another vector that can be coerced into such.
symbol(GeneRegionTrack): Extract the human-readable gene symbol for all gene models.
symbol(GdObject = GeneRegionTrack) <- value: Replace the human-readable gene symbol for all gene models. The replacement value must be a character of appropriate length or another vector that can be coerced into such.
transcript(GeneRegionTrack): Extract the transcript identifiers for all transcripts in the gene models.
transcript(GdObject = GeneRegionTrack) <- value: Replace the transcript identifiers for all transcripts in the gene model. The replacement value must be a character of appropriate length or another vector that can be coerced into such.
exon(GeneRegionTrack): Extract the exon identifiers for all exons in the gene models.
exon(GdObject = GeneRegionTrack) <- value: Replace the exon identifiers for all exons in the gene model. The replacement value must be a character of appropriate length or another vector that can be coerced into such.
group(GeneRegionTrack): Extract the group membership for all track items.
group(GdObject = GeneRegionTrack) <- value: Replace the grouping information for track items. The replacement value must be a factor of appropriate length or another vector that can be coerced into such.
identifier(GeneRegionTrack): Return track item identifiers. Depending on the setting of the optional argument type (for example "lowest"), these are either the group identifiers or the individual item identifiers.
identifier(GdObject = GeneRegionTrack) <- value: Set the track item identifiers. The replacement value has to be a character vector of appropriate length. This always replaces the group-level identifiers, so essentially it is similar to group<-.
subset(ReferenceGeneRegionTrack): Subset a GeneRegionTrack by coordinates and sort if necessary.
drawGD(GeneRegionTrack): Plot the object to a graphics device. The return value of this method is the input object, potentially updated during the plotting operation. Internally, there are two modes in which the method can be called. Either in 'prepare' mode, in which case no plotting is done but the object is preprocessed based on the available space, or in 'plotting' mode, in which case the actual graphical output is created. Since subsetting of the object can be potentially costly, this can be switched off in case subsetting has already been performed before or is not necessary.
show(GeneRegionTrack): Show method.
show(ReferenceGeneRegionTrack): Show method.

Objects from the class

Objects can be created using the constructor function GeneRegionTrack.
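
As a complement to the data.frame and TxDb inputs, the following sketch builds a track from a `GRanges` object carrying the optional metadata columns. It assumes that the Gviz and GenomicRanges packages are installed. All coordinates and identifiers (`tx1`, `geneA`, and so on) are made up for illustration.

```r
if (requireNamespace("Gviz", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE)) {
    library(Gviz)
    library(GenomicRanges)

    ## Three exons: two belong to transcript tx1, one to tx2 (hypothetical data)
    gr <- GRanges(
        seqnames   = rep("chr7", 3),
        ranges     = IRanges(start = c(100, 300, 600), end = c(200, 400, 800)),
        strand     = rep("+", 3),
        feature    = c("utr5", "CDS", "CDS"),
        gene       = rep("geneA", 3),
        exon       = c("ex1", "ex2", "ex3"),
        transcript = c("tx1", "tx1", "tx2"),
        symbol     = rep("GeneA", 3)
    )

    ## The chromosome and strand are taken from the GRanges object itself
    track <- GeneRegionTrack(gr, genome = "hg19", name = "from GRanges")
    transcript(track)
}
```

Supplying the `transcript`, `gene` and `symbol` columns up front avoids the common default identifiers mentioned for the `range` argument.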

Author(s)

Florian Hahne, Steve Lianoglou

Examples



## The empty object
GeneRegionTrack()

## Load some sample data
data(cyp2b10)

## Construct the object
grTrack <- GeneRegionTrack(
    start = 26682683, end = 26711643,
    rstarts = cyp2b10$start, rends = cyp2b10$end, chromosome = 7, genome = "mm9",
    transcript = cyp2b10$transcript, gene = cyp2b10$gene, symbol = cyp2b10$symbol,
    feature = cyp2b10$feature, exon = cyp2b10$exon,
    name = "Cyp2b10", strand = cyp2b10$strand
)

## Directly from the data.frame
grTrack <- GeneRegionTrack(cyp2b10)

## From a TxDb object
if (require(GenomicFeatures)) {
    samplefile <- system.file("extdata",
                              "hg19_knownGene_sample.sqlite",
                              package = "GenomicFeatures")
    txdb <- loadDb(samplefile)
    GeneRegionTrack(txdb)
    GeneRegionTrack(txdb, chromosome = "chr6", start = 35000000, end = 40000000)
}


## Plotting
plotTracks(grTrack)

## Track names
names(grTrack)
names(grTrack) <- "foo"
plotTracks(grTrack)

## Subsetting and splitting
subTrack <- subset(grTrack, from = 26700000, to = 26705000)
length(subTrack)
subTrack <- grTrack[transcript(grTrack) == "ENSMUST00000144140"]
split(grTrack, transcript(grTrack))

## Accessors
start(grTrack)
end(grTrack)
width(grTrack)
position(grTrack)
width(subTrack) <- width(subTrack) + 100

strand(grTrack)
strand(subTrack) <- "-"

chromosome(grTrack)
chromosome(subTrack) <- "chrX"

genome(grTrack)
genome(subTrack) <- "hg19"

range(grTrack)
ranges(grTrack)

## Annotation
identifier(grTrack)
identifier(grTrack, "lowest")
identifier(subTrack) <- "bar"

feature(grTrack)
feature(subTrack) <- "foo"

exon(grTrack)
exon(subTrack) <- letters[1:2]

gene(grTrack)
gene(subTrack) <- "bar"

symbol(grTrack)
symbol(subTrack) <- "foo"

transcript(grTrack)
transcript(subTrack) <- c("foo", "bar")
chromosome(subTrack) <- "chr7"
plotTracks(subTrack)

values(grTrack)

## Grouping
group(grTrack)
group(subTrack) <- "Group 1"
transcript(subTrack)
plotTracks(subTrack)

## Collapsing transcripts
plotTracks(grTrack,
    collapseTranscripts = TRUE, showId = TRUE,
    extend.left = 10000, shape = "arrow"
)

## Stacking
stacking(grTrack)
stacking(grTrack) <- "dense"
plotTracks(grTrack)

## coercion
as(grTrack, "data.frame")
as(grTrack, "UCSCData")

## HTML image map
coords(grTrack)
tags(grTrack)
grTrack <- plotTracks(grTrack)$foo
coords(grTrack)
tags(grTrack)

ivanek/Gviz documentation built on Jan. 24, 2025, 3:34 p.m.