GeneRegionTrack-class: GeneRegionTrack class and methods

GeneRegionTrack-class    R Documentation

GeneRegionTrack class and methods

Description

A class to hold gene model data for a genomic region.

Usage

## S4 method for signature 'GeneRegionTrack'
initialize(.Object, start, end, ...)

## S4 method for signature 'ReferenceGeneRegionTrack'
initialize(
  .Object,
  stream,
  reference,
  mapping = list(),
  args = list(),
  defaults = list(),
  ...
)

GeneRegionTrack(
  range = NULL,
  rstarts = NULL,
  rends = NULL,
  rwidths = NULL,
  strand,
  feature,
  exon,
  transcript,
  gene,
  symbol,
  chromosome,
  genome,
  stacking = "squish",
  name = "GeneRegionTrack",
  start = NULL,
  end = NULL,
  importFunction,
  stream = FALSE,
  ...
)

## S4 method for signature 'GeneRegionTrack'
gene(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
gene(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
symbol(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
symbol(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
transcript(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
transcript(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
exon(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
exon(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
group(GdObject)

## S4 replacement method for signature 'GeneRegionTrack,character'
group(GdObject) <- value

## S4 method for signature 'GeneRegionTrack'
identifier(
  GdObject,
  type = .dpOrDefault(GdObject, "transcriptAnnotation", "symbol")
)

## S4 replacement method for signature 'GeneRegionTrack,character'
identifier(GdObject) <- value

## S4 method for signature 'ReferenceGeneRegionTrack'
subset(x, ...)

## S4 method for signature 'GeneRegionTrack'
drawGD(GdObject, ...)

## S4 method for signature 'GeneRegionTrack'
show(object)

## S4 method for signature 'ReferenceGeneRegionTrack'
show(object)

Arguments

.Object

The object to initialize.

start, end

An integer scalar with the genomic start or end coordinate for the gene model range. If these are missing, the default is the smallest value in rstarts (for start) or the largest value in rends (for end) on the currently active chromosome. When building a GeneRegionTrack from a TxDb object, these arguments can be used to subset the desired annotation data by genomic coordinates. Please note that in that case the chromosome parameter must also be set.

...

Additional items which will all be interpreted as further display parameters. See settings and the "Display Parameters" section below for details.

stream

A logical flag indicating that the user-provided import function can deal with indexed files and knows how to process the additional selection argument when accessing the data on disk. This causes the constructor to return a ReferenceGeneRegionTrack object which will grab the necessary data on the fly during each plotting operation.

reference

reference file

mapping

mapping

args

args

defaults

logical

range

An optional meta argument to handle the different input types. If the range argument is missing, all the relevant information to create the object has to be provided as individual function arguments (see below).

The different input options for range are:

A TxDb object:

all the necessary gene model information including exon locations, transcript groupings and associated gene ids are contained in TxDb objects, and the coercion between the two is almost completely automated. If desired, the data to be fetched from the TxDb object can be restricted using the constructor's chromosome, start and end arguments. See below for details. A direct coercion method as(obj, "GeneRegionTrack") is also available. A nice added benefit of this input option is that the UTR and coding region information that is part of the original TxDb object is retained in the GeneRegionTrack.

A GRanges object:

the genomic ranges for the GeneRegion track as well as the optional additional metadata columns feature, transcript, gene, exon and symbol (see description of the individual function parameters below for details). Calling the constructor on a GRanges object without further arguments, e.g. GeneRegionTrack(range=obj) is equivalent to calling the coerce method as(obj, "GeneRegionTrack").

A GRangesList object:

this is very similar to the previous case, except that the grouping information that is part of the list structure is preserved in the GeneRegionTrack, i.e., all the elements within one list item receive the same group id. For consistency, there is also a coercion method from GRangesLists as(obj, "GeneRegionTrack"). Please note that unless the necessary information about gene ids, symbols, etc. is present in the individual GRanges metadata slots, the object will not be particularly useful, because all the identifiers will be set to a common default value.

An IRanges object:

almost identical to the GRanges case, except that the chromosome and strand information as well as all additional data has to be provided in the separate chromosome, strand, feature, transcript, symbol, exon or gene arguments, because it cannot be directly encoded in an IRanges object. Note that only the former two are mandatory (if not provided explicitly, the more or less reasonable default values chromosome=NA and strand=* are used), but not providing information about the gene-to-transcript relationship or the human-readable symbols renders a lot of the class' functionality useless.

A data.frame object:

the data.frame needs to contain at least the two mandatory columns start and end with the range coordinates. It may also contain a chromosome and a strand column with the chromosome and strand information for each range. If missing, this information will be drawn from the constructor's chromosome or strand arguments. In addition, the feature, exon, transcript, gene and symbol data can be provided as columns in the data.frame. The above comments about potential default values also apply here.

A character scalar:

in this case the value of the range argument is considered to be a file path to an annotation file on disk. A range of file types are supported by the Gviz package as identified by the file extension. See the importFunction documentation below for further details.
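As a rough illustration of two of the input options listed above, the following sketch builds the same small gene model once from a data.frame and once from a GRanges object. The coordinates and the identifiers (geneA, txA, e1 to e3) are made up for this example and are not part of the package's sample data.

```r
library(Gviz)
library(GenomicRanges)

## Hypothetical three-exon model on chr7 (illustrative values only)
df <- data.frame(
    chromosome = "chr7",
    start = c(100, 300, 600),
    end = c(200, 400, 700),
    strand = "+",
    feature = "protein_coding",
    gene = "geneA",
    exon = c("e1", "e2", "e3"),
    transcript = "txA",
    symbol = "GENEA",
    stringsAsFactors = FALSE
)

## data.frame input: start and end are mandatory, the other columns are optional
trackDf <- GeneRegionTrack(df, genome = "mm9", name = "from data.frame")

## GRanges input: the annotation travels as metadata columns
gr <- GRanges(
    seqnames = "chr7",
    ranges = IRanges(start = df$start, end = df$end),
    strand = "+",
    feature = df$feature, gene = df$gene, exon = df$exon,
    transcript = df$transcript, symbol = df$symbol
)
trackGr <- GeneRegionTrack(gr, genome = "mm9", name = "from GRanges")

## Both tracks should describe the same exons
gene(trackDf)
transcript(trackGr)
```

Either route should yield a track whose accessors return the annotation that was supplied; which one is more convenient mostly depends on where the data already lives.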

rstarts

An integer vector of the start coordinates for the actual gene model items, i.e., for the individual exons. The relationship between exons is handled via the gene and transcript factors. Alternatively, this can be a vector of comma-separated lists of integer coordinates, one vector item for each transcript, and each comma-separated element being the start location of a single exon within that transcript. Those lists will be exploded upon object instantiation and all other annotation arguments will be recycled accordingly to regenerate the exon/transcript/gene relationship structure. This implies the appropriate number of items in all annotation and coordinates arguments.
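To make the comma-separated form more concrete, here is a minimal base R sketch of what "exploding" such lists and recycling the per-transcript annotation could look like. It only illustrates the idea and is not the package's internal code; all variable names and values are hypothetical.

```r
## One string per transcript, one comma-separated entry per exon (made-up values)
txStarts <- c("100,300,600", "1000,1500")
txEnds <- c("200,400,700", "1200,1800")
txIds <- c("txA", "txB")
geneIds <- c("geneA", "geneB")

## Split each string into its exon coordinates
startsByTx <- strsplit(txStarts, ",", fixed = TRUE)
endsByTx <- strsplit(txEnds, ",", fixed = TRUE)

## Every transcript needs as many ends as starts
nExons <- lengths(startsByTx)
stopifnot(identical(nExons, lengths(endsByTx)))

## Flatten to one row per exon and repeat the annotation once per exon
exons <- data.frame(
    start = as.integer(unlist(startsByTx)),
    end = as.integer(unlist(endsByTx)),
    transcript = rep(txIds, times = nExons),
    gene = rep(geneIds, times = nExons),
    stringsAsFactors = FALSE
)
exons
```

The resulting table has one row per exon, which is the shape the integer-vector form of rstarts and rends expects.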

rends

An integer vector of the end coordinates for the actual gene model items. Both rstarts and rends have to be of equal length.

rwidths

An integer vector of widths for the actual gene model items. This can be used instead of either rstarts or rends to specify the range coordinates.

strand

Character vector, the strand information for the individual track exons. It may be provided in the form + for the Watson strand, - for the Crick strand or * for either one of the two. Please note that all items within a single gene or transcript model need to be on the same strand, and erroneous entries will raise an error.

feature

Factor (or other vector that can be coerced into one), giving the feature types for the individual track exons. When plotting the track to the device, if a display parameter with the same name as the value of feature is set, this will be used as the track item's fill color. Additionally, the feature type defines whether an element in the GeneRegionTrack is considered to be coding or non-coding. The Details section as well as the description of the thinBoxFeature display parameter further below have more information on this. See also grouping for details.

exon

Character vector of exon identifiers. Its values will be used as the identifier tag when plotting to the device if the display parameter showExonId=TRUE.

transcript

Factor (or other vector that can be coerced into one), giving the transcript memberships for the individual track exons. All items with the same transcript identifier will be visually connected when plotting to the device. See grouping for details. Will be used as labels when showId=TRUE, and geneSymbol=FALSE.

gene

Factor (or other vector that can be coerced into one), giving the gene memberships for the individual track exons.

symbol

A factor with human-readable gene name aliases which will be used as labels when showId=TRUE, and geneSymbol=TRUE.

chromosome

The chromosome on which the track's genomic ranges are defined. A valid UCSC chromosome identifier if options(ucscChromosomeNames=TRUE). Please note that in this case only syntactic checking takes place, i.e., the argument value needs to be an integer, numeric character or a character of the form chrx, where x may be any possible string. The user has to make sure that the respective chromosome is indeed defined for the track's genome. If not provided here, the constructor will try to build the chromosome information based on the available inputs, and as a last resort will fall back to the value chrNA. Please note that by definition all objects in the Gviz package can only have a single active chromosome at a time (although internally the information for more than one chromosome may be present), and the user has to call the chromosome<- replacement method in order to change to a different active chromosome. When creating a GeneRegionTrack from a TxDb object, the value of this parameter can be used to subset the data to fetch only transcripts from a single chromosome.
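The following sketch shows how the active chromosome might be queried and switched. It assumes the default setting options(ucscChromosomeNames=TRUE), under which a bare integer such as 7 is expected to be stored as chr7. The two-exon model and its identifiers are hypothetical.

```r
library(Gviz)

## Hypothetical two-exon model; the chromosome is given as a plain integer
track <- GeneRegionTrack(
    rstarts = c(100L, 300L), rends = c(200L, 400L),
    chromosome = 7, genome = "mm9",
    strand = rep("+", 2),
    transcript = rep("txA", 2),
    gene = rep("geneA", 2),
    symbol = rep("GENEA", 2)
)

## The active chromosome as stored by the track
before <- chromosome(track)
before

## Switch the active chromosome with the replacement method
chromosome(track) <- "chrX"
after <- chromosome(track)
after
```

Note that switching the active chromosome does not move any ranges; it only changes which chromosome the track currently presents.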

genome

The genome on which the track's ranges are defined. Usually this is a valid UCSC genome identifier; however, this is not formally checked at this point. If not provided here, the constructor will try to extract this information from the provided inputs, and eventually will fall back to the default value of NA.

stacking

The stacking type for overlapping items of the track. One of c(hide, dense, squish, pack, full). Currently, only hide (don't show the track items), squish (make best use of the available space) and dense (no stacking at all) are implemented.

name

Character scalar of the track's name used in the title panel when plotting.

importFunction

A user-defined function to be used to import the data from a file. This only applies when the range argument is a character string with the path to the input data file. The function needs to accept an argument x containing the file path and has to return a proper GRanges object with all the necessary metadata columns set. A set of default import functions is already implemented in the package for a number of different file types, and one of these defaults will be picked automatically based on the extension of the input file name. If the extension cannot be mapped to any of the existing import functions, an error is raised asking for a user-defined import function via this argument. Currently the following file types can be imported with the default functions: gff, gff1, gff2, gff3, gtf.

GdObject

Object of GdObject-class.

value

Value to be set.

type

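Character scalar selecting which kind of identifier to return. Defaults to the value of the transcriptAnnotation display parameter, or "symbol" if that parameter is not set.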

x

A valid track object class name, or the object itself, in which case the class is derived directly from it.

object

The object to display.

Details

A track containing all gene models in a particular region. The data are usually fetched dynamically from an online data store, but it is also possible to manually construct objects from local data. Connections to particular online data sources should be implemented as sub-classes, and GeneRegionTrack is just the common denominator that is being used for plotting later on. There are several levels of data associated with a GeneRegionTrack:

exon level:

identifiers are stored in the exon column of the GRanges object in the range slot. Data may be extracted using the exon method.

transcript level:

identifiers are stored in the transcript column of the GRanges object. Data may be extracted using the transcript method.

gene level:

identifiers are stored in the gene column of the GRanges object, more human-readable versions in the symbol column. Data may be extracted using the gene or the symbol methods.

transcript-type level:

information is stored in the feature column of the GRanges object. If a display parameter of the same name is specified, the software will use its value for the coloring.

GeneRegionTrack objects also know about coding regions and non-coding regions (e.g., UTRs) in a transcript, and will indicate those by using different shapes (wide boxes for all coding regions, thinner boxes for non-coding regions). This is achieved by setting the feature values of the object for non-coding elements to one of the options that are provided in the thinBoxFeature display parameter. All other elements are considered to be coding elements.
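As a small sketch of the coding versus non-coding distinction, the code below builds a hypothetical four-exon model and compares its feature values with the track's thinBoxFeature display parameter. It assumes that utr5 and utr3 are among the package's default thinBoxFeature values. It also sets a display parameter named after the protein_coding feature type, which, as described for the feature argument, is used as the fill color for those items.

```r
library(Gviz)

## Hypothetical transcript: 5' UTR, two coding exons, 3' UTR (made-up values)
exons <- data.frame(
    chromosome = "chr7",
    start = c(100, 200, 500, 800),
    end = c(199, 400, 700, 900),
    strand = "+",
    feature = c("utr5", "protein_coding", "protein_coding", "utr3"),
    gene = "geneA",
    exon = paste0("e", 1:4),
    transcript = "txA",
    symbol = "GENEA",
    stringsAsFactors = FALSE
)

## A display parameter named like a feature type sets the fill color for it
track <- GeneRegionTrack(exons,
    genome = "mm9", name = "toy model",
    protein_coding = "darkblue"
)

## Feature types listed in thinBoxFeature are drawn as thin boxes
thin <- unlist(displayPars(track, "thinBoxFeature"))
isThin <- as.character(feature(track)) %in% thin
data.frame(feature = as.character(feature(track)), drawnThin = isThin)
```

Items whose feature value is not listed in thinBoxFeature are treated as coding and drawn with wide boxes.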

Value

The return value of the constructor function is a new object of class GeneRegionTrack.

Functions

  • initialize(GeneRegionTrack): Initialize.

  • ReferenceGeneRegionTrack-class: The file-based version of the GeneRegionTrack-class.

  • initialize(ReferenceGeneRegionTrack): Initialize.

  • GeneRegionTrack(): Constructor function for GeneRegionTrack-class.

  • gene(GeneRegionTrack): Extract the gene identifiers for all gene models.

  • gene(GdObject = GeneRegionTrack) <- value: Replace the gene identifiers for all gene models. The replacement value must be a character of appropriate length or another vector that can be coerced into such.

  • symbol(GeneRegionTrack): Extract the human-readable gene symbol for all gene models.

  • symbol(GdObject = GeneRegionTrack) <- value: Replace the human-readable gene symbol for all gene models. The replacement value must be a character of appropriate length or another vector that can be coerced into such.

  • transcript(GeneRegionTrack): Extract the transcript identifiers for all transcripts in the gene models.

  • transcript(GdObject = GeneRegionTrack) <- value: Replace the transcript identifiers for all transcripts in the gene model. The replacement value must be a character of appropriate length or another vector that can be coerced into such.

  • exon(GeneRegionTrack): Extract the exon identifiers for all exons in the gene models.

  • exon(GdObject = GeneRegionTrack) <- value: Replace the exon identifiers for all exons in the gene model. The replacement value must be a character of appropriate length or another vector that can be coerced into such.

  • group(GeneRegionTrack): Extract the group membership for all track items.

  • group(GdObject = GeneRegionTrack) <- value: Replace the grouping information for track items. The replacement value must be a factor of appropriate length or another vector that can be coerced into such.

  • identifier(GeneRegionTrack): Return track item identifiers. Depending on the setting of the optional argument lowest, these are either the group identifiers or the individual item identifiers.

  • identifier(GdObject = GeneRegionTrack) <- value: Set the track item identifiers. The replacement value has to be a character vector of appropriate length. This always replaces the group-level identifiers, so essentially it is similar to group<-.

  • subset(ReferenceGeneRegionTrack): Subset a GeneRegionTrack by coordinates and sort if necessary.

  • drawGD(GeneRegionTrack): Plot the object to a graphics device. The return value of this method is the input object, potentially updated during the plotting operation. Internally, there are two modes in which the method can be called. Either in 'prepare' mode, in which case no plotting is done but the object is preprocessed based on the available space, or in 'plotting' mode, in which case the actual graphical output is created. Since subsetting of the object can be potentially costly, this can be switched off in case subsetting has already been performed before or is not necessary.

  • show(GeneRegionTrack): Show method.

  • show(ReferenceGeneRegionTrack): Show method.

Objects from the class

Objects can be created using the constructor function GeneRegionTrack.

Author(s)

Florian Hahne, Steve Lianoglou

See Also

DisplayPars

GdObject

GRanges

HighlightTrack

ImageMap

IRanges

RangeTrack

DataTrack

collapsing

grouping

panel.grid

plotTracks

settings

Examples



## The empty object
GeneRegionTrack()

## Load some sample data
data(cyp2b10)

## Construct the object
grTrack <- GeneRegionTrack(
    start = 26682683, end = 26711643,
    rstarts = cyp2b10$start, rends = cyp2b10$end, chromosome = 7, genome = "mm9",
    transcript = cyp2b10$transcript, gene = cyp2b10$gene, symbol = cyp2b10$symbol,
    feature = cyp2b10$feature, exon = cyp2b10$exon,
    name = "Cyp2b10", strand = cyp2b10$strand
)

## Directly from the data.frame
grTrack <- GeneRegionTrack(cyp2b10)

## From a TxDb object
if (require(GenomicFeatures)) {
    samplefile <- system.file("extdata",
                              "hg19_knownGene_sample.sqlite",
                              package = "GenomicFeatures")
    txdb <- loadDb(samplefile)
    GeneRegionTrack(txdb)
    GeneRegionTrack(txdb, chromosome = "chr6", start = 35000000, end = 40000000)
}


## Plotting
plotTracks(grTrack)

## Track names
names(grTrack)
names(grTrack) <- "foo"
plotTracks(grTrack)

## Subsetting and splitting
subTrack <- subset(grTrack, from = 26700000, to = 26705000)
length(subTrack)
subTrack <- grTrack[transcript(grTrack) == "ENSMUST00000144140"]
split(grTrack, transcript(grTrack))

## Accessors
start(grTrack)
end(grTrack)
width(grTrack)
position(grTrack)
width(subTrack) <- width(subTrack) + 100

strand(grTrack)
strand(subTrack) <- "-"

chromosome(grTrack)
chromosome(subTrack) <- "chrX"

genome(grTrack)
genome(subTrack) <- "hg19"

range(grTrack)
ranges(grTrack)

## Annotation
identifier(grTrack)
identifier(grTrack, "lowest")
identifier(subTrack) <- "bar"

feature(grTrack)
feature(subTrack) <- "foo"

exon(grTrack)
exon(subTrack) <- letters[1:2]

gene(grTrack)
gene(subTrack) <- "bar"

symbol(grTrack)
symbol(subTrack) <- "foo"

transcript(grTrack)
transcript(subTrack) <- c("foo", "bar")
chromosome(subTrack) <- "chr7"
plotTracks(subTrack)

values(grTrack)

## Grouping
group(grTrack)
group(subTrack) <- "Group 1"
transcript(subTrack)
plotTracks(subTrack)

## Collapsing transcripts
plotTracks(grTrack,
    collapseTranscripts = TRUE, showId = TRUE,
    extend.left = 10000, shape = "arrow"
)

## Stacking
stacking(grTrack)
stacking(grTrack) <- "dense"
plotTracks(grTrack)

## coercion
as(grTrack, "data.frame")
as(grTrack, "UCSCData")

## HTML image map
coords(grTrack)
tags(grTrack)
grTrack <- plotTracks(grTrack)$foo
coords(grTrack)
tags(grTrack)

ivanek/Gviz documentation built on Nov. 20, 2023, 8:16 p.m.