GRanges-class | R Documentation |
The GRanges class is a container for the genomic locations and their associated annotations.
GRanges is a vector of genomic locations and associated annotations. Each element in the vector is comprised of a sequence name, an interval, a strand, and optional metadata columns (e.g. score, GC content, etc.). This information is stored in four components:
seqnames
a 'factor' Rle object containing the sequence names.
ranges
an IRanges object containing the ranges.
strand
a 'factor' Rle object containing the strand information.
mcols
a DataFrame object
containing the metadata columns. Columns cannot be named
"seqnames"
, "ranges"
, "strand"
,
"seqlevels"
, "seqlengths"
, "isCircular"
,
"start"
, "end"
, "width"
, or "element"
.
seqinfo
a Seqinfo object containing information about the set of genomic sequences present in the GRanges object.
GRanges(seqnames=NULL, ranges=NULL, strand=NULL,
..., seqinfo=NULL, seqlengths=NULL)
:Creates a GRanges object.
seqnames
NULL
, or an Rle object, character vector,
or factor containing the sequence names.
ranges
NULL
, or an IRanges object containing the
ranges.
strand
NULL
, or an Rle object, character vector,
or factor containing the strand information.
...
Metadata columns to set on the GRanges object. All the metadata
columns must be vector-like objects of the same length as the object
to construct. They cannot be named "start"
, "end"
,
"width"
, or "element"
.
seqinfo
Either NULL
, or a Seqinfo object,
or a character vector of unique sequence names (a.k.a.
seqlevels), or a named numeric vector of sequence lengths.
When not NULL
, seqinfo
must be compatible with the
sequence names in seqnames
, that is, it must have one entry
for each unique sequence name in seqnames
. Note that it can
have additional entries i.e. entries for seqlevels not present
in seqnames
.
seqlengths
NULL
, or an integer vector named with levels(seqnames)
and containing the lengths (or NA) for each level in
levels(seqnames)
.
If ranges
is not supplied and/or NULL then the constructor
proceeds in 2 steps:
An initial GRanges object is created with
as(seqnames, "GRanges")
.
Then this GRanges object is updated according to whatever
non-NULL remaining arguments were passed to the call to
GRanges()
.
As a consequence of this behavior, GRanges(x)
is equivalent to
as(x, "GRanges")
.
In the following code snippets, x
is a GRanges object.
length(x)
:Get the number of elements.
seqnames(x)
, seqnames(x) <- value
:Get or set the sequence names.
value
can be an Rle object, a character vector,
or a factor.
ranges(x)
, ranges(x) <- value
:Get or set the ranges. value
can be an
IntegerRanges object.
start(x)
, start(x) <- value
:Get or set start(ranges(x))
.
end(x)
, end(x) <- value
:Get or set end(ranges(x))
.
width(x)
, width(x) <- value
:Get or set width(ranges(x))
.
strand(x)
, strand(x) <- value
:Get or set the strand. value
can be an Rle object, character
vector, or factor.
names(x)
, names(x) <- value
:Get or set the names of the elements.
mcols(x, use.names=FALSE)
, mcols(x) <- value
:Get or set the metadata columns.
If use.names=TRUE
and the metadata columns are not NULL
,
then the names of x
are propagated as the row names of the
returned DataFrame object.
When setting the metadata columns, the supplied value must be NULL
or a data-frame-like object (i.e. DataFrame or data.frame)
holding element-wise metadata.
elementMetadata(x)
, elementMetadata(x) <- value
,
values(x)
, values(x) <- value
:Alternatives to mcols
functions. Their use is discouraged.
seqinfo(x)
, seqinfo(x) <- value
:Get or set the information about the underlying sequences.
value
must be a Seqinfo object.
seqlevels(x)
,
seqlevels(x, pruning.mode=c("error", "coarse", "fine", "tidy")) <- value
:Get or set the sequence levels.
seqlevels(x)
is equivalent to seqlevels(seqinfo(x))
or to levels(seqnames(x))
, those 2 expressions being
guaranteed to return identical character vectors on a GRanges object.
value
must be a character vector with no NAs.
See ?seqlevels
for more information.
seqlengths(x)
, seqlengths(x) <- value
:Get or set the sequence lengths.
seqlengths(x)
is equivalent to seqlengths(seqinfo(x))
.
value
can be a named non-negative integer or numeric vector
eventually with NAs.
isCircular(x)
, isCircular(x) <- value
:Get or set the circularity flags.
isCircular(x)
is equivalent to isCircular(seqinfo(x))
.
value
must be a named logical vector eventually with NAs.
genome(x)
, genome(x) <- value
:Get or set the genome identifier or assembly name for each sequence.
genome(x)
is equivalent to genome(seqinfo(x))
.
value
must be a named character vector eventually with NAs.
seqlevelsStyle(x)
, seqlevelsStyle(x) <- value
:Get or set the seqname style for x
.
See the seqlevelsStyle generic getter and setter
in the GenomeInfoDb package for more information.
score(x), score(x) <- value
:Get or set the “score” column from the element metadata.
granges(x, use.names=FALSE, use.mcols=FALSE)
: Squeeze the genomic
ranges out of GenomicRanges object x
and return them in a
GRanges object parallel to x
(i.e. same length as x
).
If use.mcols
is TRUE
, the metadata columns are propagated.
If x
is a GenomicRanges derivative with extra column
slots, these will be propagated as metadata columns on the returned
GRanges object.
In the code snippets below, x
is a GRanges object.
as(from, "GRanges")
:Creates a GRanges object from a character vector, a factor, or IntegerRangesList object.
When from
is a character vector (or a factor), each element
in it must represent a genomic range in format chr1:2501-2800
(unstranded range) or chr1:2501-2800:+
(stranded range).
..
is also supported as a separator between the start and end
positions. Strand can be +
, -
, *
, or missing.
The names on from
are propagated to the returned GRanges object.
See as.character()
and as.factor()
below for the
reverse transformations.
Coercing a data.frame or DataFrame into a GRanges object is also
supported. See makeGRangesFromDataFrame
for the details.
as(from, "IntegerRangesList")
:Creates a IntegerRangesList object from a GRanges
object. The strand
and metadata columns become inner
metadata columns (i.e. metadata columns on the ranges).
The seqlengths(from)
, isCircular(from)
, and
genome(from)
vectors become the metadata columns.
as.character(x, ignore.strand=FALSE)
:Turn GRanges object x
into a character vector where each
range in x
is represented by a string in format
chr1:2501-2800:+
. If ignore.strand
is TRUE or if
all the ranges in x
are unstranded (i.e. their strand
is set to *
), then all the strings in the output are in
format chr1:2501-2800
.
The names on x
are propagated to the returned character vector.
Its metadata (metadata(x)
) and metadata columns (mcols(x)
)
are ignored.
See as(from, "GRanges")
above for the reverse transformation.
as.factor(x)
:Equivalent to
factor(as.character(x), levels=as.character(sort(unique(x))))
See as(from, "GRanges")
above for the reverse transformation.
Note that table(x)
is supported on a GRanges object. It is
equivalent to, but much faster than, table(as.factor(x))
.
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
:Creates a data.frame with columns seqnames
(factor),
start
(integer), end
(integer), width
(integer),
strand
(factor), as well as the additional metadata columns
stored in mcols(x)
. Pass an explicit
stringsAsFactors=TRUE/FALSE
argument via ...
to
override the default conversions for the metadata columns in
mcols(x)
.
as(from, "Grouping")
: Creates a
ManyToOneGrouping
object that groups
from
by seqname, strand, start and end (same as the default
sort order). This makes it convenient, for example, to aggregate a
GenomicRanges object by range.
In the code snippets below, x
is a Seqinfo
object.
as(x, "GRanges")
, as(x, "GenomicRanges")
,
as(x, "IntegerRangesList")
: Turns Seqinfo
object x
(with no NA
lengths) into a GRanges or
IntegerRangesList.
In the code snippets below, x
is a GRanges object.
x[i]
:Return a new GRanges object made of the elements selected by i
.
x[i, j]
:Like the above, but allow the user to conveniently subset the metadata
columns thru j
.
x[i] <- value
:Replacement version of x[i]
.
x$name
, x$name <- value
:Shortcuts for mcols(x)$name
and mcols(x)$name <- value
,
respectively. Provided as a convenience, for GRanges objects *only*,
and as the result of strong popular demand.
Note that those methods are not consistent with the other $
and $<-
methods in the IRanges/GenomicRanges infrastructure,
and might confuse some users by making them believe that a GRanges
object can be manipulated as a data.frame-like object.
Therefore we recommend using them only interactively, and we discourage
their use in scripts or packages. For the latter, use
mcols(x)$name
and mcols(x)$name <- value
, instead
of x$name
and x$name <- value
, respectively.
See ?`[`
in the S4Vectors package for more
information about subsetting Vector derivatives and for an important note
about the x[i, j]
form.
Note that a GRanges object can be used as a subscript to subset a
list-like object that has names on it. In that case, the names on the
list-like object are interpreted as sequence names.
In the code snippets below, x
is a list or List object with
names on it, and the subscript gr
is a GRanges object with all its
seqnames being valid x
names.
x[gr]
:Return an object of the same class as x
and parallel
to gr
. More precisely, it's conceptually doing:
lapply(gr, function(gr1) x[[seqnames(gr1)]][ranges(gr1)])
c(x, ..., ignore.mcols=FALSE)
:Concatenate GRanges object x
and the GRanges objects in
...
together.
See ?c
in the S4Vectors package
for more information about concatenating Vector derivatives.
split(x, f, drop=FALSE)
:Splits GRanges object x
according to f
to create a
GRangesList object.
If f
is a list-like object then drop
is ignored
and f
is treated as if it was
rep(seq_len(length(f)), sapply(f, length))
,
so the returned object has the same shape as f
(it also
receives the names of f
).
Otherwise, if f
is not a list-like object, empty list
elements are removed from the returned object if drop
is
TRUE
.
In the code snippets below, x
is a GRanges object.
show(x)
:By default the show
method displays 5 head and 5 tail
elements. This can be changed by setting the global options
showHeadLines
and showTailLines
. If the object
length is less than (or equal to) the sum of these 2 options
plus 1, then the full object is displayed.
Note that these options also affect the display of
GAlignments and
GAlignmentPairs objects (defined in
the GenomicAlignments package), as well as other objects
defined in the IRanges and Biostrings packages (e.g.
IRanges and DNAStringSet objects).
P. Aboyoun and H. Pagès
The IRanges class in the IRanges package for storing a set of integer ranges.
The GPos class for representing a set of genomic positions (i.e. genomic ranges of width 1, a.k.a. genomic loci).
makeGRangesFromDataFrame
for making a GRanges object
from a data.frame or DataFrame object.
Seqinfo objects and the
seqinfo
accessor and family in the
GenomeInfoDb package for accessing/modifying information
about the underlying sequences of a GenomicRanges derivative.
GenomicRanges-comparison for comparing and ordering genomic ranges and/or positions.
findOverlaps-methods for finding overlapping genomic ranges and/or positions.
intra-range-methods and inter-range-methods for intra range and inter range transformations of GenomicRanges derivatives.
coverage-methods for computing the coverage of a set of genomic ranges and/or positions.
setops-methods for set operations on GRanges objects.
subtract for subtracting a set of genomic ranges from a GRanges object (similar to bedtools subtract).
nearest-methods for finding the nearest genomic range/position neighbor.
absoluteRanges
for transforming genomic ranges into
absolute ranges (i.e. into ranges on the sequence obtained
by virtually concatenating all the sequences in a genome).
tileGenome
for putting tiles on a genome.
genomicvars for manipulating genomic variables.
GRangesList objects.
Vector, Rle, and DataFrame objects in the S4Vectors package.
showClass("GRanges") # shows the known subclasses
## ---------------------------------------------------------------------
## CONSTRUCTION
## ---------------------------------------------------------------------
## Specifying the bare minimum i.e. seqnames and ranges only. The
## GRanges object will have no names, no strand information, and no
## metadata columns:
gr0 <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1))
gr0
## Specifying names, strand, metadata columns. They can be set on an
## existing object:
names(gr0) <- head(letters, 10)
strand(gr0) <- Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2))
mcols(gr0)$score <- 1:10
mcols(gr0)$GC <- seq(1, 0, length=10)
gr0
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10))
stopifnot(identical(gr0, gr))
## Specifying the seqinfo. It can be set on an existing object:
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
seqinfo(gr0) <- merge(seqinfo(gr0), seqinfo)
seqlevels(gr0) <- seqlevels(seqinfo)
## ... or specified at construction time:
gr <- GRanges(Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(1:10, width=10:1, names=head(letters, 10)),
Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
score=1:10, GC=seq(1, 0, length=10),
seqinfo=seqinfo)
stopifnot(identical(gr0, gr))
## ---------------------------------------------------------------------
## COERCION
## ---------------------------------------------------------------------
## From GRanges:
as.character(gr)
as.factor(gr)
as.data.frame(gr)
## From character to GRanges:
x1 <- "chr2:56-125"
as(x1, "GRanges")
as(rep(x1, 4), "GRanges")
x2 <- c(A=x1, B="chr1:25-30:-")
as(x2, "GRanges")
## From data.frame to GRanges:
df <- data.frame(chrom="chr2", start=11:15, end=20:24)
gr3 <- as(df, "GRanges")
## Alternatively, coercion to GRanges can be done by just calling the
## GRanges() constructor on the object to coerce:
gr1 <- GRanges(x1) # same as as(x1, "GRanges")
gr2 <- GRanges(x2) # same as as(x2, "GRanges")
gr3 <- GRanges(df) # same as as(df, "GRanges")
## Sanity checks:
stopifnot(identical(as(x1, "GRanges"), gr1))
stopifnot(identical(as(x2, "GRanges"), gr2))
stopifnot(identical(as(df, "GRanges"), gr3))
## ---------------------------------------------------------------------
## SUMMARIZING ELEMENTS
## ---------------------------------------------------------------------
table(seqnames(gr))
table(strand(gr))
sum(width(gr))
table(gr)
summary(mcols(gr)[,"score"])
## The number of lines displayed in the 'show' method are controlled
## with two global options:
longGR <- sample(gr, 25, replace=TRUE)
longGR
options(showHeadLines=7)
options(showTailLines=2)
longGR
## Revert to default values
options(showHeadLines=NULL)
options(showTailLines=NULL)
## ---------------------------------------------------------------------
## INVERTING THE STRAND
## ---------------------------------------------------------------------
invertStrand(gr)
## ---------------------------------------------------------------------
## RENAMING THE UNDERLYING SEQUENCES
## ---------------------------------------------------------------------
seqlevels(gr)
seqlevels(gr) <- sub("chr", "Chrom", seqlevels(gr))
gr
seqlevels(gr) <- sub("Chrom", "chr", seqlevels(gr)) # revert
## ---------------------------------------------------------------------
## COMBINING OBJECTS
## ---------------------------------------------------------------------
gr2 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 3, 4)),
IRanges(1:10, width=5),
strand='-',
score=101:110, GC=runif(10),
seqinfo=seqinfo)
gr3 <- GRanges(seqnames=Rle(c('chr1', 'chr2', 'chr3'), c(3, 4, 3)),
IRanges(101:110, width=10),
strand='-',
score=21:30,
seqinfo=seqinfo)
some.gr <- c(gr, gr2)
c(gr, gr2, gr3)
c(gr, gr2, gr3, ignore.mcols=TRUE)
## ---------------------------------------------------------------------
## USING A GRANGES OBJECT AS A SUBSCRIPT TO SUBSET ANOTHER OBJECT
## ---------------------------------------------------------------------
## Subsetting *by* a GRanges subscript is supported only if the object
## to subset is a named list-like object:
x <- RleList(chr1=101:120, chr2=2:-8, chr3=31:40)
x[gr]
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.