Accessing/modifying sequence information

Description

A set of generic functions for getting/setting/modifying the sequence information stored in an object.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
seqinfo(x)
seqinfo(x, new2old=NULL, force=FALSE) <- value

seqnames(x)
seqnames(x) <- value

seqlevels(x)
seqlevels(x, force=FALSE) <- value
sortSeqlevels(x, X.is.sexchrom=NA)
seqlevelsInUse(x)
seqlevels0(x)

seqlengths(x)
seqlengths(x) <- value

isCircular(x)
isCircular(x) <- value

genome(x)
genome(x) <- value

Arguments

x

The object from/on which to get/set the sequence information.

new2old

The new2old argument allows the user to rename, drop, add and/or reorder the "sequence levels" in x.

new2old can be NULL or an integer vector with one element per row in Seqinfo object value (i.e. new2old and value must have the same length) describing how the "new" sequence levels should be mapped to the "old" sequence levels, that is, how the rows in value should be mapped to the rows in seqinfo(x). The values in new2old must be >= 1 and <= length(seqinfo(x)). NAs are allowed and indicate sequence levels that are being added. Old sequence levels that are not represented in new2old will be dropped, but this will fail if those levels are in use (e.g. if x is a GRanges object with ranges defined on those sequence levels) unless force=TRUE is used (see below).

If new2old=NULL, then sequence levels can only be added to the existing ones, that is, value must have at least as many rows as seqinfo(x) (i.e. length(values) >= length(seqinfo(x))) and also seqlevels(values)[seq_len(length(seqlevels(x)))] must be identical to seqlevels(x).

force

Force dropping sequence levels currently in use. This is achieved by dropping the elements in x where those levels are used (hence typically reducing the length of x).

Note that if x is a list-like object (e.g. GRangesList, GAlignmentPairs, or GAlignmentsList), then any list element in x where at least one of the sequence levels to drop is used is fully dropped. In other words, the seqlevels setter always keeps or drops full list elements and never tries to change their content. This guarantees that the geometry of the list elements is preserved, which is a desirable property when they represent compound features (e.g. exons grouped by transcript or paired-end reads). See below for an example.

value

Typically a Seqinfo object for the seqinfo setter.

Either a named or unnamed character vector for the seqlevels setter.

A vector containing the sequence information to store for the other setters.

X.is.sexchrom

A logical indicating whether X refers to the sexual chromosome or to chromosome with Roman Numeral X. If NA, sortSeqlevels does its best to "guess".

Details

The Seqinfo class plays a central role for the functions described in this man page because:

  • All these functions (except seqinfo, seqlevelsInUse, and seqlevels0) work on a Seqinfo object.

  • For classes that implement it, the seqinfo getter should return a Seqinfo object.

  • Default seqlevels, seqlengths, isCircular, and genome getters and setters are provided. By default, seqlevels(x) does seqlevels(seqinfo(x)), seqlengths(x) does seqlengths(seqinfo(x)), isCircular(x) does isCircular(seqinfo(x)), and genome(x) does genome(seqinfo(x)). So any class with a seqinfo getter will have all the above getters work out-of-the-box. If, in addition, the class defines a seqinfo setter, then all the corresponding setters will also work out-of-the-box.

    Examples of containers that have a seqinfo getter and setter: the GRanges and GRangesList classes in the GenomicRanges package; the SummarizedExperiment class in the SummarizedExperiment package; the GAlignments, GAlignmentPairs, and GAlignmentsList classes in the GenomicAlignments package; the TxDb class in the GenomicFeatures package; the BSgenome class in the BSgenome package; etc...

The GenomicRanges package defines seqinfo and seqinfo<- methods for these low-level data types: List, RangesList and RangedData. Those objects do not have the means to formally store sequence information. Thus, the wrappers simply store the Seqinfo object within metadata(x). Initially, the metadata is empty, so there is some effort to generate a reasonable default Seqinfo. The names of any List are taken as the seqnames, and the universe of RangesList or RangedData is taken as the genome.

Note

The full list of methods defined for a given generic can be seen with e.g. showMethods("seqinfo") or showMethods("seqnames") (for the getters), and showMethods("seqinfo<-") or showMethods("seqnames<-") (for the setters aka replacement methods). Please be aware that this shows only methods defined in packages that are currently attached.

Author(s)

H. Pag├Ęs

See Also

  • The seqlevelsStyle generic getter and setter.

  • Seqinfo objects.

  • GRanges and GRangesList objects in the GenomicRanges package.

  • SummarizedExperiment objects in the SummarizedExperiment package.

  • GAlignments, GAlignmentPairs, and GAlignmentsList objects in the GenomicAlignments package.

  • TxDb objects in the GenomicFeatures package.

  • BSgenome objects in the BSgenome package.

  • seqlevels-wrappers for convenience wrappers to the seqlevels getter and setter.

  • rankSeqlevels, on which sortSeqlevels is based.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
## ---------------------------------------------------------------------
## A. MODIFY THE SEQLEVELS OF AN OBJECT
## ---------------------------------------------------------------------
## Overlap and matching operations between objects require matching
## seqlevels. Often the seqlevels in one must be modified to match 
## the other. The seqlevels() function can rename, drop, add and reorder 
## seqlevels of an object. Examples below are shown on TxDb 
## and GRanges but the approach is the same for all objects that have
## a 'Seqinfo' class.

library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
seqlevels(txdb)

## Rename:
seqlevels(txdb) <- sub("chr", "", seqlevels(txdb))
seqlevels(txdb)

seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
seqlevels(txdb)

seqlevels(txdb)[seqlevels(txdb) == "CHM"] <- "M"
seqlevels(txdb)

gr <- GRanges(rep(c("chr2", "chr3", "chrM"), 2), IRanges(1:6, 10))

## Add:
seqlevels(gr) <- c("chr1", seqlevels(gr), "chr4")
seqlevels(gr)
seqlevelsInUse(gr)

## Reorder:
seqlevels(gr) <- rev(seqlevels(gr))
seqlevels(gr)

## Drop all unused seqlevels:
seqlevels(gr) <- seqlevelsInUse(gr)

## Drop some seqlevels in use:
seqlevels(gr, force=TRUE) <- setdiff(seqlevels(gr), "chr3")
gr

## Rename/Add/Reorder:
seqlevels(gr) <- c("chr1", chr2="chr2", chrM="Mitochondrion")
seqlevels(gr)

## ---------------------------------------------------------------------
## B. DROP SEQLEVELS OF A LIST-LIKE OBJECT
## ---------------------------------------------------------------------

grl0 <- GRangesList(GRanges("chr2", IRanges(3:2, 5)),
                    GRanges("chr5", IRanges(11, 18)),
                    GRanges(c("chr4", "chr2"), IRanges(7:6, 15)))
grl0

grl1 <- grl0
seqlevels(grl1, force=TRUE) <- c("chr2", "chr5")
grl1  # grl0[[3]] was fully dropped!

## If what is desired is to drop the first range in grl0[[3]] only, or,
## more generally speaking, to drop the ranges within each list element
## that are located on one of the seqlevels to drop, then do:
grl2 <- grl0[seqnames(grl0) %in% c("chr2", "chr5")]
grl2

## Note that the above subsetting doesn't drop any seqlevel:
seqlevels(grl2)

## To drop them (no need to use 'force=TRUE' anymore):
seqlevels(grl2) <- c("chr2", "chr5")
seqlevels(grl2)

## ---------------------------------------------------------------------
## C. SORT SEQLEVELS IN "NATURAL" ORDER
## ---------------------------------------------------------------------

sortSeqlevels(c("11", "Y", "1", "10", "9", "M", "2"))

seqlevels <- c("chrXI", "chrY", "chrI", "chrX", "chrIX", "chrM", "chrII")
sortSeqlevels(seqlevels)
sortSeqlevels(seqlevels, X.is.sexchrom=TRUE)
sortSeqlevels(seqlevels, X.is.sexchrom=FALSE)

seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet",
               "chrM", "chrXHet", "chr2LHet", "chrU",
               "chr3L", "chr3R", "chr2R", "chrX")
sortSeqlevels(seqlevels)

gr <- GRanges()
seqlevels(gr) <- seqlevels
sortSeqlevels(gr)

## ---------------------------------------------------------------------
## D. SUBSET OBJECTS BY SEQLEVELS
## ---------------------------------------------------------------------

tx <- transcripts(txdb)
seqlevels(tx)

## Drop 'M', keep all others.
seqlevels(tx, force=TRUE) <- seqlevels(tx)[seqlevels(tx) != "M"]
seqlevels(tx)

## Drop all except 'ch3L' and 'ch3R'.
seqlevels(tx, force=TRUE) <- c("ch3L", "ch3R")
seqlevels(tx)

## ---------------------------------------------------------------------
## E. RESTORE ORIGINAL SEQLEVELS OF A TxDb OBJECT
## ---------------------------------------------------------------------

## Applicable to TxDb objects only.
## Not run: 
seqlevels(txdb) <- seqlevels0(txdb)
seqlevels(txdb)

## End(Not run)

## ---------------------------------------------------------------------
## F. FINDING METHODS
## ---------------------------------------------------------------------

showMethods("seqinfo")
showMethods("seqinfo<-")

showMethods("seqnames")
showMethods("seqnames<-")

showMethods("seqlevels")
showMethods("seqlevels<-")

if (interactive()) {
  library(GenomicRanges)
  ?`GRanges-class`
}