Class "GmapGenome"

Description

The GmapGenome class represents a genome that has been indexed for use with the GMAP suite of tools. It is typically used as a parameter to the functions gsnap and bam_tally. This class also provides the means to index new genomes, from either a FASTA file or a BSgenome object. Genome indexes are typically stored in a centralized directory on the file system and are identified by a string key.

Constructor

GmapGenome(genome, directory = GmapGenomeDirectory(create = create), name = genomeName(genome), create = FALSE, ...):

Creates a GmapGenome corresponding to the genome argument, which may be either a string identifier of the genome within directory, a FastaFile or DNAStringSet of the genome sequence, or a BSgenome object.

The genome index is stored in the location given by the directory argument, which may be either a GmapGenomeDirectory object or a string path.

The name argument is the actual key used for storing the genome index within directory. If genome is a string, it is taken as the key. If genome is a FastaFile, the key is the basename of the file without the extension. If genome is a BSgenome, the key is its providerVersion. Otherwise, the name must be specified. If create is TRUE, the genome index is created if one with that name does not already exist. This only works if genome actually contains the genome sequence, i.e. it is not a string key.
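The default-key rule for FASTA files can be illustrated with a minimal sketch in base R. The helper default_genome_key is hypothetical and not part of gmapR; it only mirrors the rule as documented above (basename without the extension) and is not taken from the package's source.

```r
# Hypothetical helper (NOT part of gmapR): mirrors the documented rule that
# the default key for a FASTA file is its basename without the extension.
default_genome_key <- function(fasta_path) {
  # basename() drops the directory; file_path_sans_ext() drops the final
  # extension only, so "hg19.p53.fasta" becomes "hg19.p53".
  tools::file_path_sans_ext(basename(fasta_path))
}

default_genome_key("extdata/hg19.p53.fasta")
# [1] "hg19.p53"
```

Under this rule, a different key has to be passed explicitly through the name argument.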

The first example in the Examples section gives the typical and recommended usage when implementing a reproducible analysis.

Extracting Genomic Sequence

getSeq(x, which = seqinfo(x)): Extracts the genomic sequence for each region in which (something coercible to GRanges). The result is a character vector for now. This is implemented in C and is very efficient. The default for which will retrieve the entire genome.
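As a sketch of passing an explicit region rather than a whole chromosome, the wrapper below builds a GRanges and hands it to getSeq. The function region_seq and its arguments are hypothetical names introduced for illustration. It assumes that gmapR is attached (so that the getSeq method for GmapGenome is available) and that GenomicRanges and IRanges are installed.

```r
# Hypothetical wrapper (NOT part of gmapR): extract one region as a string.
# Assumes library(gmapR) has been called and 'gg' is a GmapGenome.
region_seq <- function(gg, chrom, start, end) {
  # Build a single-range GRanges; coordinates are 1-based and inclusive.
  which <- GenomicRanges::GRanges(chrom, IRanges::IRanges(start, end))
  # Per the description above, the result is a character vector.
  getSeq(gg, which)
}

## Not run:
## flyGG <- GmapGenome(genome = "dm3")
## region_seq(flyGG, "chr4", 1, 100)
## End(Not run)
```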

Coercion

as(object, "DNAStringSet"): Extracts the entire sequence of the genome as a DNAStringSet. One consequence is that exporting the genome with rtracklayer becomes possible: export(object, "genome.fasta").
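The same result can be sketched as two explicit steps: coerce, then write. This variant uses Biostrings::writeXStringSet in place of rtracklayer's export, so it is an alternative route rather than the mechanism described above. The function name write_genome_fasta is hypothetical, and the sketch assumes that gmapR and Biostrings are installed and that gg is a GmapGenome.

```r
# Hypothetical helper (NOT part of gmapR): dump an indexed genome to FASTA.
# Assumes gmapR is attached so the coercion to DNAStringSet is defined.
write_genome_fasta <- function(gg, path = "genome.fasta") {
  # Coerce the whole genome to a DNAStringSet (loads every sequence in memory).
  dna <- methods::as(gg, "DNAStringSet")
  # writeXStringSet() writes FASTA by default.
  Biostrings::writeXStringSet(dna, filepath = path)
  invisible(path)
}

## Not run:
## flyGG <- GmapGenome(genome = "dm3")
## write_genome_fasta(flyGG, "dm3.fasta")
## End(Not run)
```

Because the coercion materializes the entire genome, this is best suited to small genomes or machines with ample memory.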

Accessors

path(object): returns the path to the directory containing the genome index files.

directory(x): returns the GmapGenomeDirectory that is the parent of the directory containing the index files for this genome.

genome(x): gets the name of this genome.

seqinfo(x): gets the Seqinfo for this genome; only sequence names and lengths are available.

Author(s)

Michael Lawrence

Examples

## Not run: 
library(BSgenome.Dmelanogaster.UCSC.dm3)
flyGG <- GmapGenome(Dmelanogaster, create = TRUE)

## access system-wide genome using a key
flyGG <- GmapGenome(genome = "dm3")

which <- seqinfo(flyGG)["chr4"]
firstchr <- getSeq(flyGG, which)

genome(which) <- "hg19"
## should throw an error
try(getSeq(flyGG, which))

## create a GmapGenome from a FASTA file
fa <- system.file("extdata/hg19.p53.fasta", package="gmapR")
fastaFile <- rtracklayer::FastaFile(fa)
gmapGenome <- GmapGenome(fastaFile, create=TRUE)

## End(Not run)