getObjects: Read and write gene sets from Broad or GMT formats

import/export    R Documentation

Read and write gene sets from Broad or GMT formats

Description

getBroadSets parses one or more XML files for gene sets. Each file can reside locally or at a URL. The expected format is the one defined by the Broad (see References). toBroadXML creates Broad XML from gene sets whose collectionType is BroadCollection.

toGmt converts GeneSetCollection objects to a character vector representing the gene set collection in GMT format. getGmt reads a GMT file or other character vector into a GeneSetCollection.
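To make the GMT layout concrete, here is a minimal base-R sketch of the kind of text that toGmt writes and getGmt reads. It does not call GSEABase. It assumes the usual GMT convention: each line holds a set name, a description, and then the member identifiers, all separated by sep (a tab by default). The helpers gmt_lines and parse_gmt and the example gene sets are hypothetical.

```r
## Hypothetical helpers illustrating the GMT layout; not part of GSEABase.
## Each gene set becomes one line: name, description, members, joined by 'sep'.
gmt_lines <- function(sets, descriptions, sep = "\t") {
    vapply(names(sets), function(nm) {
        paste(c(nm, descriptions[[nm]], sets[[nm]]), collapse = sep)
    }, character(1), USE.NAMES = FALSE)
}

## Split each line on 'sep'; field 1 is the set name, field 2 the
## description, and the remaining fields are the member identifiers.
parse_gmt <- function(lines, sep = "\t") {
    fields <- strsplit(lines, sep, fixed = TRUE)
    sets <- lapply(fields, function(f) f[-(1:2)])
    names(sets) <- vapply(fields, `[[`, character(1), 1)
    sets
}

## Hypothetical example sets, written to a temporary file and read back.
sets <- list(SET_A = c("TP53", "MDM2"), SET_B = c("EGFR", "KRAS", "BRAF"))
desc <- list(SET_A = "example set A", SET_B = "example set B")

lines <- gmt_lines(sets, desc)
fl <- tempfile(fileext = ".gmt")
writeLines(lines, fl)
roundtrip <- parse_gmt(readLines(fl))
unlink(fl)
identical(roundtrip, sets)
```

With GSEABase itself, the equivalent round trip is toGmt(gss, fl) followed by getGmt(fl), as in the Examples. The sketch only shows why one line corresponds to one gene set.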

Usage

getBroadSets(uri, ..., membersId=c("MEMBERS_SYMBOLIZED", "MEMBERS_EZID"))
toBroadXML(geneSet, con, ...)
asBroadUri(name,
           base="http://www.broad.mit.edu/gsea/msigdb/cards")
getGmt(con, geneIdType=NullIdentifier(),
       collectionType=NullCollection(), sep="\t", ...)
toGmt(x, con, ...)

Arguments

uri

A file name or URL containing gene sets encoded according to the Broad specification. For Broad sets, uri can point to an MSigDB XML file.

geneSet

A GeneSet with collectionType BroadCollection (to ensure that required information is available).

x

A GeneSetCollection or other object for which a toGmt method is defined.

con

For getGmt, the file name or connection to read from. For toBroadXML and toGmt, an optional file name or connection to receive output.

name

A character vector of Broad gene set names, e.g., c('chr16q', 'GNF2_TNFSF10').

base

Base uri for finding Broad gene sets.

geneIdType

A constructor for the type of identifier the members of the gene sets represent. See GeneIdentifierType for more information.

collectionType

A constructor for the type of collection for the gene sets. See CollectionType for more information.

sep

The character string separating members of each gene set in the GMT file.

...

Further arguments passed to the underlying XML parser; in particular, file can be used to specify an output connection for toBroadXML.

membersId

XML field name from which geneIds are derived. Choose one value; default “MEMBERS_SYMBOLIZED”.

Value

getBroadSets returns a GeneSetCollection of gene sets.

toBroadXML returns the XML representation of a single GeneSet as a character vector or, if con is provided, writes the XML to that file or connection.

asBroadUri creates URIs for Broad gene set files from gene set names, for use by getBroadSets.
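A minimal sketch of how such URIs could be assembled from name and base. It assumes each Broad card lives at base/name.xml. That layout is an assumption, not taken from this page, so compare it against the actual output of asBroadUri. The function broad_uri is hypothetical.

```r
## Hypothetical re-implementation for illustration only; assumes the
## URI pattern <base>/<name>.xml, which may differ from asBroadUri.
broad_uri <- function(name,
                      base = "http://www.broad.mit.edu/gsea/msigdb/cards") {
    ## paste0 is vectorized over 'name', giving one URI per gene set name
    paste0(base, "/", name, ".xml")
}

uris <- broad_uri(c("chr16q", "GNF2_TNFSF10"))
uris
```

Because the result is one URI per name, a vector like this matches getBroadSets' ability to parse one or more files.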

getGmt returns a GeneSetCollection of gene sets.

toGmt returns a character vector in which each element is one line of GMT text representing one gene set. If con is provided, the result is written to that connection.

Note

Actual Broad XML files differ from the DTD (e.g., an implied ',' separator between genes in a set); we parse and write files as they actually exist, rather than strictly following the DTD.
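Because of the implied ',' separator, the field selected by membersId (e.g. MEMBERS_SYMBOLIZED) holds all of a set's gene identifiers in a single string. A base-R sketch of recovering them follows. The attribute value is hypothetical, and reading it from the XML file is not shown.

```r
## Hypothetical value of the membersId field (e.g. "MEMBERS_SYMBOLIZED")
## taken from one gene set element of a Broad XML file.
members_field <- "TP53,MDM2,CDKN1A,BAX"

## Split on the implied ',' separator; trimws() is a defensive step
## in case identifiers carry stray white space (an assumption).
gene_ids <- trimws(strsplit(members_field, ",", fixed = TRUE)[[1]])
gene_ids
```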

Author(s)

Martin Morgan <mtmorgan@fhcrc.org>

References

http://www.broad.mit.edu/gsea/

See Also

GeneSetCollection, GeneSet

Examples

## 'fl' could also be a URI
fl <- system.file("extdata", "Broad.xml", package="GSEABase")
gss <- getBroadSets(fl) # GeneSetCollection of 2 sets
names(gss)
gss[[1]]

## Not run: 
## Download 'msigdb_v2.5.xml' or 'c3.all.v2.5.symbols.gmt' from the
## Broad, http://www.broad.mit.edu/gsea/downloads.jsp#msigdb, then
gsc <- getBroadSets("/path/to/msigdb_v2.5.xml")
types <- sapply(gsc, function(elt) bcCategory(collectionType(elt)))
c3gsc1 <- gsc[types == "c3"]
c3gsc2 <- getGmt("/path/to/c3.all.v2.5.symbols.gmt",
                 collectionType=BroadCollection(category="c3"),
                 geneIdType=SymbolIdentifier())

## End(Not run)

fl <- tempfile()
toBroadXML(gss[[1]], con=fl)
noquote(readLines(fl))
unlink(fl)

## Not run: 
toBroadXML(gss[[1]]) # character vector

## End(Not run)

fl <- tempfile()
toGmt(gss, fl)
getGmt(fl)
unlink(fl)

Bioconductor/GSEABase documentation built on Nov. 2, 2024, 6:35 a.m.