read_gmt: Read a '.gmt' file in as a 'pathwayCollection' object


View source: R/utils_read_gmt.R

Description

Read a set list file in Gene Matrix Transposed (.gmt) format, with special consideration for performance on large files, and return it as a pathwayCollection object.

Usage

read_gmt(
  file,
  setType = c("pathways", "genes", "regions"),
  description = FALSE,
  nChars = 1e+07,
  delim = "\t"
)

Arguments

file

A path to a file or a connection. This file must be a .gmt file; otherwise, the output will likely be nonsense. See the "Details" section for more information.

setType

The type of the sets: pathway sets of genes ("pathways"), gene sites in RNA or DNA ("genes"), or regions of CpGs ("regions"). Defaults to "pathways".

description

Should the "description" field (the second field in the .gmt file on each line) be included in the output? Defaults to FALSE.

nChars

The number of characters to read from a connection. The largest .gmt file we have encountered is the full C5 pathway collection from MSigDB (5917 pathways), which has roughly 5 million characters in UTF-8 encoding. Therefore, we default this argument to be twice the size of the largest pathway collection we have seen so far, 10,000,000.

delim

The .gmt delimiter. As proper .gmt files are tab delimited, this defaults to "\t".
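
When a file might exceed the default nChars, one option is to compare its size on disk against the default before calling read_gmt. The sketch below is only an illustration: choose_nChars is a hypothetical helper, not part of pathwayPCA, and it assumes that the file size in bytes is a safe upper bound on the number of characters (true for UTF-8, where every character occupies at least one byte).

```r
# Hypothetical helper (not part of pathwayPCA): return a character budget
# that is at least as large as the file on disk.
choose_nChars <- function(path, default = 1e7) {
  max(default, file.size(path))
}

# Demonstrate on a small temporary file with made-up set names.
tmp_path <- tempfile(fileext = ".gmt")
writeLines("SET_A\tsome description\tGENE1\tGENE2", con = tmp_path)

# The file is far smaller than the default, so the default is kept.
n_chars <- choose_nChars(tmp_path)
```

The result could then be passed along as read_gmt(tmp_path, nChars = n_chars).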

Details

This function uses R's readChar function, which reads character input faster than readLines and far faster than scan.

See the Broad Institute's "Data Formats" page for a description of the Gene Matrix Transposed file format: https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#GMT:_Gene_Matrix_Transposed_file_format_.28.2A.gmt.29
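
To make the file layout concrete, here is a minimal base R sketch of the general idea: read the whole file as one string with readChar, then split it into lines and tab-delimited fields. It is not the package's implementation; the set names, gene names, and variable names are invented for illustration.

```r
# Write a tiny two-line .gmt file to a temporary location (made-up sets).
gmt_path <- tempfile(fileext = ".gmt")
writeLines(
  c(
    "SET_A\tdescription of set A\tGENE1\tGENE2\tGENE3",
    "SET_B\tdescription of set B\tGENE2\tGENE4"
  ),
  con = gmt_path
)

# Read the whole file as a single string in one call.
raw_char <- readChar(gmt_path, nchars = file.size(gmt_path), useBytes = TRUE)

# Split into lines (tolerating Windows line endings), then into fields.
lines_char <- strsplit(raw_char, split = "\r?\n")[[1]]
lines_char <- lines_char[nzchar(lines_char)]
fields_ls <- strsplit(lines_char, split = "\t", fixed = TRUE)

# Field 1 is the set name, field 2 the description, the rest are members.
set_names <- vapply(fields_ls, `[`, character(1), 1)
set_descs <- vapply(fields_ls, `[`, character(1), 2)
sets_ls <- lapply(fields_ls, function(x) x[-(1:2)])
names(sets_ls) <- set_names
```

Each line of a .gmt file may hold a different number of members, which is why the sets are kept as a list of character vectors rather than a rectangular table.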

Value

A pathwayCollection list of sets. This list has three elements:

See Also

print.pathwayCollection; write_gmt

Examples

  # If you have installed the package:
  data_path <- system.file(
    "extdata", "c2.cp.v6.0.symbols.gmt",
    package = "pathwayPCA", mustWork = TRUE
  )
  geneset_ls <- read_gmt(data_path, description = TRUE)

  # # If you are using the development version from GitHub:
  # geneset_ls <- read_gmt(
  #   "inst/extdata/c2.cp.v6.0.symbols.gmt",
  #   description = TRUE
  # )

pathwayPCA documentation built on Dec. 15, 2020, 6:14 p.m.