readPathways: Parse GMT file and return pathways as list
In BaderLab/netDx: Network-based patient classifier


Parse GMT file and return pathways as list

readPathways(
  fname,
  MIN_SIZE = 10L,
  MAX_SIZE = 200L,
  EXCLUDE_KEGG = TRUE,
  IDasName = FALSE,
  verbose = TRUE,
  getOrigNames = FALSE
)

`fname`	(char) path to pathway file in GMT format
`MIN_SIZE`	(integer) min num genes allowed in a pathway. Pathways with fewer genes than this are excluded from the output list
`MAX_SIZE`	(integer) max num genes allowed in a pathway. Pathways with gene counts greater than this are excluded from the output list
`EXCLUDE_KEGG`	(boolean) If TRUE, exclude KEGG pathways. Our experience has been that some KEGG gene sets are too broad to be physiologically relevant
`IDasName`	(boolean) Value for key in output list. If TRUE, uses db name and ID as name (e.g. KEGG:hsa04940). If FALSE, uses the pathway name.
`verbose`	(logical) print detailed messages
`getOrigNames`	(logical) when TRUE also returns a mapping of the cleaned pathway names to the original names
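
The size filter described by `MIN_SIZE` and `MAX_SIZE` can be illustrated with a minimal base R sketch. This is not the netDx implementation: `filter_by_size` is a hypothetical helper, the toy pathways and gene symbols are invented, and the bounds are assumed to be inclusive, as the argument descriptions suggest.

```r
# Toy pathway list; names and gene symbols are made up for illustration.
pathways <- list(
  tiny   = c("A", "B"),
  medium = c("A", "B", "C", "D"),
  large  = paste0("G", 1:12)
)

# Hypothetical helper (not part of netDx): keep sets whose size lies in
# [min_size, max_size], mirroring the documented MIN_SIZE/MAX_SIZE behaviour.
filter_by_size <- function(sets, min_size = 10L, max_size = 200L) {
  n <- lengths(sets)                      # number of genes per pathway
  sets[n >= min_size & n <= max_size]     # drop sets outside the bounds
}

kept <- filter_by_size(pathways, min_size = 3L, max_size = 10L)
names(kept)
```

With these toy bounds only the four-gene set survives; the two-gene set is too small and the twelve-gene set is too large.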

The GMT file format currently supported matches the files found at http://downloads.baderlab.org. The original GMT file format is: <set name><set description><member 1><member 2>...<member N>, one row per set, with values tab-delimited. The version at baderlab.org has additional unique formatting of the <set name> column as follows: <pathway_full_name>. This function requires that specific formatting of the first column to assign the key name of the output list (see the IDasName argument).
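
To make the generic tab-delimited GMT layout concrete, here is a minimal base R sketch that reads such a file into a named list. It is an illustration under stated assumptions, not how `readPathways` works internally: `parse_gmt` is a hypothetical helper, the set names and genes are invented, and the Bader Lab formatting of the first column is not handled.

```r
# Write a two-line toy GMT file; set names and genes are invented.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_ONE\tfirst toy set\tTP53\tBRCA1\tEGFR",
  "SET_TWO\tsecond toy set\tMYC\tKRAS"
), gmt_file)

# Hypothetical helper (not the netDx implementation): one row per set,
# field 1 = set name, field 2 = description, remaining fields = members.
parse_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  sets <- lapply(fields, function(x) x[-(1:2)])            # members only
  names(sets) <- vapply(fields, function(x) x[1], character(1))
  sets
}

toy_sets <- parse_gmt(gmt_file)
str(toy_sets)
```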

The return value depends on getOrigNames. If FALSE (default), a list with pathway name as key and a vector of genes as value. If TRUE, a list of length two: (1) geneSets: pathway-gene mappings, as in the default; (2) pNames: data.frame with original and cleaned names.
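
Because the shape of the return value changes with `getOrigNames`, downstream code may want to normalise it. The sketch below uses mocked return values rather than calling `readPathways`. `get_gene_sets` is a hypothetical helper, and the column names of the mock `pNames` data.frame are invented, since they are not documented here.

```r
# Mock of the default return shape: pathway name -> vector of genes.
default_res <- list(PATHWAY_A = c("TP53", "EGFR"), PATHWAY_B = c("MYC"))

# Mock of the getOrigNames = TRUE shape. The pNames column names below
# are assumptions for illustration only.
named_res <- list(
  geneSets = default_res,
  pNames   = data.frame(
    original = c("Pathway A", "Pathway B"),
    cleaned  = c("PATHWAY_A", "PATHWAY_B")
  )
)

# Hypothetical helper (not part of netDx): return the pathway-gene list
# whichever shape was supplied.
get_gene_sets <- function(res) {
  if (is.list(res) && all(c("geneSets", "pNames") %in% names(res))) {
    res$geneSets
  } else {
    res
  }
}

identical(get_gene_sets(default_res), get_gene_sets(named_res))
```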