omics2pathlist: Map individual probes into pathways

View source: R/dataProcess.R


Description

Map a set of individual probes from different omics data types (e.g. SNPs, gene expression probes, CpGs) into pathways such as Gene Ontology (GO) categories and KEGG pathways.

Usage

omics2pathlist(
  data,
  pathlistDB,
  featureAnno = NULL,
  restrictUp = 200,
  restrictDown = 10,
  minPathSize = 5
)

Arguments

data

The input dataset, either a data.frame or a matrix. Rows are samples and columns are probes/genes, except that the first column holds the label. For transcriptomic data, the column names (gene IDs) should be Entrez IDs ('entrezID').

pathlistDB

A list of pathways, indexed by pathway ID, giving the genes (as Entrez IDs, 'entrezID') that belong to each pathway.

featureAnno

A data.frame of annotations used to map probes to genes. It must contain at least two columns, named 'ID' and 'entrezID'. If NULL, the input is treated as transcriptomic data whose columns are already Entrez gene IDs.

restrictUp

The upper bound on the number of genes in each pathway. The default is 200.

restrictDown

The lower bound on the number of genes in each pathway. The default is 10.

minPathSize

The minimum number of probes a pathway must contain after the input data have been mapped to pathlistDB. The default is 5.
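To make the expected input shapes concrete, the following is a minimal sketch with made-up values. The probe names, Entrez IDs, pathway names and 0/1 label coding are hypothetical placeholders, and storing Entrez IDs as character strings is an assumption; the example files shipped with BioMM may use a different type.

```r
## Hypothetical toy inputs illustrating the expected shapes (not BioMM data).
set.seed(1)

## data: first column is the label, remaining columns are probes (here CpGs)
toyData <- data.frame(
  label = c(0, 1, 0, 1),
  cg001 = rnorm(4),
  cg002 = rnorm(4),
  cg003 = rnorm(4)
)

## featureAnno: at least the columns 'ID' (probe) and 'entrezID' (gene)
toyAnno <- data.frame(
  ID = c("cg001", "cg002", "cg003"),
  entrezID = c("1017", "1017", "7157")
)

## pathlistDB: pathway IDs as list names, member genes (Entrez IDs) as elements
## (pathway names and memberships are invented for illustration)
toyDB <- list(
  pathway_1 = c("1017", "7157", "672"),
  pathway_2 = c("5290", "7157")
)

str(toyDB)
```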

Details

If the input is gene expression data, featureAnno should be NULL, since the gene IDs are already given as the column names of the data. Because online databases are updated from time to time, it is advisable to freeze the study database (e.g. pathlistDB) at a particular time point so that results can be reproduced. The number of genes in each pathway can be restricted for downstream analysis: very small pathways are sparsely distributed, whereas very large pathways are computationally intensive and likely to be nonspecific.
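The size restriction described above can be sketched as a simple filter on pathway length. `filterPathwaySize()` is a hypothetical helper for illustration, not a BioMM function, and it assumes that `restrictDown` and `restrictUp` are inclusive bounds applied to the gene counts in `pathlistDB` before mapping.

```r
## Hypothetical helper, not part of BioMM: keep pathways whose gene count
## lies within [restrictDown, restrictUp] (inclusive bounds assumed).
filterPathwaySize <- function(pathlistDB, restrictUp = 200, restrictDown = 10) {
  sizes <- lengths(pathlistDB)  # number of genes per pathway
  pathlistDB[sizes >= restrictDown & sizes <= restrictUp]
}

## Toy pathway list with 3, 15 and 250 genes (placeholder Entrez IDs)
toyDB <- list(
  small  = as.character(1:3),
  medium = as.character(1:15),
  large  = as.character(1:250)
)

names(filterPathwaySize(toyDB))  # "medium"
```

With the defaults, only the 15-gene pathway survives: the 3-gene pathway falls below `restrictDown` and the 250-gene pathway exceeds `restrictUp`.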

Value

A list of matrices, one per pathway, with pathway IDs as the list names. In each matrix, rows are samples and columns are the probes mapped to that pathway, except that the first column is named 'label'.
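A rough sketch of how a result with this shape could be produced from `data`, `pathlistDB` and `featureAnno`. `mapProbesToPathways()` is hypothetical and is not the BioMM implementation: it keeps a pathway only when at least `minPathSize` probes map to it, and it omits the `restrictUp`/`restrictDown` filtering and the `featureAnno = NULL` (transcriptomic) case.

```r
## Hypothetical sketch, not BioMM's implementation: map probes to pathways via
## featureAnno and return one matrix per pathway with 'label' as first column.
mapProbesToPathways <- function(data, pathlistDB, featureAnno, minPathSize = 5) {
  probes <- colnames(data)[-1]  # all columns except the label
  out <- list()
  for (pid in names(pathlistDB)) {
    ## probes whose annotated gene belongs to this pathway
    hit <- featureAnno$ID[featureAnno$entrezID %in% pathlistDB[[pid]]]
    hit <- intersect(probes, hit)  # keep only probes present in data
    if (length(hit) >= minPathSize) {
      m <- as.matrix(data[, c(colnames(data)[1], hit), drop = FALSE])
      colnames(m)[1] <- "label"
      out[[pid]] <- m
    }
  }
  out
}

## Usage with toy values (all IDs are placeholders)
toyData <- data.frame(label = c(0, 1, 0, 1),
                      cg1 = c(0.1, 0.5, 0.2, 0.7),
                      cg2 = c(0.3, 0.4, 0.9, 0.2),
                      cg3 = c(0.6, 0.1, 0.5, 0.8))
toyAnno <- data.frame(ID = c("cg1", "cg2", "cg3"),
                      entrezID = c("101", "101", "202"))
toyDB <- list(pathA = c("101", "303"), pathB = "202")

res <- mapProbesToPathways(toyData, toyDB, toyAnno, minPathSize = 2)
names(res)           # "pathA"
colnames(res$pathA)  # "label" "cg1" "cg2"
```

Here `pathB` is dropped because only one probe (`cg3`) maps to it, which is below `minPathSize = 2`.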

Examples

 
## Load DNA methylation data
methylfile <- system.file('extdata', 'methylData.rds', package = 'BioMM')
methylData <- readRDS(methylfile)
## Annotation files for mapping CpGs into pathways
pathlistDBfile <- system.file('extdata', 'goDB.rds', package = 'BioMM')
featureAnnoFile <- system.file('extdata', 'cpgAnno.rds', package = 'BioMM')
pathlistDB <- readRDS(file = pathlistDBfile)
featureAnno <- readRDS(file = featureAnnoFile)
## Use only the first 20 pathways to reduce runtime
pathlistDB <- pathlistDB[1:20]
## Map CpGs into the pathway list
dataList <- omics2pathlist(data = methylData,
                           pathlistDB = pathlistDB,
                           featureAnno = featureAnno,
                           restrictUp = 100, restrictDown = 20,
                           minPathSize = 10)
length(dataList)

transbioZI/BioMM documentation built on Jan. 12, 2023, 2:18 p.m.