An function to perform the runseq2pathway algorithm(s).

Share:

Description

A wrapper function to perform seq2gene and gene2pathway in series.

Usage

1
2
3
4
5
6
7
8
runseq2pathway(inputfile,
               search_radius=150000, promoter_radius=200, promoter_radius2=100, 
               genome=c("hg38","hg19","mm10","mm9"), adjacent=FALSE, SNP= FALSE, 
               PromoterStop=FALSE, NearestTwoDirection=TRUE,UTR3=FALSE,
               DataBase=c("GOterm"), FAIMETest=FALSE, FisherTest=TRUE,
               collapsemethod=c("MaxMean","function","ME",
               "maxRowVariance","MinMean","absMinMean","absMaxMean","Average"), 
               alpha=5, logCheck=FALSE, B=100, na.rm=FALSE, min_Intersect_Count=5)

Arguments

inputfile

An R object input file that records genomic region information (coordinates). The file format could be data frame defined as:

  1. column 1 the unique IDs of genomic regions of interest (peaks, mutations, or SNPs)

  2. column 2 the chromosome IDs (eg. chr5 or 5)

  3. column 3 the start of genomic regions

  4. column 4 the end of genomic regions (for SNP and point mutations, the difference of start and end is 1bp)

  5. column 5... Other custom defined information (option)

Or, the input format should be GRanges object(from R package GenomicRanges) with value column.

  1. column 1: space the chromosome IDs (eg. chr5 or 5)

  2. column 2: ranges the ranges of genomic regions

  3. column 3: name the unique IDs of genomic regions of interest (peaks, mutations, or SNPs)

  4. more columns: Other custom defined information (optional)

search_radius

A non-negative integer, with which the input genomic regions can be assigned not only to the matched or nearest gene, but also with all genes within a search radius for some genomic region type. This parameter works only when the parameter "SNP" is FALSE. Default is 150000.

promoter_radius

A non-negative integer. Default is 200. Promoters are here defined as upstream regions of the transcription start sites (TSS). User can assign the promoter radius, a suggested value is between 200 to 2000.

promoter_radius2

A non-negative integer. Default is 100. Promoters are here defined as downstream regions after the transcription start sites (TSS).

genome

A character specifies the genome type. Currently, choice of "hg38", "hg19", "mm10", and "mm9" is supported.

adjacent

A Boolean. Default is FALSE to search all genes within the search_radius. Using "TRUE" to find the adjacent genes only and ignore the parameters "SNP" and "search_radius".

SNP

A Boolean specifies the input object type. FALSE by default to keep on searching for intron and neighboring genes. Otherwise, runseq2gene stops searching when the input genomic region is residing on exon of a coding gene.

PromoterStop

A Boolean, "FALSE" by default to keep on searching neighboring genes using the parameter "search_radius". Otherwise, runseq2gene stops searching neighboring genes. This parameter has function only if an input genomic region maps to promoter of coding gene(s).

NearestTwoDirection

A boolean, "TRUE" by default to output the closest left and closest right coding genes with directions. Otherwise, output only the nearest coding gene regardless of direction.

UTR3

A boolean, "FALSE" by defalt to calculate the distance from genes' 5UTR. Otherwsie, calculate the distance from genes' 3UTR.

DataBase

A character string assigns an R GSA.genesets object to define gene-set. User can call GSA.read.gmt to load customized gene-sets with a .gmt format. If not specified, a character "GOterm" by default, three categories of GO-defined gene sets (BP,MF,CC) will be used. Alternatively, user can specify a category by the choice of "BP","MF","CC".

FAIMETest

A boolean values. By default is FALSE. When true, executes function of gene2pathway test using the FAIME method, which only functions when the fifth column of input file exsists and is a vector of scores or values.

FisherTest

A Boolean value. By default is TRUE to excute the function of the Fisher's exact test. Otherwise, only excutes the function of gene2pathway test.

collapsemethod

A character for determining which method to use when call the function collapseRows in package WGCNA. The function "collapsemethod" uses this paramter to call the collapseRows() function in package "WGCNA".

alpha

A positive integer, 5 by default. This is a FAIME-specific parameter. A higher value puts more weights on the most highly-expressed ranks than the lower expressed ranks.

logCheck

A Boolean value. By default is FALSE. When true, the function takes the log-transformed values of gene if the maximum value of sample profile is larger than 20.

na.rm

A Boolean value indicates whether to keep missing values or not when method="FAIME". By default is FALSE.

B

A positive integer assigns the total number of random sampling trials to calculate the empirical pvalues. By default is 100.

min_Intersect_Count

A number decides the cutoff of the minimum number of intersected genes when reporting Fisher's exact tested results.

Value

An R list of several data frames. The results of function seq2gene, Fisher's exact test and gene2pathway test results are included.

Author(s)

Bin Wang, Xinan Yang

References

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9:559.

Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, Salomon DR, Horvath S (2011) Strategies for aggregating gene expression data: The collapseRows R function. BMC Bioinformatics, 12:322.

Lawrence M, Huber W, Pages H, Aboyoun P, Carlson M, Gentleman R, Morgan M and Carey V (2013) "Software for Computing and Annotating Genomic Ranges.". PLoS Computational Biology, 9.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
	data(Chipseq_Peak_demo)
	require(seq2pathway.data)
	data(MsigDB_C5, package="seq2pathway.data")
  #generate a demo GSA.genesets object
	demoDB <- MsigDB_C5
	x=10
	for(i in 1:3) demoDB[[i]]<-MsigDB_C5[[i]][1:x]
       res3=runseq2pathway(inputfile=Chipseq_Peak_demo,
		genome="hg19", search_radius=100, promoter_radius=50, promoter_radius2=0,
		FAIMETest=TRUE, FisherTest=FALSE,  
		DataBase=demoDB, min_Intersect_Count=1)	
	names(res3)
	res3[[1]]
  ## Not run: 
   # an example to use FET
	res=runseq2pathway(inputfile=Chipseq_Peak_demo,
		genome="hg19", search_radius=100, promoter_radius=50, promoter_radius2=0,
		DataBase=MsigDB_C5, NearestTwoDirection=FALSE,
		collapsemethod="Average", min_Intersect_Count=1) 
   # an example to use FAIME
	res2=runseq2pathway(inputfile=Chipseq_Peak_demo,
		genome="hg19", search_radius=100, promoter_radius=50, promoter_radius2=0,
		FAIMETest=TRUE, FisherTest=FALSE,  
		DataBase=MsigDB_C5, min_Intersect_Count=1)
	
## End(Not run)