PAGI.Main attempts to identify dysregulated pathways, which are influenced by both the internal effects of pathways and crosstalk between pathways, by integrating pathway topological information with the differences between two biological phenotypes.
PAGI.Main(dataset, class.labels, nperm = 100, p.val.threshold = -1, FDR.threshold = 0.01,
          gs.size.threshold.min = 25, gs.size.threshold.max = 500)
dataset
    A data frame of gene expression data whose first column contains gene symbols and whose column names are sample names.
class.labels
    A vector of binary labels that distinguishes the phenotype class of each sample.
nperm
    An integer. The number of random permutations. The default value is 100.
p.val.threshold
    A numeric value. The significance threshold on the nominal (NOM) p-value for pathways whose detailed results are presented. The default value is -1, which means no threshold.
FDR.threshold
    A numeric value. The significance threshold on the FDR q-value for pathways whose detailed results are presented. The default value is 0.01.
gs.size.threshold.min
    An integer. The minimum size (in genes) for pathways to be considered. The default value is 25.
gs.size.threshold.max
    An integer. The maximum size (in genes) for pathways to be considered. The default value is 500.
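The effect of the two size thresholds can be illustrated with a minimal sketch. The gene sets, gene names, and the inclusive bounds below are assumptions made for illustration; the sketch also assumes that pathway size is counted after mapping to the genes present in the expression data, as the SIZE column suggests. It is not the package's internal code.

```r
# Hypothetical gene sets and measured genes (not taken from KEGG or PAGI).
gene.sets <- list(
  small  = paste0("GENE", 1:10),
  medium = paste0("GENE", 1:60),
  large  = paste0("GENE", 1:700)
)
measured.genes <- paste0("GENE", 1:650)

gs.size.threshold.min <- 25
gs.size.threshold.max <- 500

# Size of each gene set after mapping to the measured genes.
sizes <- vapply(gene.sets,
                function(gs) length(intersect(gs, measured.genes)),
                integer(1))

# Keep gene sets whose mapped size lies within the bounds
# (inclusive bounds are an assumption).
keep <- sizes >= gs.size.threshold.min & sizes <= gs.size.threshold.max
kept.sets <- gene.sets[keep]
names(kept.sets)
```

With these toy inputs only the gene set named medium passes both thresholds.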
Given gene expression data and a vector of binary class labels, the function identifies dysregulated pathways in three main steps:
(1) Genes with an absolute t-score greater than 0 are mapped to the global graph, which is reconstructed from the gene relationships extracted from each pathway in the KEGG database and from the genes shared between pathways.
(2) A global influence factor (GIF) is defined to distinguish the non-equivalence of genes influenced by both the internal effects of pathways and crosstalk between pathways in the global network. The random walk with restart (RWR) algorithm is used to evaluate the GIF by integrating the global network topology and the correlation of each gene with the phenotype.
(3) Cumulative distribution functions (CDFs) are used to prioritize the dysregulated pathways.
Permutation is used to assess the statistical significance of pathways (nominal p-values), and the FDR is used to account for false positives.
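The random walk with restart step can be sketched on a toy network. This is a generic RWR iteration, not the code used inside PAGI.Main: the five-gene network, the t-scores, the restart probability of 0.7, and the choice to seed the walk with normalised absolute t-scores are all assumptions made for illustration.

```r
# Toy undirected network of five genes (hypothetical; not from KEGG).
genes <- c("g1", "g2", "g3", "g4", "g5")
A <- matrix(0, 5, 5, dimnames = list(genes, genes))
edges <- rbind(c("g1", "g2"), c("g2", "g3"), c("g3", "g4"),
               c("g4", "g5"), c("g2", "g4"))
A[edges] <- 1            # fill one direction of each edge
A[edges[, 2:1]] <- 1     # fill the reverse direction

# Column-normalise the adjacency matrix so that each column sums to 1.
W <- sweep(A, 2, colSums(A), "/")

# Restart vector: absolute t-scores scaled to sum to 1 (assumed seeding scheme).
t.scores <- c(g1 = 2.5, g2 = -0.3, g3 = 1.1, g4 = -1.8, g5 = 0.4)
p0 <- abs(t.scores) / sum(abs(t.scores))

# Iterate p <- (1 - r) * W p + r * p0 until the L1 change falls below tol.
rwr <- function(W, p0, restart = 0.7, tol = 1e-10, max.iter = 1000) {
  p <- p0
  for (iter in seq_len(max.iter)) {
    p.next <- (1 - restart) * as.vector(W %*% p) + restart * p0
    converged <- sum(abs(p.next - p)) < tol
    p <- p.next
    if (converged) break
  }
  setNames(as.vector(p), names(p0))
}

gif <- rwr(W, p0)
round(gif, 4)
```

The resulting vector sums to 1, and each entry reflects both the gene's own score and the scores of its network neighbours, which is the intuition behind the GIF.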
The argument dataset is a gene expression data set stored in a data frame. The first column of the data frame contains gene symbols, and the column names of the data frame are sample names.
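A minimal sketch of inputs in this layout is shown here. The gene symbols, sample names, the column name gene, and the "0"/"1" coding of the class labels are hypothetical. The Welch t-test is used only to illustrate a per-gene t-score between the two classes; the exact statistic computed by PAGI.Main is not specified here.

```r
set.seed(1)

# Simulated expression matrix: 6 genes x 4 samples (values are made up).
expr <- matrix(rnorm(6 * 4, mean = 8), nrow = 6,
               dimnames = list(NULL, paste0("sample", 1:4)))

# First column holds gene symbols; remaining columns are samples.
dataset <- data.frame(gene = paste0("GENE", 1:6), expr,
                      stringsAsFactors = FALSE)

# One binary label per sample column (coding is an assumption).
class.labels <- c("0", "0", "1", "1")
stopifnot(length(class.labels) == ncol(dataset) - 1)

# Per-gene t-score between the two classes (Welch t-test, for illustration).
is.class1 <- class.labels == "1"
t.scores <- apply(as.matrix(dataset[, -1]), 1, function(x) {
  t.test(x[is.class1], x[!is.class1])$statistic
})
names(t.scores) <- dataset[[1]]
t.scores
```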
A list. It includes two elements: SummaryResult and PathwayList.
SummaryResult is a data frame that summarizes the results for all pathways. Each row of the data frame represents a pathway. Its columns include "Pathway Name", "SIZE", "PathwayID", "Pathway Score", "NOM p-val", "FDR q-val", "Tag percentage" (percent of gene set before running enrichment peak), "Gene percentage" (percent of gene list before running enrichment peak), and "Signal strength" (enrichment signal strength).
PathwayList is a list of pathways that presents the detailed results of pathways with NOM p-val < p.val.threshold or FDR < FDR.threshold. Each element of the list is a data frame. Each row of the data frame represents a gene. Its columns include "Gene number in the (sorted) pathway", "gene symbol from the gene express data", "location of the gene in the sorted gene list", "the T-score of gene between two biological states", "global influence impactor", "if the gene contribute to the score of pathway".
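A returned object of this shape could be post-processed as sketched here. The mock object stands in for the output of PAGI.Main and contains made-up values with only a subset of the documented columns. The sketch assumes that the elements of PathwayList are named by pathway name, which should be checked against an actual result.

```r
# Mock result with the structure described above (all values are made up).
result <- list(
  SummaryResult = data.frame(
    "Pathway Name" = c("Pathway A", "Pathway B", "Pathway C"),
    "SIZE"         = c(40, 120, 75),
    "NOM p-val"    = c(0.001, 0.20, 0.03),
    "FDR q-val"    = c(0.005, 0.45, 0.08),
    check.names = FALSE, stringsAsFactors = FALSE
  ),
  PathwayList = list(
    "Pathway A" = data.frame(
      "gene symbol from the gene express data" = c("GENE1", "GENE2"),
      check.names = FALSE, stringsAsFactors = FALSE
    )
  )
)

summary.tab <- result[[1]]

# Pathways passing an FDR cut-off, ordered by FDR q-value.
significant <- summary.tab[summary.tab[["FDR q-val"]] < 0.01, , drop = FALSE]
significant <- significant[order(significant[["FDR q-val"]]), , drop = FALSE]

# Matching detailed tables (assumes list names equal pathway names).
detail <- result[[2]][intersect(significant[["Pathway Name"]],
                                names(result[[2]]))]
names(detail)
```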
Junwei Han <hanjunwei1981@163.com>, Yanjun Xu <tonghua605@163.com>, Haixiu Yang <yanghaixiu@ems.hrbmu.edu.cn>, Chunquan Li <lcqbio@yahoo.com.cn> and Xia Li <lixia@hrbmu.edu.cn>
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S. et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A, 102, 15545-15550.
Li, C., Li, X., Miao, Y., Wang, Q., Jiang, W., Xu, C., Li, J., Han, J., Zhang, F., Gong, B. et al. (2009) SubpathwayMiner: a software package for flexible identification of pathways. Nucleic Acids Res, 37, e131.
## Not run:
##########identify dysregulated pathways by using the function PAGI.Main###########
#example 1
#get example data
dataset<-getdataset()
class.labels<-getclass.labels()
#identify dysregulated pathways
result<-PAGI.Main(dataset,class.labels,nperm = 100,p.val.threshold = -1,FDR.threshold = 0.01,
gs.size.threshold.min = 25, gs.size.threshold.max = 500 )
#print the summary results of pathways to screen
result[[1]][1:10,]
#The result is a dataframe. The rows of the dataframe are ranked by the values of False
#discovery rate (FDR). Each row of the result (dataframe) is a pathway. Its columns include
#"Pathway Name", "SIZE", "PathwayID", "Pathway Score", "NOM p-val", "FDR q-val", "Tag
#percentage", "Gene percentage", "Signal strength". They correspond to pathway names,
#the number of genes which were mapped to the pathway from gene expression profiles, pathway ID,
#the scores of pathway, the nominal p-values of the pathways, the FDR values, the percent of
#gene set before running enrichment peak, the percent of gene list before running enrichment peak,
#enrichment signal strength.
#print the detailed results of pathways to screen
result[[2]][1:5]
#The result is a list. Each element of the list is a dataframe which presents the detailed results of
#genes in a pathway with FDR < 0.01. Each row of the dataframe represents a gene.
#Its columns include "Gene number in the (sorted) pathway", "gene symbol from the gene express data",
#"location of the gene in the sorted gene list", "the T-score of gene between two biological states",
#"global influence impactor", "if the gene contribute to the score of pathway".
#write the summary results of pathways to a tab-delimited file.
write.table(result[[1]], file = "SUMMARY RESULTS.txt", quote = FALSE, row.names = FALSE, sep = "\t")
#write the detailed results of genes for each pathway with FDR < 0.01 to tab-delimited files.
for (i in seq_along(result[[2]])) {
  gene.report <- result[[2]][[i]]
  filename <- paste0(names(result[[2]])[i], ".txt")
  write.table(gene.report, file = filename, quote = FALSE, row.names = FALSE, sep = "\t")
}
#example 2
#get example data
dataset <- read.table(system.file("localdata", "dataset.txt", package = "PAGI"),
                      header = TRUE, sep = "\t", quote = "\"")
class.labels <- as.character(read.table(system.file("localdata", "class.labels.txt", package = "PAGI"),
                                        quote = "\"", stringsAsFactors = FALSE)[1, ])
#identify dysregulated pathways
result<-PAGI.Main(dataset,class.labels,nperm = 100,p.val.threshold = -1,FDR.threshold = 0.01,
gs.size.threshold.min = 25, gs.size.threshold.max = 500 )
#print the summary results of pathways to screen
result[[1]][1:10,]
#The result is a dataframe. The rows of the dataframe are ranked by the values of False
#discovery rate (FDR). Each row of the result (dataframe) is a pathway. Its columns include
#"Pathway Name", "SIZE", "PathwayID", "Pathway Score", "NOM p-val", "FDR q-val", "Tag
#percentage", "Gene percentage", "Signal strength". They correspond to pathway names,
#the number of genes which were mapped to the pathway from gene expression profiles, pathway ID,
#the scores of pathway, the nominal p-values of the pathways, the FDR values, the percent of
#gene set before running enrichment peak, the percent of gene list before running enrichment peak,
#enrichment signal strength.
#print the detailed results of pathways to screen
result[[2]][1:5]
#The result is a list. Each element of the list is a dataframe which presents the detailed results of
#genes in a pathway with FDR < 0.01. Each row of the dataframe represents a gene.
#Its columns include "Gene number in the (sorted) pathway", "gene symbol from the gene express data",
#"location of the gene in the sorted gene list", "the T-score of gene between two biological states",
#"global influence impactor", "if the gene contribute to the score of pathway".
#write the summary results of pathways to a tab-delimited file.
write.table(result[[1]], file = "SUMMARY RESULTS.txt", quote = FALSE, row.names = FALSE, sep = "\t")
#write the detailed results of genes for each pathway with FDR < 0.01 to tab-delimited files.
for (i in seq_along(result[[2]])) {
  gene.report <- result[[2]][[i]]
  filename <- paste0(names(result[[2]])[i], ".txt")
  write.table(gene.report, file = filename, quote = FALSE, row.names = FALSE, sep = "\t")
}
## End(Not run)