knitr::opts_chunk$set(tidy = FALSE, warning = FALSE, message = FALSE)
This vignette illustrates how to easily use the ICDS package. This package can identifying cancer risk subpathways which can explain the disturbed biological functions in more detail and accurately.
This package provides the
getExpp
,getMethp
,getCnvp
function to calculate p-values or corrected p-values for each gene.
This package provides the coverp2zscore
,combinep_two
,combinep_three
function to Convert p-values or corrected p-values to z-scores.
This package provides the FindSubPath
function to search for interested subpathways in each entire pathway.
This package provides the opt_subpath
function to Optimize interested subpathways.
This package provides the Permutation
function to to calculate statistical significance for these interested subpathways
.
We can use function getExampleData to return example data and environment variables, such as the data of exp_data.
library(ICDS) # obtain the expression profile data exp_data<-GetExampleData("exp_data") #view first six rows and six colmns of data exp_data[1:6, 1:6]
#obtain the labels of the samples of the expression profile, the label vector is a vector of 0/1s, # 0 represents the case sample and 1 represents the control sample label1<-GetExampleData("label1") #view first ten label label1[1:10]
we used the Student's t-test to calculate the p-value for expression level and methylation level of each gene in the tumor and normal samples.label,0 represents normal samples;1 represents cancer samples.
#calculate p-values or corrected p-values for each gene exp.p<-getExpp(exp_data,label = label1,p.adjust = FALSE) label2<-GetExampleData("label2") meth_data<-GetExampleData("meth_data") meth.p<-getMethp(meth_data,label = label2,p.adjust = FALSE) exp.p[1:10,]
For copy number data, we can calculate p-value through the following way:
#obtain Copy number variation data cnv_data<-GetExampleData("cnv_data") #obtion amplified genes amp_gene<-GetExampleData("amp_gene") #obtion deleted genes del_gene<-GetExampleData(("del_gene"))
#calculate p-values or corrected p-values for each gene cnv.p<-getCnvp(exp_data,cnv_data,amp_gene,del_gene,p.adjust=FALSE,method="fdr") cnv.p[1:10,]
With the above data, we constructed an integrative gene z-score (Z) based three datasets.
Firstly,the gene score calculated through combined p-values of Fisher’s Inverse chi-square tests. This method computes a combined statistic S from the p-values of the difference coefficient obtained from three individual datasets .Usually,the statistic S follows a chi-square distribution with 2k degrees,we can calculate the null hypothesis p-value of the statistic S.
Secondly,we convert the p-value to z-score according to Gaussian distribution , which is taken as the gene z-score(Z) .
#obtain the p-values of expression profile data,methylation profile data and Copy number variation data exp.p<-GetExampleData("exp.p") meth.p<-GetExampleData("meth.p") cnv.p<-GetExampleData("cnv.p")
#calculate z-scores for p-values of each kind of data zexp<-coverp2zscore(exp.p) zmeth<-coverp2zscore(meth.p) zcnv<-coverp2zscore(cnv.p) #combine two kinds of p-values,then,calculate z-score for them zz<-combinep_two(exp.p,meth.p) #combine three kinds of p-values,then,calculate z-score for them zzz<-combinep_three(exp.p,meth.p,cnv.p) zzz[1:6,]
For a priori KEGG path, we perform a greedy algorithm to search for interested subpathways.We provide two statistical test methods of the subpathway, which one is whole gene-based perturbation, and the other is the local gene perturbation in a particular pathway.
#obtain z-score of each gene zzz<-GetExampleData("zzz") zzz[1:10,]
require(graphite) zz<-GetExampleData("zzz") #subpathdata<-FindSubPath(zz) #only show
Optimize interested subpathways.If the number of genes shared by the two pathways accounted for more than the Overlap ratio of each pathway genes,than combine two pathways.
subpathdata<-GetExampleData("subpathdata") keysubpathways<-opt_subpath(subpathdata,zz,overlap=0.6) head(keysubpathways)
the perturbation test was used to calculate statistical significance for these interested subpathways.
keysubpathways<-Permutation(keysubpathways,zz,nperm1=100,method1=TRUE,nperm2=100,method2=FALSE) head(keysubpathways)
require(graphite) require(org.Hs.eg.db) subpID<-unlist(strsplit("ACSS1/ALDH3B2/ADH1B/ADH1A/ALDH2/DLAT/ACSS2","/")) pathway.name="Glycolysis / Gluconeogenesis" zzz<- GetExampleData("zzz") PlotSubpathway(subpID=subpID,pathway.name=pathway.name,zz=zzz)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.