adjMatrixTermsAncestors {#adjMatrixTermsAncestors}

Description:
Returns the adjacency matrix of the input GO term numbers in the specified ontology.

Usage:
adjMatrixTermsAncestors(GOtermNrInclAncestors, OntologyNr = 1)

Input parameters:

Output parameters:

Author:
CL

See also:
termsAncestors.

Example:

OntologyNr <- 1 # for biological process
GOtermNr <- 16310
GOtermNrInclAncestors <- c(GOtermNr, termsAncestors(GOtermNr, OntologyNr)$Ancestors)
adjMatrixTermsAncestors(GOtermNrInclAncestors, OntologyNr = OntologyNr)
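To make the structure concrete, the following R sketch builds a 0/1 adjacency matrix for a small toy DAG. It assumes a hypothetical parent list (each term number mapped to its parent term numbers) and the orientation adj[parent, child] = 1. Apart from the biological process root 8150, the term numbers are made up, and toyAdjMatrix is an illustrative name; the package's own data source and matrix orientation may differ.

```r
# Hypothetical toy DAG: each term number (as character) maps to its parents.
# Only 8150 (biological process root) is a real GO term number.
toyParents <- list("8150" = integer(0),
                   "101"  = 8150L,
                   "102"  = 8150L,
                   "103"  = c(101L, 102L))

# Build a 0/1 adjacency matrix restricted to termNrs,
# using the assumed orientation adj[parent, child] = 1.
toyAdjMatrix <- function(termNrs, parents) {
  ids <- as.character(termNrs)
  adj <- matrix(0L, nrow = length(ids), ncol = length(ids),
                dimnames = list(ids, ids))
  for (child in ids) {
    for (parent in as.character(parents[[child]])) {
      # ignore parents that are not part of the requested term set
      if (parent %in% ids) adj[parent, child] <- 1L
    }
  }
  adj
}

adj <- toyAdjMatrix(c(8150, 101, 102, 103), toyParents)
print(adj)
```

Restricting the matrix to a given set of terms mirrors the idea of passing a term together with its ancestors, so the matrix only describes the relevant subgraph.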

certainty {#certainty}

Description:
Calculates the certainty of a GO term, i.e., the probability that no term in the given GO subtree has a smaller p-value than the considered GO term.

Usage:
certainty(Pvalues)

Input parameters:

Output parameters:

Author:
CL

See also:
infoValue, importance, remarkableness.

Example:

Pvalues <- runif(10,0,1)
certainty(Pvalues)

checkORAparameters {#checkORAparameters}

Description:
Internal function to check whether the parameters passed to dbtORA are correct.

Usage:
checkORAparameters(InFileWithExt, InFileDirectory, RefSetFileWithExt, RefSetDirectory, OutFile, OutFileDirectory, Correction, PvalueThreshold, MinNrOfGenes, OnlyManuCur, drawDAG, MarkDetails, MarkHeadlines, PlotExt)

Input parameters:

Author:
CL

See also:
dbtORA.


dbtORA {#dbtORA}

Description:
Convenient wrapper function to perform an overrepresentation analysis (ORA) including the drawing of the directed acyclic graphs (DAGs) of the resulting GO terms.

Usage:
dbtORA(InFileWithExt, PvalueThreshold = 0.05, Correction = "BON", OnlyManuCur = TRUE, MinNrOfGenes = 2, InFileDirectory = getwd(), OutFile = InFileWithExt, OutFileDirectory = InFileDirectory, RefSetFileWithExt = NULL, RefSetDirectory = InFileDirectory, drawDAG = TRUE, MarkDetails = TRUE, MarkHeadlines = TRUE, PlotExt = "png")

Input parameters:

Details:
Wrapper function that mainly calls ORA and drawORA.

Node colors and their meaning:
Red - Significantly overrepresented nodes.
Green - Significantly underrepresented nodes.
White - Terms that are important for the DAG structure but do not have a significant p-value.
Yellow - If MarkHeadlines = TRUE, the significant (!) nodes with the highest remarkableness in each path from a detail to the root, the so-called headlines, get a yellow filling. The margin indicates over- or underrepresentation by its red or green color.
Blue - If MarkDetails = TRUE, the details of the DAG are colored in blue. The margin again indicates over- or underrepresentation by its red or green color. If MarkHeadlines and MarkDetails are both TRUE, some nodes may be both headlines and details. Such nodes have a red or green margin according to over- or underrepresentation and are filled in yellow like all headlines; additionally, their text is blue to indicate that the node is a detail.
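The coloring rules above can be summarized as a mapping from node flags to fill, margin (border) and text colors. The sketch below is one reading of those rules, not the package's plotting code: nodeColors is a hypothetical helper, details and headlines are assumed to be significant nodes, and non-significant nodes get a black margin, which the text does not specify.

```r
# Map node flags (0/1 vectors) to fill, border and font colors following
# the documented rules. Illustrative only; not the package's internal code.
nodeColors <- function(Significant, Up, IsHeadline, IsDetail,
                       MarkHeadlines = TRUE, MarkDetails = TRUE) {
  n <- length(Significant)
  fill   <- rep("white", n)   # non-significant structural nodes
  border <- rep("black", n)   # assumption: margin of non-significant nodes
  font   <- rep("black", n)
  sig <- Significant == 1
  overUnder <- ifelse(Up == 1, "red", "green")
  # significant nodes: red (over) or green (under) fill and margin
  fill[sig]   <- overUnder[sig]
  border[sig] <- overUnder[sig]
  if (MarkDetails) {
    det <- sig & IsDetail == 1
    fill[det] <- "blue"           # details filled blue, margin keeps red/green
  }
  if (MarkHeadlines) {
    hl <- sig & IsHeadline == 1
    fill[hl] <- "yellow"          # headlines filled yellow
    if (MarkDetails) font[hl & IsDetail == 1] <- "blue"  # headline and detail
  }
  data.frame(fill, border, font, stringsAsFactors = FALSE)
}

res <- nodeColors(Significant = c(1, 1, 0, 1, 1),
                  Up          = c(1, 0, 1, 1, 0),
                  IsHeadline  = c(0, 0, 0, 1, 1),
                  IsDetail    = c(0, 0, 0, 0, 1))
print(res)
```

Applying the headline rule after the detail rule lets a node that is both keep the yellow fill while its blue text marks it as a detail, as described.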

The output files can be read with any text editor. To use the results for further calculations, installing the DataIO package from GitHub is recommended; it provides functions to read and write *.lrn and *.names files in the R console.

Output parameters:
Nine files:

Author:
CL

See also:
ORA, drawORA.

Example:
For this example, 30 randomly drawn genes are used. Therefore, the p-value threshold has to be set to 1: with the default threshold of 5%, no term is significant, which is exactly what is expected for a random set of genes.

dbtORA(InFileWithExt = 'ExampleNCBInamesFile.names', PvalueThreshold = 1, Correction = "BON", OnlyManuCur = TRUE, MinNrOfGenes = 2, InFileDirectory = system.file('extdata', package = 'ORA'), OutFile = 'ExampleNCBInamesFile', OutFileDirectory = getwd(), RefSetFileWithExt = NULL, RefSetDirectory = getwd(), drawDAG = TRUE, MarkDetails = TRUE, MarkHeadlines = TRUE, PlotExt = "png")

This yields the following output and a warning, because OnlyManuCur = TRUE and two genes have only automatic, not manually curated, annotations, i.e., they are ignored in the analysis:

[1] "........................................"
[1] "ORA: summary"
[1] "Number of genes in test set: 28"
[1] "Number of genes in universe/reference set: 17656"
[1] "Number of p-values: 453"
[1] "Number of adjusted p-values: 453"
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.988894 to fit
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_BP.png\" saved in [OutFileDirectory]"
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_MF.png\" saved in [OutFileDirectory]"
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_CC.png\" saved in [OutFileDirectory]"
Warning message:
In dbtORA(InFileWithExt = "ExampleNCBInamesFile.names", PvalueThreshold = 1,:  
dbtORA: 2 input gene(s) were not used. There might be duplicates in input genes or some input genes are not annotated to any GO term. 
For analysis used genes can be found in ExampleNCBInamesFileGenes28.names in [OutFileDirectory].

drawORA {#drawORA}

Description:
Draws the gene ontology DAG containing the ORA results. Prepares the data for plotGOgraph, which does the actual plotting.

Usage:
drawORA(ORAresults, PlotFileWithExt, PlotDirectory, MarkDetails = TRUE, MarkHeadlines = TRUE, Overwrite = TRUE)

Input parameters:

Details:
Requires the freely available visualization software Graphviz.

Node colors and their meaning:
Red - Significantly overrepresented nodes.
Green - Significantly underrepresented nodes.
White - Terms that are important for the DAG structure but do not have a significant p-value.
Yellow - If MarkHeadlines = TRUE, the significant (!) nodes with the highest remarkableness in each path from a detail to the root, the so-called headlines, get a yellow filling. The margin indicates over- or underrepresentation by its red or green color.
Blue - If MarkDetails = TRUE, the details of the DAG are colored in blue. The margin again indicates over- or underrepresentation by its red or green color. If MarkHeadlines and MarkDetails are both TRUE, some nodes may be both headlines and details. Such nodes have a red or green margin according to over- or underrepresentation and are filled in yellow like all headlines; additionally, their text is blue to indicate that the node is a detail.

Author:
CL

See also:
For further details about ORAresults, please see ORA.

Example:
For this example, the random set of 30 genes from ExampleNCBInamesFile.names in system.file('extdata', package = 'ORA') is used. Because the genes are drawn at random, the p-value threshold is set to 1.

For documentation of the function ReadNAMES, see the DataIO package on GitHub.

# NCBIs <- ReadNAMES('ExampleNCBInamesFile.names', system.file('extdata', package = 'ORA'))$Key
NCBIs <- c(8402, 6199, 72, 387254, 10083, 170370, 25921, 9324, 7305, 6675, 2, 3224, 90342, 20, 121340, 89792, 83998, 140469, 5005, 7398, 26575, 53826, 5024, 50618, 3061, 51176, 7903, 90529, 28316, 6406)
ORAresults <- ORA(NCBIs, Correction = "BON", PvalueThreshold = 1, MinNrOfGenes = 2, OnlyManuCur = FALSE, RefSet = NULL, GOAall = ReadLRN("GOAall.lrn", system.file("extdata",package = "ORA")))
PlotFileWithExt <- 'Example4Vignette.png'
PlotDirectory <- getwd()
drawORA(ORAresults, PlotFileWithExt, PlotDirectory, MarkDetails = TRUE, MarkHeadlines = TRUE, Overwrite = TRUE)

GOroot2TermPaths {#GOroot2TermPaths}

Description:
Returns all paths from the root of the gene ontology DAG to one target term.

Usage:
GOroot2TermPaths(TargetTerm, AdjMatrix, GOtermNr, GOroot = 8150)

Input parameters:

Output parameters:

Author:
CL

See also:
termsAncestors, adjMatrixTermsAncestors.

Example:

OntologyNr <- 1 # for biological process
TargetTerm <- 6796
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(TargetTerm, Ancestors), OntologyNr)
GOroot2TermPaths(TargetTerm = TargetTerm, AdjMatrix = AdjMatrixAndTermNrs$AdjMatrix, GOtermNr = AdjMatrixAndTermNrs$GOtermNrs, GOroot = 8150)
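Enumerating all root-to-term paths can be done by recursing from the target term through each of its parents until the root is reached. The sketch below assumes the adjacency matrix orientation adj[parent, child] = 1 with term numbers as row and column names, and uses a small made-up DAG in which only the root 8150 is a real GO term number; allRootPaths is a hypothetical name, not the package's implementation.

```r
# Enumerate all paths from `root` to `target` in a DAG stored as a 0/1
# adjacency matrix with adj[parent, child] = 1 (assumed orientation) and
# term numbers as row and column names.
allRootPaths <- function(adj, target, root) {
  target <- as.character(target)
  root <- as.character(root)
  if (target == root) return(list(root))
  parentIds <- rownames(adj)[adj[, target] == 1]
  paths <- list()
  for (p in parentIds) {
    # every path to a parent, extended by the target, is a path to the target
    for (path in allRootPaths(adj, p, root)) {
      paths[[length(paths) + 1]] <- c(path, target)
    }
  }
  paths
}

# Made-up DAG: 8150 -> 101, 8150 -> 102, 101 -> 103, 102 -> 103
ids <- c("8150", "101", "102", "103")
adj <- matrix(0L, 4, 4, dimnames = list(ids, ids))
adj["8150", c("101", "102")] <- 1L
adj[c("101", "102"), "103"] <- 1L

paths <- allRootPaths(adj, target = 103, root = 8150)
print(paths)
```

The number of paths can grow quickly when terms share many ancestors, so restricting the matrix to the ancestors of a single term, as in the example above, keeps the enumeration small.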

hypergeoTest {#hypergeoTest}

Description:
Performs a one-sided hypergeometric test, i.e., calculates the probability of drawing at least (overrepresentation) or fewer than (underrepresentation) a certain number of successes (ObservedNrOfAnnsInTerm) in a fixed number of draws (NrOfGenesInSample), without replacement, from a finite population of fixed size (NrOfGenesInUniverse) that contains a known number of successes (NrOfAnnotationsInTerm), where each draw is either a success or a failure. Which tail is used depends on whether the expected number of successes is smaller or greater than the observed number (see Details).

Usage:
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = TRUE)

Input parameters:

Details:
Wrapper for phyper. The hypergeometric test is one-sided, and its direction depends on the expected number of genes annotated to the GO term, ExpectedNrOfAnnsInTerm (for a hypergeometric distribution, NrOfGenesInSample * NrOfAnnotationsInTerm / NrOfGenesInUniverse). If this expected number is less than ObservedNrOfAnnsInTerm, the log p-value is log(P(X >= ObservedNrOfAnnsInTerm)), where X is the hypergeometrically distributed random variable. If it is greater than ObservedNrOfAnnsInTerm, the log p-value is log(P(X < ObservedNrOfAnnsInTerm)).

Output parameters:

Author:
CL

See also:
phyper.

Example:

ObservedNrOfAnnsInTerm <- 17
NrOfAnnotationsInTerm <- 30
NrOfGenesInSample <- 500
NrOfGenesInUniverse <- 17656
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = TRUE)
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = FALSE)
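The one-sided rule described in Details can be reproduced with base R's phyper. In the sketch below (oneSidedHypergeo is a hypothetical name, scalar inputs only), k, K, n and N stand for ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample and NrOfGenesInUniverse. The natural logarithm is used for log p-values, and sending the tie case (expected equal to observed) to the lower tail is an assumption, because the Details do not cover it.

```r
# One-sided hypergeometric test following the rule in Details.
# k: ObservedNrOfAnnsInTerm, K: NrOfAnnotationsInTerm,
# n: NrOfGenesInSample,      N: NrOfGenesInUniverse.
oneSidedHypergeo <- function(k, K, n, N, LogPvalues = TRUE) {
  expected <- n * K / N  # mean of the hypergeometric distribution
  if (expected < k) {
    # overrepresentation: P(X >= k) = P(X > k - 1), upper tail
    phyper(k - 1, K, N - K, n, lower.tail = FALSE, log.p = LogPvalues)
  } else {
    # underrepresentation (ties assumed to land here): P(X < k) = P(X <= k - 1)
    phyper(k - 1, K, N - K, n, lower.tail = TRUE, log.p = LogPvalues)
  }
}

oneSidedHypergeo(17, 30, 500, 17656, LogPvalues = TRUE)
oneSidedHypergeo(17, 30, 500, 17656, LogPvalues = FALSE)
```

Both branches evaluate phyper at k - 1 because, for an integer-valued X, P(X >= k) = P(X > k - 1) and P(X < k) = P(X <= k - 1); only the tail changes. Passing log.p = TRUE lets phyper compute very small p-values on the log scale directly instead of taking the log of a number that may underflow.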

importance {#importance}

Description:
Calculates the importance of GO terms from their certainty and information values, i.e., the elementwise minimum of both.

Usage:
importance(Certainty, InfoValue)

Input parameters:

Output parameters:

Author:
CL

See also:
certainty, infoValue, remarkableness.

Example:

Certainty <- c(30, 60, 70, 90)
InfoValue <- c(80, 70, 20, 70)
importance(Certainty, InfoValue)
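Because the importance is described as the minimum of certainty and information value, it can be sketched with pmin, which, unlike min, works elementwise on the vectors from the example above. importanceSketch is a hypothetical name; the package function may additionally check its inputs.

```r
# Importance as the elementwise minimum of certainty and information value.
importanceSketch <- function(Certainty, InfoValue) pmin(Certainty, InfoValue)

Certainty <- c(30, 60, 70, 90)
InfoValue <- c(80, 70, 20, 70)
importanceSketch(Certainty, InfoValue)
# 30 60 20 70
```

Using min instead would collapse both vectors into a single number (here 20), which is why the elementwise pmin is the right tool for one value per term.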

infoValue {#infoValue}

Description:
Calculates the partial Shannon information of the gene sets annotated to GO terms, which measures how informative a term is in the context of all terms.

Usage:
infoValue(NrOfAnnotationsInTerm, NrOfGenesInUniverse = max(NrOfAnnotationsInTerm))

Input parameters:

Output parameters:

Author:
CL

See also:
certainty, importance, remarkableness.

Example:

NrOfAnnotationsInTerm <- 30
NrOfGenesInUniverse <- 17656
infoValue(NrOfAnnotationsInTerm, NrOfGenesInUniverse)
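The description does not give a formula. As a hedged illustration, the sketch below computes the textbook partial Shannon information -p log2(p) of a term that covers the fraction p = NrOfAnnotationsInTerm / NrOfGenesInUniverse of the universe. The logarithm base and the absence of any rescaling are assumptions, partialInfo is a hypothetical name, and the package's infoValue may normalize its result differently.

```r
# Partial Shannon information -p * log2(p) of a term that covers the
# fraction p = NrOfAnnotationsInTerm / NrOfGenesInUniverse of all genes.
# Base 2 and the lack of rescaling are assumptions of this sketch.
partialInfo <- function(NrOfAnnotationsInTerm,
                        NrOfGenesInUniverse = max(NrOfAnnotationsInTerm)) {
  p <- NrOfAnnotationsInTerm / NrOfGenesInUniverse
  ifelse(p > 0, -p * log2(p), 0)  # define 0 * log(0) as 0
}

partialInfo(30, 17656)
partialInfo(c(30, 6500, 17656))  # universe defaults to the largest term
```

The quantity is largest for p = 1/e (about 0.37) and approaches 0 both for very small p and for p = 1, i.e., for a term annotated to the whole universe.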

NCBI2GeneName {#NCBI2GeneName}

Description:
Returns the GeneSymbol and GeneName for the given NCBI numbers from 'AllAnnNCBIsPlusGeneName.names' in system.file('extdata', package = 'ORA').

Usage:
NCBI2GeneName(NCBI)

Input parameters:

Output parameters:

Author:
CL

Example:

NCBI2GeneName(NCBI = c(1,12, 1857))

ontologyNr {#ontologyNr}

Description:
Returns the number of the gene ontology to which the given GO term (as ID or number) belongs: 1 codes for biological process, 2 for molecular function and 4 for cellular component. A result of 0 indicates that something went wrong.

Usage:
ontologyNr(GOtermNrOrId, Verbose = FALSE)

Input parameters:

Output parameters:

Author:
CL

Example:

ontologyNr(c(8150, 15774, 5776, 4582))

ORA {#oRA}

Description:
Main function to perform an overrepresentation analysis based on the gene ontology, using a one-sided hypergeometric test for the given genes. For a more convenient wrapper function, see dbtORA.

Usage:
ORA(NCBIs, Correction = "BON", PvalueThreshold = 0.05, MinNrOfGenes = 2, OnlyManuCur = FALSE, RefSet = NULL, GOAall = ReadLRN("GOAall.lrn", system.file("extdata",package = "ORA")))

Input parameters:

Details:
The output files can be read with any text editor. To use the results for further calculations, installing the DataIO package from GitHub is recommended; it provides functions to read and write *.lrn and *.names files in the R console.

Output parameters:

Author:
CL

See also:
dbtORA.

Example:
For this example, the random set of genes from ExampleNCBInamesFile.names in system.file('extdata', package = 'ORA') is used again, and the p-value threshold is again set to 1. For documentation of the function ReadNAMES, see the DataIO package on GitHub.

# NCBIs <- ReadNAMES('ExampleNCBInamesFile.names', system.file('extdata', package = 'ORA'))$Key
NCBIs <- c(8402, 6199, 72, 387254, 10083, 170370, 25921, 9324, 7305, 6675, 2, 3224, 90342, 20, 121340, 89792, 83998, 140469, 5005, 7398, 26575, 53826, 5024, 50618, 3061, 51176, 7903, 90529, 28316, 6406)
ORA(NCBIs, Correction = "BON", PvalueThreshold = 1, MinNrOfGenes = 2, OnlyManuCur = FALSE, RefSet = NULL, GOAall = ReadLRN("GOAall.lrn", system.file("extdata",package = "ORA")))

NOTE: There is no warning this time, as OnlyManuCur = FALSE.
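The example output of dbtORA reports as many adjusted as unadjusted p-values (453 each). Assuming that Correction = "BON" denotes the Bonferroni correction, each p-value is multiplied by the number of tests and capped at 1. The sketch below does this by hand with made-up p-values and checks the result against base R's p.adjust; it illustrates the correction only and does not show how ORA applies its threshold afterwards.

```r
# Bonferroni correction by hand: multiply by the number of tests, cap at 1.
# The p-values are made up for illustration.
Pvalues <- c(0.0001, 0.004, 0.03, 0.2)
adjustedByHand <- pmin(1, Pvalues * length(Pvalues))

# Same result with base R's p.adjust
adjustedBase <- p.adjust(Pvalues, method = "bonferroni")
stopifnot(isTRUE(all.equal(adjustedByHand, adjustedBase)))
adjustedBase
```

Because the multiplier is the total number of tests, the correction becomes stricter as more GO terms are tested, which is one reason why parameters such as MinNrOfGenes that reduce the number of tested terms matter.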


ORAfilename {#oRAfilename}

Description:
Completes the OutFile name passed to dbtORA with the ORA parameters.

Usage:
ORAfilename(OutFile, NrOfValidInputGenes, Correction, PvalueThreshold, MinNrOfGenes, OnlyManuCur, WithRefSet = FALSE)

Input parameters:

Output parameters:

Author:
CL

See also:
dbtORA.
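The dbtORA example output shows file names such as ExampleNCBInamesFileGenes28_BON_1_2_MANU_BP.png. The hypothetical helper below (oraFileNameSketch is not part of the package) rebuilds the part of that pattern that depends on the ORA parameters: OutFile, "Genes" plus the number of valid input genes, then Correction, PvalueThreshold, MinNrOfGenes and "MANU", separated by underscores. The ontology suffix and extension are left out, and because the token for OnlyManuCur = FALSE and the effect of WithRefSet are not documented, the sketch omits them.

```r
# Hypothetical helper (not part of the package) rebuilding the file-name
# pattern seen in the dbtORA example output. The token for
# OnlyManuCur = FALSE and the effect of WithRefSet are undocumented,
# so this sketch simply omits them.
oraFileNameSketch <- function(OutFile, NrOfValidInputGenes, Correction,
                              PvalueThreshold, MinNrOfGenes, OnlyManuCur) {
  name <- paste0(OutFile, "Genes", NrOfValidInputGenes, "_", Correction,
                 "_", PvalueThreshold, "_", MinNrOfGenes)
  if (OnlyManuCur) name <- paste0(name, "_MANU")
  name
}

oraFileNameSketch("ExampleNCBInamesFile", 28, "BON", 1, 2, OnlyManuCur = TRUE)
# "ExampleNCBInamesFileGenes28_BON_1_2_MANU"
```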


plotGOgraph {#plotGOgraph}

Description:
Draws and colors the gene ontology DAG of the input GO terms according to the input parameters and saves it as PlotFile.

Usage:
plotGOgraph(Adj, GOtermIDs, PlotFile, PlotDirectory = getwd(), Significant = rep(1, length(GOtermIDs)), IsHeadline = rep(0, length(GOtermIDs)), MarkDetails = TRUE, Overwrite = TRUE, GOtermString = NULL, Remarkable = NULL, Pvalues = NULL, NrGenesInTerm = NULL, Expected = NULL, Observed = NULL, Importance = NULL, Up = NULL)

Input parameters:

Output parameters:
PlotFile in PlotDirectory with DAG of input GO terms.

Author:
CL

See also:
drawORA.

Example:
Artificial example just to show the functionality of plotGOgraph.

OntologyNr <- 1 # for biological process
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(6796, Ancestors), OntologyNr)
GOtermID <- termId(c(6796, Ancestors))
PlotFile <- 'Example4Vignette.png'
Significant <- c(1, 1, 0, 1, 1, 1)
IsHeadline <- c(0, 0, 0, 1, 1, 0)
Up <- c(1, 1, 1, 1, 0, 1)
plotGOgraph(Adj = AdjMatrixAndTermNrs$AdjMatrix, GOtermIDs = GOtermID, PlotFile = PlotFile, PlotDirectory = getwd(), Significant = Significant, IsHeadline = IsHeadline, MarkDetails = TRUE, Overwrite = TRUE, GOtermString = NULL, Remarkable = NULL, Pvalues = NULL, NrGenesInTerm = NULL, Expected = NULL, Observed = NULL, Importance = NULL, Up = Up)

remarkableness {#remarkableness}

Description:
Function to calculate the remarkableness of a GO term.

Usage:
remarkableness(Certainty, InfoValue)

Input parameters:

Output parameters:

Author:
CL

See also:
certainty, infoValue, importance.

Example:

Certainty <- c(30, 60, 70, 90)
InfoValue <- c(80, 70, 20, 70)
remarkableness(Certainty, InfoValue)

termDescription {#termDescription}

Description:
Returns the description of the given GO term ID.

Usage:
termDescription(GOtermId)

Input parameters:

Details:
Requires package GO.db.

Output parameters:

Authors:
CL, MT, AU

See also:
termId, termNr.

Example:

termDescription('GO:0008150')

termId {#termId}

Description:
Casts GO term numbers to GO term IDs.

Usage:
termId(GOtermNr)

Input parameters:

Output parameters:

Authors:
CL, MT, AU

See also:
termDescription, termNr.

Example:

termId(8150)
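GO term IDs consist of the prefix "GO:" followed by the term number zero-padded to seven digits, so the cast can be sketched with sprintf. termIdSketch is a hypothetical name used only for illustration; the package's termId may handle invalid input differently.

```r
# Cast GO term numbers to GO IDs: "GO:" plus the number padded to 7 digits.
termIdSketch <- function(GOtermNr) sprintf("GO:%07d", as.integer(GOtermNr))

termIdSketch(c(8150, 6796))
# "GO:0008150" "GO:0006796"
```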

termNr {#termNr}

Description:
Casts GO term IDs to GO term numbers.

Usage:
termNr(GOtermId)

Input parameters:

Output parameters:

Authors:
CL, MT, AU

See also:
termDescription, termId.

Example:

termNr('GO:0008150')
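The reverse cast only has to drop the "GO:" prefix; as.integer ignores the leading zeros of the padded number. termNrSketch is a hypothetical name used only for illustration.

```r
# Cast GO IDs back to GO term numbers by dropping the "GO:" prefix.
termNrSketch <- function(GOtermId) as.integer(sub("^GO:", "", GOtermId))

termNrSketch(c("GO:0008150", "GO:0006796"))
# 8150 6796
```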

termpathsHeadlines {#termpathsHeadlines}

Description:
Calculates the headlines for each path from the root to the GO detail terms. Headlines are the most important nodes on these paths, i.e., the GO terms with the highest Importance on each path.

Usage:
termpathsHeadlines(AdjMatrix, GOtermNr, Importance, OntologyNr = 1)

Input parameters:

Details:
PLEASE NOTE: All given GO terms have to be in the specified ontology and all ancestors from GO terms to ontology root (including the root term itself) must be included in the given vector of GO terms.

Output parameters:

Author:
CL

See also:
GOroot2TermPaths, remarkableness, importance.

Example:

OntologyNr <- 1 # for biological process
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(6796, Ancestors), OntologyNr)
Importance <- c(29, 49, 75, 98, 37, 92)
termpathsHeadlines(AdjMatrix = AdjMatrixAndTermNrs$AdjMatrix, GOtermNr = AdjMatrixAndTermNrs$GOtermNrs, Importance = Importance, OntologyNr = OntologyNr)
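Given root-to-detail paths and one importance value per term, selecting the headline of each path amounts to taking the maximum along the path. The sketch below represents each path as a vector of term numbers ordered from root to detail (an assumed representation) and uses a hypothetical helper pathHeadlines with a named importance vector. It ignores the significance requirement mentioned for headlines in dbtORA, and on ties which.max keeps the first, i.e., root-most, term.

```r
# For each root-to-detail path (vector of term numbers), return the term
# with the highest Importance; which.max keeps the first maximum on ties.
pathHeadlines <- function(paths, Importance) {
  vapply(paths, function(path) {
    path[which.max(Importance[as.character(path)])]
  }, numeric(1))
}

# Made-up paths and importance values (only 8150 is a real GO term number)
paths <- list(c(8150, 101, 103), c(8150, 102, 103))
Importance <- c("8150" = 10, "101" = 80, "102" = 40, "103" = 60)
pathHeadlines(paths, Importance)
# 101 103
```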

termsAncestors {#termsAncestors}

Description:
Returns a vector of all ancestors of the given GO term numbers in the specified ontology.

Usage:
termsAncestors(GOtermNr, OntologyNr)

Input parameters:

Output parameters:

Author:
CL

See also:
adjMatrixTermsAncestors.

Example:

OntologyNr <- 1 # for biological process
termsAncestors(GOtermNr = 6796, OntologyNr)
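Collecting all ancestors of a term is a graph traversal toward the root. The sketch below walks a hypothetical toy parent list breadth-first; the package takes the parent relations from its own GO data, and ancestorsSketch is an illustrative name only.

```r
# Collect all ancestors of the given terms by walking the parent lists
# breadth-first toward the root.
ancestorsSketch <- function(GOtermNr, parents) {
  frontier <- as.character(GOtermNr)
  found <- character(0)
  while (length(frontier) > 0) {
    parentIds <- unlist(lapply(frontier, function(term) as.character(parents[[term]])))
    newIds <- setdiff(unique(parentIds), found)  # skip ancestors already seen
    found <- c(found, newIds)
    frontier <- newIds
  }
  as.numeric(found)
}

# Hypothetical toy DAG (only 8150 is a real GO term number)
toyParents <- list("8150" = integer(0), "101" = 8150L,
                   "102" = 8150L, "103" = c(101L, 102L))
ancestorsSketch(103, toyParents)
# 101 102 8150
```

Tracking the terms already found ensures that an ancestor reachable over several paths, such as the root here, is listed only once.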

updateORAdatabase {#updateORAdatabase}

Description:
With this function, the user can update the database used for ORA. The user decides when this should be done (e.g., the database can be kept unchanged until a project is finished). The new files are saved temporarily in system.file('extdata', package = 'ORA'). Updating the data is only necessary if the user updates R to a newer version and the packages GO.db and org.Hs.eg.db are updated. The data provided in the ORA package has the timestamp 31 January 2018. Once the database has been updated, the function has to be called again after every restart of the R session.

Usage:
updateORAdatabase()

Author:
CL

See also:
dbtORA.


WriteORAresults {#WriteORAresults}

Description:
Writes a *.lrn file of the GO terms and their computed values, a *.names file of the GO terms and their descriptions, a matrix of the annotations of genes to GO terms, a matrix describing the structure of the DAG of the GO terms, and a corresponding explanatory *.names file.

Usage:
WriteORAresults(FileNameWithoutExt, ORAresults, OutDirectory = getwd(), InFileWithExt = "")

Input parameters:

For further details on the input, see ORA.

Output parameters:
Files saved in OutDirectory containing all the information returned by ORA, the genes-to-GO-terms matrix and the adjacency matrix of the GO terms. For further details on the output, see dbtORA.

Author:
CL

See also:
dbtORA, ORA.



CLippmann/ORA documentation built on Feb. 4, 2020, 9:38 p.m.