Description:
Function to return the adjacency matrix of the input GO term numbers in the specified ontology.
Usage:
adjMatrixTermsAncestors(GOtermNrInclAncestors, OntologyNr = 1)
Input parameters:
GOtermNrInclAncestors: Numeric; Vector of GO term numbers of the GO terms and their ancestors up to the root of the gene ontology specified by OntologyNr.
OntologyNr: Numeric; Default: 1. To select the ontology, choose one of: 1 for biological process, 2 for molecular function or 4 for cellular component.
Output parameters:
AdjMatrix: AdjMatrix[i,j] == 1 iff GO term i is parent of GO term j. Named by GOtermIds.
GOtermNrs: GO term numbers corresponding to the rows and columns of AdjMatrix.
Author:
CL
See also:
termsAncestors.
Example:
OntologyNr <- 1
GOtermNrInclAncestors <- termsAncestors(16310, OntologyNr)$Ancestors
adjMatrixTermsAncestors(GOtermNrInclAncestors, OntologyNr = OntologyNr)
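To illustrate the adjacency convention used above (row = parent, column = child, rows and columns named by GO term ID), here is a minimal base-R sketch. It assumes the parent-child edges are already available as two vectors; the helper `adj_from_edges` and the toy edges are hypothetical and not part of the package, which derives the edges from GO itself.

```r
# Hypothetical helper (not the package's code): build a 0/1 adjacency matrix
# with AdjMatrix[i, j] == 1 iff term i is a parent of term j.
adj_from_edges <- function(parents, children) {
  terms <- sort(unique(c(parents, children)))
  ids <- sprintf("GO:%07d", terms)               # name rows/columns by GO term ID
  AdjMatrix <- matrix(0, nrow = length(terms), ncol = length(terms),
                      dimnames = list(ids, ids))
  # Index with a two-column character matrix: one (parent, child) cell per row
  AdjMatrix[cbind(sprintf("GO:%07d", parents), sprintf("GO:%07d", children))] <- 1
  list(AdjMatrix = AdjMatrix, GOtermNrs = terms)
}

# Toy edges: 8150 is parent of 44699, 44699 is parent of 44763
res <- adj_from_edges(parents = c(8150, 44699), children = c(44699, 44763))
res$AdjMatrix
```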
Description:
Function to calculate the certainty for a GO term, i.e.
the probability that there is no term with a smaller p-value than the p-value of the considered GO term in the given
GO subtree.
Usage:
certainty(Pvalues)
Input parameters:
Output parameters:
Author:
CL
See also:
infoValue, importance, remarkableness.
Example:
Pvalues <- runif(10, 0, 1)
certainty(Pvalues)
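One way to read the definition above, sketched under explicit assumptions: if the p-values of the other n - 1 terms are independent and uniform on [0, 1] under the null hypothesis, the probability that none of them is smaller than p is (1 - p)^(n - 1). The sketch scales this to 0..100, the range documented for Certainty under importance. The name `certainty_sketch` is hypothetical, and this is not necessarily the formula that certainty() implements.

```r
# Illustrative sketch only, NOT the package's certainty():
# assumes independent, uniformly distributed p-values under the null.
certainty_sketch <- function(Pvalues) {
  n <- length(Pvalues)
  # P(no other term has a smaller p-value) = (1 - p)^(n - 1), scaled to 0..100
  100 * (1 - Pvalues)^(n - 1)
}

certainty_sketch(c(0.001, 0.01, 0.5))
```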
Description:
Internal function to check if the parameters passed to dbtORA
are correct.
Usage:
checkORAparameters(InFileWithExt, InFileDirectory, RefSetFileWithExt,
RefSetDirectory, OutFile, OutFileDirectory, Correction, PvalueThreshold,
MinNrOfGenes, OnlyManuCur, drawDAG, MarkDetails, MarkHeadlines, PlotExt)
Input parameters:
InFileWithExt: String; Name (with extension) of the file containing the input NCBIs in the first column (for *.names and *.lrn files) or the only column (for *.txt files).
InFileDirectory: String; Directory where InFileWithExt can be found.
RefSetFileWithExt: String; Name (with extension) of the file containing the reference NCBIs in the first column (for *.names and *.lrn files) or the only column (for *.txt files). These NCBIs will be used as reference set.
RefSetDirectory: String; Directory where RefSetFileWithExt with the reference NCBIs can be found.
OutFile: String; File name of the output file(s). It will be complemented by the parameters of the ORA.
OutFileDirectory: String; Directory where the results of the ORA and the DAGs will be saved.
Correction: String; Type of correction for multiple testing of the p-values: 'BON' for Bonferroni, 'FDR' for False Discovery Rate, 'RAW' if no correction should be done.
PvalueThreshold: Numeric; P-value threshold. GO terms with p-values greater than PvalueThreshold will be ignored.
MinNrOfGenes: Numeric; Minimum number of genes annotated to one term that is accepted. Only GO terms with more than MinNrOfGenes annotated genes will be considered in the calculation.
OnlyManuCur: Boolean; Set TRUE if only manually curated gene annotations should be considered.
drawDAG: Boolean; Set TRUE if directed acyclic graphs (DAGs) should be drawn. If drawDAG is set to FALSE, the parameters MarkDetails, MarkHeadlines and PlotExt will be ignored.
MarkDetails: Boolean; Set TRUE if details of the DAG should be marked in blue.
MarkHeadlines: Boolean; Set TRUE if headlines of the DAG should be marked in yellow.
PlotExt: String; Extension of the plot file showing the DAG. One of 'pdf', 'eps' or 'png'.
Author:
CL
See also:
dbtORA.
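The kind of checks such a function performs can be sketched in base R. This is a hypothetical validator, not the package's internal code: it only tests the allowed values documented above, and the assumptions that PvalueThreshold lies in (0, 1] and MinNrOfGenes is non-negative are the sketch's own.

```r
# Hypothetical validator (not checkORAparameters itself); returns a character
# vector of problems, empty if everything looks valid.
check_ora_params_sketch <- function(Correction, PvalueThreshold, MinNrOfGenes,
                                    OnlyManuCur, drawDAG, PlotExt) {
  problems <- character(0)
  if (!(Correction %in% c("BON", "FDR", "RAW")))
    problems <- c(problems, "Correction must be 'BON', 'FDR' or 'RAW'.")
  # Assumption: a valid threshold lies in (0, 1]
  if (!is.numeric(PvalueThreshold) || PvalueThreshold <= 0 || PvalueThreshold > 1)
    problems <- c(problems, "PvalueThreshold must be a number in (0, 1].")
  if (!is.numeric(MinNrOfGenes) || MinNrOfGenes < 0)
    problems <- c(problems, "MinNrOfGenes must be a non-negative number.")
  if (!is.logical(OnlyManuCur) || !is.logical(drawDAG))
    problems <- c(problems, "OnlyManuCur and drawDAG must be TRUE or FALSE.")
  # PlotExt only matters when DAGs are drawn (see drawDAG above)
  if (isTRUE(drawDAG) && !(PlotExt %in% c("pdf", "eps", "png")))
    problems <- c(problems, "PlotExt must be 'pdf', 'eps' or 'png'.")
  problems
}

check_ora_params_sketch("BON", 0.05, 2, TRUE, TRUE, "png")
```

Collecting all problems instead of stopping at the first one lets a caller report every invalid parameter at once.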
Description:
Convenient wrapper function to perform an overrepresentation analysis (ORA) including the drawing of the directed acyclic graphs (DAGs)
of the resulting GO terms.
Usage:
dbtORA(InFileWithExt, PvalueThreshold = 0.05, Correction = "BON", OnlyManuCur = TRUE,
MinNrOfGenes = 2, InFileDirectory = getwd(), OutFile = InFileWithExt,
OutFileDirectory = InFileDirectory, RefSetFileWithExt = NULL,
RefSetDirectory = InFileDirectory, drawDAG = TRUE, MarkDetails = TRUE,
MarkHeadlines = TRUE, PlotExt = "png")
Input parameters:
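InFileWithExt: String; Name (with extension) of the file containing the input NCBIs in the first column (for *.names and *.lrn files) or the only column (for *.txt files).
PvalueThreshold: Numeric; Default: 0.05. P-value threshold. GO terms with p-values greater than PvalueThreshold will be ignored.
Correction: String; Default: 'BON'. Type of correction for multiple testing of the p-values: 'BON' for Bonferroni, 'FDR' for False Discovery Rate, 'RAW' if no correction should be done.
OnlyManuCur: Boolean; Default: TRUE. Set TRUE if only manually curated gene annotations should be considered.
MinNrOfGenes: Numeric; Default: 2. Minimum number of genes annotated to one term that is accepted. Only GO terms with more than MinNrOfGenes annotated genes will be considered in the calculation.
InFileDirectory: String; Default: current directory getwd(). Directory where InFileWithExt can be found. If InFileWithExt is not given, the function will ask interactively.
OutFile: String; Default: InFileWithExt (extension will be adjusted). File name of the output file(s).
OutFileDirectory: String; Default: InFileDirectory. Directory where the results of the ORA and the DAGs will be saved.
RefSetFileWithExt: String; Default: NULL. Name (with extension) of the file containing the reference NCBIs in the first column (for *.names and *.lrn files) or the only column (for *.txt files). These NCBIs will be used as reference set.
RefSetDirectory: String; Default: InFileDirectory. Directory where RefSetFileWithExt with the reference NCBIs can be found.
drawDAG: Boolean; Default: TRUE. Set TRUE if DAGs should be drawn. If drawDAG is set to FALSE, the parameters MarkDetails, MarkHeadlines and PlotExt will be ignored.
MarkDetails: Boolean; Default: TRUE. Set TRUE if details of the DAG should be marked in blue.
MarkHeadlines: Boolean; Default: TRUE. Set TRUE if headlines of the DAG should be marked in yellow.
PlotExt: String; Default: 'png'. Extension of the plot file showing the DAG. One of 'pdf', 'eps' or 'png'.
Details: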
Wrapper function that mainly executes ORA and drawORA.
Coloring of the nodes and its meaning:
Red - Significantly overrepresented nodes.
Green - Significantly underrepresented nodes.
White - Terms that are important for the DAG structure but do not have a significant p-value.
Yellow - If MarkHeadlines = TRUE, the significant(!) nodes with the highest remarkableness value in each path from a detail to the root, the so-called headlines, get a yellow filling. The margin indicates over- or underrepresentation by its red or green color.
Blue - If MarkDetails = TRUE, the details of the DAG will be colored in blue. The margin again indicates over- or underrepresentation by its red or green color.
If MarkHeadlines and MarkDetails are TRUE, there might be nodes that are both headlines and details. In this case, the nodes have a red or green margin according to over- or underrepresentation and are filled in yellow like all headlines. Additionally, the writing is blue to indicate that this node is a detail.
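The coloring rules above can be summarized as a small lookup, sketched here in base R. The helper `node_colours_sketch` is hypothetical, and two details are the sketch's own assumptions: non-significant nodes get a black margin, and nodes that are details but not headlines keep black writing.

```r
# Hypothetical helper mirroring the documented coloring rules (not package code).
# Inputs are 0/1 vectors of equal length, one entry per node.
node_colours_sketch <- function(Significant, Up, IsHeadline, IsDetail,
                                MarkHeadlines = TRUE, MarkDetails = TRUE) {
  sig <- Significant == 1
  updown <- ifelse(Up == 1, "red", "green")     # over- vs. underrepresentation
  border <- ifelse(sig, updown, "black")        # assumption: black for non-significant
  fill <- ifelse(sig, updown, "white")
  detail <- MarkDetails & IsDetail == 1
  headline <- MarkHeadlines & sig & IsHeadline == 1
  fill[detail] <- "blue"
  fill[headline] <- "yellow"                    # headline filling wins over detail filling
  font <- ifelse(detail & headline, "blue", "black")  # blue writing marks a detail
  data.frame(fill = fill, border = border, font = font, stringsAsFactors = FALSE)
}

node_colours_sketch(Significant = c(1, 1, 0, 1), Up = c(1, 0, 1, 1),
                    IsHeadline = c(0, 0, 0, 1), IsDetail = c(0, 0, 0, 1))
```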
To read the output files, you can simply use any text editor. If you would like to use the results for further calculations, it is recommended to install the DataIO package from GitHub. It provides functions to read and write *.lrn and *.names files easily in your R console.
Output parameters:
Nine files:
[InFileWithoutExt]Genes[XXX].names: XXX valid genes; the file name is extended with Genes and the number of valid genes, i.e. the genes that have at least one annotation (a manually curated one, if OnlyManuCur = TRUE) to a term in GO.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]_BP.[PlotExt]: DAG of the biological process (BP) ontology.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]_MF.[PlotExt]: DAG of the molecular function (MF) ontology.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]_CC.[PlotExt]: DAG of the cellular component (CC) ontology.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur][YYY]Terms.lrn: YYY significant terms and all numeric results for each term, like p-value, remarkableness, number of annotations, isHeadline and so on. For a detailed explanation, see the documentation of ORA.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur][YYY]Terms.names: YYY significant terms.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]Genes2GOterms[XXX]x[YYY].lrn: Matrix of the XXX valid genes and the YYY significant output terms. If an entry [i,j] == 1, the gene in row i is annotated to the term in column j.
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]GOterms2GOterms[ZZZ]x[ZZZ].lrn: Matrix of the YYY significant terms and the ZZZ-YYY additional terms that are needed to draw the three DAGs. This matrix represents the DAGs' structure: an entry [i,j] == 1 if the term in row i is parent of the term in column j. The first row specifies the DAG to which the term in each column belongs: 1 for biological process (BP), 2 for molecular function (MF) and 4 for cellular component (CC).
[OutFile]_[Correction]_[PvalueThreshold]_[MinNrOfGenes]_[OnlyManuCur]GOterms[ZZZ].names: ZZZ terms.
Author:
CL
Example:
For this example, 30 randomly drawn genes were used. Therefore, the p-value threshold has to be set to 1, as there are no significant terms at the default threshold of 5%, which is exactly what is to be expected for a random set of genes.
dbtORA(InFileWithExt = 'ExampleNCBInamesFile.names', PvalueThreshold = 1, Correction = "BON",
       OnlyManuCur = TRUE, MinNrOfGenes = 2,
       InFileDirectory = system.file('extdata', package = 'ORA'),
       OutFile = 'ExampleNCBInamesFile', OutFileDirectory = getwd(),
       RefSetFileWithExt = NULL, RefSetDirectory = getwd(), drawDAG = TRUE,
       MarkDetails = TRUE, MarkHeadlines = TRUE, PlotExt = "png")
This yields the following output and a warning: since OnlyManuCur = TRUE and two genes have no manually curated (only automatic) annotations, they are ignored in the analysis:
[1] "........................................"
[1] "ORA: summary"
[1] "Number of genes in test set: 28"
[1] "Number of genes in universe/reference set: 17656"
[1] "Number of p-values: 453"
[1] "Number of adjusted p-values: 453"
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.988894 to fit
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_BP.png\" saved in [OutFileDirectory]"
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_MF.png\" saved in [OutFileDirectory]"
[1] "plotGOgraph: png-File named \"ExampleNCBInamesFileGenes28_BON_1_2_MANU_CC.png\" saved in [OutFileDirectory]"
Warning message:
In dbtORA(InFileWithExt = "ExampleNCBInamesFile.names", PvalueThreshold = 1, :
  dbtORA: 2 input gene(s) were not used. There might be duplicates in input genes or some input genes are not annotated to any GO term. For analysis used genes can be found in ExampleNCBInamesFileGenes28.names in [OutFileDirectory].
Description:
Draws the gene ontology DAG containing the ORA results. Prepares the data for plotGOgraph, which does the actual plotting.
Usage:
drawORA(ORAresults, PlotFileWithExt, PlotDirectory, MarkDetails = TRUE, MarkHeadlines = TRUE, Overwrite = TRUE)
Input parameters:
ORAresults: List with the following elements:
LRNresults: List of 19.
NAMESresults: List of 3.
Genes2GOtermsSparseMatrix: Genes2GOtermsSparseMatrix[i,3] == 1 iff the gene in [i,1] is annotated to the GO term in [i,2].
GO2GOAdjMatrices: List of 4.
PlotFileWithExt: String; File name of the plot with its extension, one of 'png', 'eps' or 'pdf'.
PlotDirectory: String; Directory where PlotFileWithExt should be saved.
MarkDetails: Boolean; Default: TRUE. Set TRUE if details of the DAG should be marked in blue.
MarkHeadlines: Boolean; Default: TRUE. Set TRUE if headlines should be marked in yellow in the DAG.
Overwrite: Boolean; Default: TRUE. Set TRUE if existing files with the same name should be overwritten.
Details:
The function requires the freely available visualization software Graphviz.
Coloring of the nodes and its meaning:
Red - Significantly overrepresented nodes.
Green - Significantly underrepresented nodes.
White - Terms that are important for the DAG structure but do not have a significant p-value.
Yellow - If MarkHeadlines = TRUE, the significant(!) nodes with the highest remarkableness value in each path from a detail to the root, the so-called headlines, get a yellow filling. The margin indicates over- or underrepresentation by its red or green color.
Blue - If MarkDetails = TRUE, the details of the DAG will be colored in blue. The margin again indicates over- or underrepresentation by its red or green color.
If MarkHeadlines and MarkDetails are TRUE, there might be nodes that are both headlines and details. In this case, the nodes have a red or green margin according to over- or underrepresentation and are filled in yellow like all headlines. Additionally, the writing is blue to indicate that this node is a detail.
Author:
CL
See also:
For further details about ORAresults, please see ORA.
Example:
For this example, the random set of 30 genes from ExampleNCBInamesFile.names in system.file('extdata', package = 'ORA') is used. As the genes are randomly drawn, the p-value threshold has to be set to 1.
For documentation of the function ReadNAMES, the package DataIO from GitHub is recommended.
# NCBIs <- ReadNAMES('ExampleNCBInamesFile.names', system.file('extdata', package = 'ORA'))$Key
NCBIs <- c(8402, 6199, 72, 387254, 10083, 170370, 25921, 9324, 7305, 6675, 2, 3224, 90342, 20, 121340, 89792, 83998, 140469, 5005, 7398, 26575, 53826, 5024, 50618, 3061, 51176, 7903, 90529, 28316, 6406)
ORAresults <- ORA(NCBIs, Correction = "BON", PvalueThreshold = 1, MinNrOfGenes = 2,
                  OnlyManuCur = FALSE, RefSet = NULL,
                  GOAall = ReadLRN("GOAall.lrn", system.file("extdata", package = "ORA")))
PlotFileWithExt <- 'Example4Vignette.png'
PlotDirectory <- getwd()
drawORA(ORAresults, PlotFileWithExt, PlotDirectory, MarkDetails = TRUE, MarkHeadlines = TRUE, Overwrite = TRUE)
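Genes2GOtermsSparseMatrix stores annotations as rows of (gene, GO term, 1). The base-R sketch below, with the hypothetical helper `triplets_to_dense` and made-up toy annotations, shows how such a triplet table can be expanded into a dense gene x term 0/1 matrix like Genes2GOtermsMatrix; it is an illustration, not how drawORA processes the data internally.

```r
# Hypothetical conversion: column 1 = gene, column 2 = GO term, column 3 = 1
triplets_to_dense <- function(Triplets) {
  genes <- sort(unique(Triplets[, 1]))
  terms <- sort(unique(Triplets[, 2]))
  Dense <- matrix(0, nrow = length(genes), ncol = length(terms),
                  dimnames = list(as.character(genes), as.character(terms)))
  hit <- Triplets[, 3] == 1
  # One cell per annotated (gene, term) pair
  Dense[cbind(match(Triplets[hit, 1], genes), match(Triplets[hit, 2], terms))] <- 1
  Dense
}

# Toy annotations (made up for illustration)
Triplets <- cbind(c(1, 1, 12), c(8150, 3674, 8150), c(1, 1, 1))
triplets_to_dense(Triplets)
```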
Description:
Function to get all paths from the gene ontology root of the DAG to one target term.
Usage:
GOroot2TermPaths(TargetTerm, AdjMatrix, GOtermNr, GOroot = 8150)
Input parameters:
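TargetTerm: Numeric; GO term number of the target term.
AdjMatrix: AdjMatrix[i,j] == 1 iff there exists an edge in the GO DAG from node i to node j (i is parent of j).
GOtermNr: Numeric; GO term numbers corresponding to the rows and columns of AdjMatrix.
GOroot: Numeric; Default: 8150. Number of the root term of the ontology the GOtermNrs belong to. One of: 8150 for biological process, 3674 for molecular function or 5575 for cellular component.
Output parameters:
Taxonomy: List of paths; GOtermNr[Taxonomy[[i]]] is the i-th path from GOroot to TargetTerm.
Author: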
CL
See also:
termsAncestors, adjMatrixTermsAncestors.
Example:
OntologyNr <- 1 # for biological process
TargetTerm <- 6796
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(TargetTerm, Ancestors), OntologyNr)
GOroot2TermPaths(TargetTerm = TargetTerm, AdjMatrix = AdjMatrixAndTermNrs$AdjMatrix,
                 GOtermNr = AdjMatrixAndTermNrs$GOtermNrs, GOroot = 8150)
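Enumerating all root-to-target paths in a DAG can be done with a simple recursion over the children in the adjacency matrix. The sketch below uses the hypothetical name `all_paths_sketch` and a toy diamond-shaped DAG; like Taxonomy, it returns paths as vectors of row indices. It illustrates the technique and is not the package's implementation.

```r
# Hypothetical recursive path enumeration (Adj[i, j] == 1 iff i is parent of j).
all_paths_sketch <- function(Adj, from, to) {
  if (from == to) return(list(to))
  paths <- list()
  for (child in which(Adj[from, ] == 1)) {
    # Prepend the current node to every path from the child to the target
    for (rest in all_paths_sketch(Adj, child, to)) {
      paths[[length(paths) + 1]] <- c(from, rest)
    }
  }
  paths
}

# Toy diamond DAG: 1 -> 2 -> 4 and 1 -> 3 -> 4
Adj <- matrix(0, 4, 4)
Adj[1, 2] <- 1; Adj[1, 3] <- 1; Adj[2, 4] <- 1; Adj[3, 4] <- 1
all_paths_sketch(Adj, from = 1, to = 4)
```

Given term numbers in row order, `GOtermNr[path]` would then translate each index path into GO term numbers. The number of paths can grow exponentially with depth, which is acceptable for the small ancestor subgraphs used here.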
Description:
Performs a one-sided hypergeometric test, i.e. calculates the probability of drawing at least ObservedNrOfAnnsInTerm successes (if the expected value is smaller than the observed number of successes) or fewer than ObservedNrOfAnnsInTerm successes (if the expected value is greater than the observed number of successes) in a fixed number of draws (NrOfGenesInSample), without replacement, from a finite population of fixed size (NrOfGenesInUniverse) that contains a known number of successes (NrOfAnnotationsInTerm), where each draw is either a success or a failure.
Usage:
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = TRUE)
Input parameters:
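ObservedNrOfAnnsInTerm: Numeric; Observed number of successes, i.e. of genes in the sample annotated to the GO term.
NrOfAnnotationsInTerm: Numeric; Number of successes in the population, i.e. of genes in the universe annotated to the GO term.
NrOfGenesInSample: Numeric; Number of draws, i.e. of genes in the sample.
NrOfGenesInUniverse: Numeric; Size of the population, i.e. number of genes in the universe.
LogPvalues: Boolean; Default: TRUE. Set TRUE if log(p-values) should be calculated. Set FALSE if non-transformed p-values should be returned.
Details: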
Wrapper for phyper.
The hypergeometric test is done one-sided, depending on ExpectedNrOfAnnsInTerm:
If the expected number of genes annotated to one GO term is less than ObservedNrOfAnnsInTerm, the log-p-value will be log(P(X >= ObservedNrOfAnnsInTerm)), where X is the hypergeometrically distributed random variable.
If the expected number of genes annotated to one GO term is greater than ObservedNrOfAnnsInTerm, the log-p-value will be log(P(X < ObservedNrOfAnnsInTerm)), where X is the hypergeometrically distributed random variable.
Output parameters:
Vector of log(p-values) (if LogPvalues = TRUE; else vector of p-values) of the one-sided hypergeometric test.
Author:
CL
See also:
phyper.
Example:
ObservedNrOfAnnsInTerm <- 17
NrOfAnnotationsInTerm <- 30
NrOfGenesInSample <- 500
NrOfGenesInUniverse <- 17656
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = TRUE)
hypergeoTest(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm, NrOfGenesInSample, NrOfGenesInUniverse, LogPvalues = FALSE)
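The one-sided rule in the Details maps directly onto stats::phyper. In this minimal sketch, `hypergeo_sketch` is a hypothetical name and not the package's code. With m = NrOfAnnotationsInTerm, n = NrOfGenesInUniverse - m and k = NrOfGenesInSample, P(X >= obs) is the upper tail at obs - 1 and P(X < obs) is the lower tail at obs - 1. Assigning ties (expected == observed) to the lower tail is this sketch's own choice, since the text does not cover that case.

```r
# Sketch of the documented one-sided test using stats::phyper.
hypergeo_sketch <- function(ObservedNrOfAnnsInTerm, NrOfAnnotationsInTerm,
                            NrOfGenesInSample, NrOfGenesInUniverse,
                            LogPvalues = TRUE) {
  m <- NrOfAnnotationsInTerm                          # successes in the population
  n <- NrOfGenesInUniverse - NrOfAnnotationsInTerm    # failures in the population
  k <- NrOfGenesInSample                              # number of draws
  q <- ObservedNrOfAnnsInTerm - 1
  Expected <- k * m / NrOfGenesInUniverse
  Over  <- phyper(q, m, n, k, lower.tail = FALSE, log.p = LogPvalues)  # P(X >= obs)
  Under <- phyper(q, m, n, k, lower.tail = TRUE,  log.p = LogPvalues)  # P(X <  obs)
  ifelse(Expected < ObservedNrOfAnnsInTerm, Over, Under)  # ties -> lower tail (assumption)
}

hypergeo_sketch(17, 30, 500, 17656, LogPvalues = FALSE)
```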
Description:
Function to calculate the importance for given information values and certainty values for GO terms, i.e. the minimum of both.
Usage:
importance(Certainty, InfoValue)
Input parameters:
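Certainty: Numeric; Vector of certainty values between 0 and 100. See certainty.
InfoValue: Numeric; Vector of information values between 0 and 100. See infoValue.
Output parameters:
Importance: Numeric; Vector of importance values between 0 and 100, based on InfoValue and Certainty.
Author: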
CL
See also:
certainty, infoValue, remarkableness.
Example:
Certainty <- c(30, 60, 70, 90)
InfoValue <- c(80, 70, 20, 70)
importance(Certainty, InfoValue)
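Since importance is documented as the minimum of certainty and information value, the rule reduces to an elementwise pmin in base R. `importance_sketch` is a hypothetical name for this minimal sketch of the documented definition, not the package's source.

```r
# Importance as described above: elementwise minimum of Certainty and InfoValue
importance_sketch <- function(Certainty, InfoValue) pmin(Certainty, InfoValue)

importance_sketch(c(30, 60, 70, 90), c(80, 70, 20, 70))
# returns c(30, 60, 20, 70)
```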
Description:
Calculates the partial Shannon information of the gene sets in GO terms, which indicates how informative a term is in the context of all terms.
Usage:
infoValue(NrOfAnnotationsInTerm, NrOfGenesInUniverse = max(NrOfAnnotationsInTerm))
Input parameters:
NrOfAnnotationsInTerm: Numeric; Number of annotations associated to each GO term.
NrOfGenesInUniverse: Numeric; Default: max(NrOfAnnotationsInTerm). (NrOfGenesInUniverse is the same as the number of genes (directly + indirectly) annotated to the root.)
Output parameters:
InfoValue: Numeric; Partial Shannon information of each term.
InfoValueP: NrOfAnnotationsInTerm/NrOfGenesInUniverse for each term, which is the empirical probability of occurrence.
Author:
CL
See also:
certainty, importance, remarkableness.
Example:
NrOfAnnotationsInTerm <- 30
NrOfGenesInUniverse <- 17656
infoValue(NrOfAnnotationsInTerm, NrOfGenesInUniverse)
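A sketch of the underlying quantity, under a stated assumption: "partial Shannon information" is read here as the single summand -p * log2(p) of the Shannon entropy, with p = InfoValueP. InfoValue is used on a 0 to 100 scale elsewhere, and the scaling infoValue() applies is not documented here, so the hypothetical `partial_info_sketch` leaves the value unscaled.

```r
# Assumption: partial Shannon information = -p * log2(p), p = InfoValueP.
# infoValue() itself may scale or define the result differently.
partial_info_sketch <- function(NrOfAnnotationsInTerm,
                                NrOfGenesInUniverse = max(NrOfAnnotationsInTerm)) {
  p <- NrOfAnnotationsInTerm / NrOfGenesInUniverse   # empirical probability
  list(PartialInfo = -p * log2(p), InfoValueP = p)
}

partial_info_sketch(30, 17656)
```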
Description:
Function to get GeneSymbol and GeneName for given NCBI numbers from 'AllAnnNCBIsPlusGeneName.names' in system.file('extdata', package = 'ORA').
Usage:
NCBI2GeneName(NCBI)
Input parameters:
NCBI: Numeric; Vector of NCBI gene numbers (e.g. 1).
Output parameters:
GeneSymbol: Vector of gene symbols (e.g. A1BG).
GeneName: Vector of gene names (e.g. alpha-1-B glycoprotein).
Author:
CL
Example:
NCBI2GeneName(NCBI = c(1,12, 1857))
Description:
Returns, for given GO terms (as IDs or numbers), the number of the corresponding gene ontology, where 1 codes for biological process, 2 for molecular function and 4 for cellular component. If the result is 0, something went wrong.
Usage:
ontologyNr(GOtermNrOrId, Verbose = FALSE)
Input parameters:
GOtermNrOrId: String vector of GO term IDs (like 'GO:0008150') OR numeric vector of GO term numbers (like 8150).
Verbose: Boolean; Default: FALSE. If TRUE, the function prints information in the GUI window.
Output parameters:
OntoNumber: Numeric; 1 codes for biological process, 2 for molecular function and 4 for cellular component.
Author:
CL
Example:
ontologyNr(c(8150, 15774, 5776, 4582))
Description:
Main function to perform the overrepresentation analysis of given genes based on the gene ontology, using a one-sided hypergeometric test. For a more convenient wrapper function, see dbtORA.
Usage:
ORA(NCBIs, Correction = "BON", PvalueThreshold = 0.05, MinNrOfGenes = 2,
OnlyManuCur = FALSE, RefSet = NULL, GOAall = ReadLRN("GOAall.lrn",
system.file("extdata",package = "ORA")))
Input parameters:
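NCBIs: Numeric; Vector of NCBI numbers of the input genes.
Correction: String; Default: 'BON'. Type of correction for multiple testing of the p-values: 'BON' for Bonferroni, 'FDR' for False Discovery Rate, 'RAW' if no correction should be done.
PvalueThreshold: Numeric; Default: 0.05. P-value threshold. GO terms with p-values greater than PvalueThreshold will be ignored.
MinNrOfGenes: Numeric; Default: 2. Only GO terms with more than MinNrOfGenes genes will be considered in the analysis.
OnlyManuCur: Boolean; Default: FALSE. Set TRUE if only manually curated gene annotations should be considered.
RefSet: Default: NULL. Reference set of genes. If NULL or missing, all known genes are taken as universe.
GOAall: Default: the file GOAall.lrn in system.file("extdata", package = 'ORA').
Details: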
To read the output files, you can simply use any text editor. If you would like to use the results for further calculations, it is recommended to install the DataIO package from GitHub. It provides functions to read and write *.lrn and *.names files easily in your R console.
Output parameters:
ORAresults: List with the following elements:
LRNresults: List of 19: Information needed to generate the lrn file containing all the calculated values for the GO terms found to be significant for the input genes.
LRNresults$GOtermNr: Numeric; GO term numbers found to be significant for the input genes.
LRNresults$OntologyNr: Numeric; Number of the ontology (1 = biological process (BP), 2 = molecular function (MF), 4 = cellular component (CC)).
LRNresults$NrOfGenesInUniverse: Numeric; Number of genes in the universe used for p-value computation.
LRNresults$NrOfGenesInSample: Numeric; Number of input genes used for p-value computation.
LRNresults$NrOfAnnotationsInTerm: Numeric; Number of annotations associated to the GO term.
LRNresults$Up: Numeric; 1 if the GO term is overrepresented (ExpNrOfAnnsInTerm < ObservedNrOfAnnsInTerm), 0 if underrepresented.
LRNresults$ExpNrOfAnnsInTerm: Numeric; Statistically expected number of genes annotated to the GO term.
LRNresults$ObservedNrOfAnnsInTerm: Numeric; Empirically observed number of genes annotated to the GO term.
LRNresults$RelDiff: Numeric; Relative difference of expected and observed in percent.
LRNresults$Pvalue: Numeric; P-values for each GO term obtained by the hypergeometric test.
LRNresults$LogPvalue: Numeric; log(Pvalue).
LRNresults$Certainty: Numeric; Certainty value. See certainty.
LRNresults$InfoValue: Numeric; Value describing the partial Shannon information. See infoValue.
LRNresults$Remarkable: Numeric; Product of Certainty and InfoValue divided by 100.
LRNresults$Importance: Numeric; Minimum of Certainty and InfoValue.
LRNresults$InfoContent: Numeric; InformationContent from GOTermInfosBP/MF/CC.lrn depending on OnlyManuCur.
LRNresults$InfoContentORA: Numeric; -log2(ObservedNrOfAnnsInTerm/NrOfGenesInSample).
LRNresults$IsHeadline: Boolean; 1 if the GO term is a headline, 0 if not.
LRNresults$IsDetail: Boolean; 1 if the GO term is a detail, 0 if not.
NAMESresults: List of 3: Information needed to generate the names file containing information about the GO terms.
NAMESresults$GOtermNr: Numeric; GO term numbers found to be significant for the input genes.
NAMESresults$GOtermDescription: String; Description of the GO terms = termDescription(GOtermId).
NAMESresults$GOtermId: String; GO term ID = termId(GOtermNr).
Genes2GOtermsMatrix: Numeric; Matrix explaining the connection of genes and GO terms. Genes2GOtermsMatrix[i,j] == 1 iff the gene in the i-th row is annotated to the GO term in the j-th column.
GO2GOAdjMatrices: List of 4: Adjacency matrices for each ontology and the combined sparse matrix.
GO2GOAdjMatrices$GO2GOAdjMatrix: Numeric; Block diagonal adjacency matrix (formal class dgCMatrix from package Matrix) describing the complete directed acyclic graph (DAG) of the significant GO terms up to the root, i.e. the edges between GO terms and their parents. GO2GOAdjMatrix[i,j] == 1 iff i is parent of j. The first row contains the numbers 1, 2 and 4 specifying the ontology BP, MF and CC.
GO2GOAdjMatrices$AdjMatrixGO2GOBP: Numeric; (Non-sparse) adjacency matrix of the BP DAG. AdjMatrixGO2GOBP[i,j] == 1 iff i is parent of j.
GO2GOAdjMatrices$AdjMatrixGO2GOMF: Numeric; (Non-sparse) adjacency matrix of the MF DAG. AdjMatrixGO2GOMF[i,j] == 1 iff i is parent of j.
GO2GOAdjMatrices$AdjMatrixGO2GOCC: Numeric; (Non-sparse) adjacency matrix of the CC DAG. AdjMatrixGO2GOCC[i,j] == 1 iff i is parent of j.
Author:
CL
See also:
dbtORA.
Example:
For this example, again the random set of genes from ExampleNCBInamesFile.names in system.file('extdata', package = 'ORA') is used. The p-value threshold is, again, set to 1.
For documentation of the function ReadNAMES, the package DataIO from GitHub is recommended.
# NCBIs <- ReadNAMES('ExampleNCBInamesFile.names', system.file('extdata', package = 'ORA'))$Key
NCBIs <- c(8402, 6199, 72, 387254, 10083, 170370, 25921, 9324, 7305, 6675, 2, 3224, 90342, 20, 121340, 89792, 83998, 140469, 5005, 7398, 26575, 53826, 5024, 50618, 3061, 51176, 7903, 90529, 28316, 6406)
ORA(NCBIs, Correction = "BON", PvalueThreshold = 1, MinNrOfGenes = 2, OnlyManuCur = FALSE, RefSet = NULL,
    GOAall = ReadLRN("GOAall.lrn", system.file("extdata", package = "ORA")))
NOTE: There is no warning this time, as OnlyManuCur = FALSE.
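The Correction and PvalueThreshold parameters can be illustrated with stats::p.adjust. This is a hypothetical sketch (`adjust_sketch` is not a package function). It assumes that 'FDR' means the Benjamini-Hochberg procedure ("BH"), which this documentation does not state, and it does not model MinNrOfGenes.

```r
# Hypothetical mapping of Correction codes to stats::p.adjust methods.
# Assumption: 'FDR' = Benjamini-Hochberg; ORA() may use a different FDR method.
adjust_sketch <- function(Pvalues, Correction = "BON", PvalueThreshold = 0.05) {
  method <- switch(Correction, BON = "bonferroni", FDR = "BH", RAW = "none",
                   stop("Correction must be 'BON', 'FDR' or 'RAW'"))
  Adjusted <- p.adjust(Pvalues, method = method)
  # Terms with adjusted p-values greater than PvalueThreshold are ignored
  list(Adjusted = Adjusted, Keep = Adjusted <= PvalueThreshold)
}

adjust_sketch(c(0.01, 0.02, 0.04), Correction = "BON")
```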
Description:
Function to complete the OutFile
name passed to dbtORA
with ORA parameters.
Usage:
ORAfilename(OutFile, NrOfValidInputGenes, Correction, PvalueThreshold, MinNrOfGenes, OnlyManuCur, WithRefSet = FALSE)
Input parameters:
OutFile: String; File name of the output file(s).
NrOfValidInputGenes: Numeric; #(input genes) - #(duplicated and non-annotated genes).
Correction: String; 'BON' for Bonferroni, 'FDR' for False Discovery Rate, 'RAW' if no correction should be done.
PvalueThreshold: Numeric; P-value threshold. GO terms with p-values greater than PvalueThreshold will be ignored.
MinNrOfGenes: Numeric; Only GO terms with more than MinNrOfGenes genes will be considered in the calculation.
OnlyManuCur: Boolean; Set TRUE if only manually curated gene annotations should be considered.
WithRefSet: Boolean; Default: FALSE. Set TRUE if a reference set of genes is used.
Output parameters:
OutFilePlusParams: String; The complemented OutFile name.
Author:
CL
See also:
dbtORA.
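The dbtORA example output ("ExampleNCBInamesFileGenes28_BON_1_2_MANU_BP.png") suggests how the parameters are appended to OutFile. The base-R sketch below reproduces only that observed pattern. `ora_filename_sketch` is hypothetical, the label for OnlyManuCur = FALSE is an undocumented placeholder (`NotManuLabel`), and the effect of WithRefSet is not modeled.

```r
# Hypothetical reconstruction of the naming pattern seen in the dbtORA example.
# NotManuLabel is a placeholder: the real label for OnlyManuCur = FALSE is not documented.
ora_filename_sketch <- function(OutFile, NrOfValidInputGenes, Correction,
                                PvalueThreshold, MinNrOfGenes, OnlyManuCur,
                                NotManuLabel = "ALL") {
  paste0(OutFile, "Genes", NrOfValidInputGenes, "_", Correction, "_",
         PvalueThreshold, "_", MinNrOfGenes, "_",
         if (OnlyManuCur) "MANU" else NotManuLabel)
}

ora_filename_sketch("ExampleNCBInamesFile", 28, "BON", 1, 2, TRUE)
# returns "ExampleNCBInamesFileGenes28_BON_1_2_MANU"
```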
Description:
Draws and colors the gene ontology DAG of the input GO terms depending on the input parameters and saves it as PlotFile.
Usage:
plotGOgraph(Adj,GOtermIDs,PlotFile,PlotDirectory=getwd(), Significant=rep(1,length(GOtermIDs)),IsHeadline=rep(0,length(GOtermIDs)), MarkDetails=TRUE, Overwrite=TRUE, GOtermString=NULL,Remarkable=NULL,Pvalues=NULL, NrGenesInTerm=NULL,Expected = NULL, Observed = NULL, Importance = NULL, Up=NULL)
Input parameters:
Adj: Adj[i,j] == 1 iff i is parent of j.
GOtermIDs: GO term IDs, e.g. "GO:0008150".
PlotFile: File name of the plot with extension 'png', 'pdf' or 'eps'. (If no extension is specified, 'png' is used.)
PlotDirectory: Default: getwd(). Directory where PlotFile will be saved.
Significant: GO terms with Significant == 1 are drawn in red, others in white.
IsHeadline: GO terms with IsHeadline == 1 are marked in yellow.
MarkDetails: Boolean; Default: TRUE. Set TRUE if details of the DAG should be colored blue.
Overwrite: Boolean; Default: TRUE. Set TRUE if files in PlotDirectory with the same file name as PlotFile should be replaced by the new file.
GOtermString: E.g. "biological_process". If not given (NULL), GO term strings will be generated automatically.
Remarkable: See remarkableness.
Pvalues:
NrGenesInTerm:
Observed:
Expected:
Importance:
Up: For overrepresented GO terms, Up is set to 1, else 0. Significant GO terms where Up == 1 are marked in red, other significant terms in green.
Output parameters:
PlotFile in PlotDirectory with the DAG of the input GO terms.
Author:
CL
See also:
drawORA.
Example:
Artificial example just to show the functionality of plotGOgraph.
OntologyNr <- 1 # for biological process
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(6796, Ancestors), OntologyNr)
GOtermID <- termId(c(6796, Ancestors))
PlotFile <- 'Example4Vignette.png'
Significant <- c(1, 1, 0, 1, 1, 1)
IsHeadline <- c(0, 0, 0, 1, 1, 0)
Up <- c(1, 1, 1, 1, 0, 1)
plotGOgraph(Adj = AdjMatrixAndTermNrs$AdjMatrix, GOtermIDs = GOtermID, PlotFile = PlotFile,
            PlotDirectory = getwd(), Significant = Significant, IsHeadline = IsHeadline,
            MarkDetails = TRUE, Overwrite = TRUE, GOtermString = NULL, Remarkable = NULL,
            Pvalues = NULL, NrGenesInTerm = NULL, Expected = NULL, Observed = NULL,
            Importance = NULL, Up = Up)
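plotGOgraph relies on Graphviz. To show how an adjacency matrix with the convention above maps onto Graphviz input, here is a hypothetical base-R sketch that writes DOT text with one filled node per term and one edge per parent-child pair. The helper `adj_to_dot_sketch` is illustrative only; how plotGOgraph actually builds its Graphviz input is not documented here.

```r
# Hypothetical sketch: DAG as Graphviz DOT text (Adj[i, j] == 1 iff i is parent of j).
adj_to_dot_sketch <- function(Adj, GOtermIDs, Fill = rep("white", length(GOtermIDs))) {
  nodes <- sprintf('  "%s" [style=filled, fillcolor=%s];', GOtermIDs, Fill)
  e <- which(Adj == 1, arr.ind = TRUE)   # one row per (parent, child) pair
  edges <- sprintf('  "%s" -> "%s";', GOtermIDs[e[, "row"]], GOtermIDs[e[, "col"]])
  paste(c("digraph GO {", nodes, edges, "}"), collapse = "\n")
}

Adj <- matrix(c(0, 1, 0, 0), 2, 2, byrow = TRUE)   # GO:0008150 is parent of GO:0044699
cat(adj_to_dot_sketch(Adj, c("GO:0008150", "GO:0044699"), c("white", "red")))
```

The resulting text could be saved with writeLines() and rendered with the Graphviz command line, e.g. `dot -Tpng GO.dot -o GO.png`.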
Description:
Function to calculate the remarkableness of a GO term.
Usage:
remarkableness(Certainty, InfoValue)
Input parameters:
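Certainty: Numeric; Certainty values. See certainty.
InfoValue: infoValue(NrOfAnnotationsInTerm, NrOfGenesInUniverse).
Output parameters:
Remarkable: Numeric; Product of Certainty and InfoValue divided by 100.
Author: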
CL
See also:
certainty, infoValue, importance.
Example:
Certainty <- c(30, 60, 70, 90)
InfoValue <- c(80, 70, 20, 70)
remarkableness(Certainty, InfoValue)
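The ORA output documents LRNresults$Remarkable as the product of Certainty and InfoValue divided by 100. Following that description, a minimal sketch with the hypothetical name `remarkableness_sketch` (not the package's source) is:

```r
# Remarkableness per the LRNresults$Remarkable description in ORA
remarkableness_sketch <- function(Certainty, InfoValue) Certainty * InfoValue / 100

remarkableness_sketch(c(30, 60, 70, 90), c(80, 70, 20, 70))
# returns c(24, 42, 14, 63)
```

Unlike importance (the minimum), the product rewards terms that score well on both criteria at once.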
Description:
Returns the description of the input GO term ID.
Usage:
termDescription(GOtermId)
Input parameters:
GOtermId: String; GO term ID, e.g. "GO:0008150".
Details:
Requires package GO.db.
Output parameters:
GOTermDescription: String; GO term description, e.g. "biological process".
Authors:
CL, MT, AU
See also:
termId, termNr.
Example:
termDescription('GO:0008150')
Description:
Casts GO term numbers to GO term IDs.
Usage:
termId(GOtermNr)
Input parameters:
GOtermNr: Numeric; Vector of GO term numbers, e.g. 8150.
Output parameters:
GOtermId: String; GO term IDs, e.g. "GO:0008150".
Authors:
CL, MT, AU
See also:
termDescription, termNr.
Example:
termId(8150)
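GO IDs consist of the prefix "GO:" and the term number zero-padded to seven digits, so the cast can be sketched with sprintf. `term_id_sketch` is a hypothetical name for this minimal illustration, not the package's implementation.

```r
# Sketch: "GO:" followed by the term number padded to 7 digits
term_id_sketch <- function(GOtermNr) sprintf("GO:%07d", as.integer(GOtermNr))

term_id_sketch(c(8150, 3674, 5575))
# returns c("GO:0008150", "GO:0003674", "GO:0005575")
```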
Description:
Casts GO term IDs to GO term numbers.
Usage:
termNr(GOtermId)
Input parameters:
GOtermId: String; GO term IDs, e.g. "GO:0008150".
Output parameters:
GOtermNr: Numeric; GO term numbers, e.g. 8150.
Authors:
CL, MT, AU
See also:
termDescription, termId.
Example:
termNr('GO:0008150')
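The inverse cast can be sketched by stripping the "GO:" prefix and parsing the remaining digits. `term_nr_sketch` is a hypothetical helper that illustrates the conversion and is not the package's code.

```r
# Sketch of the inverse mapping: drop the "GO:" prefix, parse the number
term_nr_sketch <- function(GOtermId) as.numeric(sub("^GO:", "", GOtermId))

term_nr_sketch(c("GO:0008150", "GO:0000022"))
# returns c(8150, 22)
```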
Description:
Calculates the headline for each path from the root to the GO detail terms. Headlines represent the most important nodes in these paths, i.e. the GO terms with the highest Importance on the path.
Usage:
termpathsHeadlines(AdjMatrix, GOtermNr, Importance, OntologyNr = 1)
Input parameters:
AdjMatrix: AdjMatrix[i,j] == 1 iff there exists an edge in the GO DAG from node i to node j, i.e. i is parent of j.
GOtermNr: Numeric; GO term numbers corresponding to the rows and columns of AdjMatrix.
Importance: Numeric; Vector of values between 0 and 100 specifying the importance of the corresponding GO term. For example, this can be the importance or remarkableness of the GO terms.
OntologyNr: Numeric; Default: 1. Number of the ontology the GOtermNrs belong to. One of: 1 for biological process, 2 for molecular function or 4 for cellular component.
Details:
PLEASE NOTE:
All given GO terms have to be in the specified ontology and all
ancestors from GO terms to ontology root (including the root term itself) must
be included in the given vector of GO terms.
Output parameters:
Headlines: GO terms with the highest Importance for each path from a detail to the GO root.
AllTaxonomies: All paths from the root to the details in AdjMatrix, in the form of GO term number vectors. (E.g. AllTaxonomies[[1]] == c(8150, 44699, 44763, 22402, 51231, 22) is a path from the BP root "GO:0008150" to the detail "GO:0000022" via the nodes 44699, 44763, 22402 and 51231 in the BP gene ontology.)
MaxImportanceInd: Positions of the headlines within the paths of AllTaxonomies. (E.g. if MaxImportanceInd == c(2, 4, 5), then AllTaxonomies[[1]][2] is the GO term with the highest importance value (i.e. the headline) in the first path, AllTaxonomies[[2]][4] the one in the second path and AllTaxonomies[[3]][5] the one in the third path.)
Author:
CL
See also:
GOroot2TermPaths, remarkableness, importance.
Example:
OntologyNr <- 1 # for biological process
Ancestors <- termsAncestors(GOtermNr = 6796, OntologyNr)$Ancestors
AdjMatrixAndTermNrs <- adjMatrixTermsAncestors(GOtermNrInclAncestors = c(6796, Ancestors), OntologyNr)
Importance <- c(29, 49, 75, 98, 37, 92)
termpathsHeadlines(AdjMatrix = AdjMatrixAndTermNrs$AdjMatrix, GOtermNr = AdjMatrixAndTermNrs$GOtermNrs,
                   Importance = Importance, OntologyNr = OntologyNr)
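Once the paths are known, picking the headline of each path reduces to a which.max per path. The sketch below is hypothetical (`headlines_sketch` is not a package function). It assumes Importance is a vector named by GO term number and uses made-up toy paths, and it returns positions in the same sense as MaxImportanceInd.

```r
# Hypothetical sketch: headline = term with the highest importance on each path.
# Assumption: Importance is named by GO term number.
headlines_sketch <- function(AllTaxonomies, Importance) {
  MaxImportanceInd <- unname(vapply(AllTaxonomies, function(path)
    which.max(Importance[as.character(path)]), integer(1)))
  Headlines <- mapply(function(path, i) path[i], AllTaxonomies, MaxImportanceInd)
  list(Headlines = Headlines, MaxImportanceInd = MaxImportanceInd)
}

# Toy paths from the root 8150 to the detail 22 (not real GO paths)
paths <- list(c(8150, 44699, 22), c(8150, 51231, 22))
Importance <- c("8150" = 10, "44699" = 80, "51231" = 40, "22" = 60)
headlines_sketch(paths, Importance)
```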
termsAncestors
Description:
Returns a vector of all ancestors in the ontology of the given GO term numbers.
Usage:
termsAncestors(GOtermNr, OntologyNr)
Input parameters:
GOtermNr: Numeric; Vector of GO term numbers.
OntologyNr: Numeric; To select the ontology. One of: 1 for biological process, 2 for molecular function or 4 for cellular component.
Output parameters:
Ancestors
TermsWithoutAncestors
Author:
CL
See also:
adjMatrixTermsAncestors.
Example:
OntologyNr <- 1 # for biological process
termsAncestors(GOtermNr = 6796, OntologyNr)
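Collecting all ancestors is a breadth-first walk over the parent relation. The sketch below uses a hypothetical toy child-to-parents lookup built from the single BP path 8150 > 44699 > 44763 > 22402 > 51231 > 22 quoted under termpathsHeadlines. Real GO terms can have several parents, which the queue handles. `ancestors_sketch` is not the package's implementation, which queries the GO data directly.

```r
# Hypothetical breadth-first ancestor collection over a child -> parents lookup
ancestors_sketch <- function(GOtermNr, Parents) {
  found <- character(0)
  queue <- as.character(GOtermNr)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    fresh <- setdiff(Parents[[current]], found)   # parents not yet collected
    found <- c(found, fresh)
    queue <- c(queue, fresh)
  }
  as.numeric(found)
}

# Toy lookup (one parent per term, for illustration only)
Parents <- list("22" = "51231", "51231" = "22402", "22402" = "44763",
                "44763" = "44699", "44699" = "8150", "8150" = character(0))
ancestors_sketch(22, Parents)
```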
Description:
With this function, the user can update the database used for ORA. Users decide themselves when this should be done (e.g., to keep the database unchanged until a project is finished). The new files will be saved temporarily in system.file('extdata', package = 'ORA'). Updating the data is only necessary if the user updates R to a newer version and the packages GO.db and org.Hs.eg.db are updated. Timestamp of the data provided in the package ORA: 31 January 2018.
The function then has to be called again after every restart of the R session.
Usage:
updateORAdatabase()
Author:
CL
See also:
dbtORA.
Description:
Function to write a *.lrn file of the GO terms and their computed values, a *.names file of the GO terms and their descriptions, a matrix of the annotations of genes to GO terms, a matrix of the DAG structure of the GO terms and a corresponding explanatory *.names file.
Usage:
WriteORAresults(FileNameWithoutExt, ORAresults, OutDirectory = getwd(), InFileWithExt = "")
Input parameters:
ORAresults:
LRNresults: List of 19 containing the results relevant for the lrn file: LRNresults = list(GOtermNr, OntologyNr, NrOfGenesInUniverse, NrOfGenesInSample, NrOfAnnotationsInTerm, Up, ExpNrOfAnnsInTerm, ObservedNrOfAnnsInTerm, RelDiff, Pvalue, LogPvalue, Certainty, InfoValue, Remarkable, Importance, InfoContent, InfoContentORA, IsHeadline, IsDetail)
NAMESresults: List of 3 containing the results relevant for the names file: NAMESresults = list(GOtermNr, GOtermDescription, GOtermId)
Genes2GOtermsMatrix: Matrix describing the annotations of genes to GO terms.
GO2GOAdjMatrices: List of 4 containing the adjacency matrices for each ontology and the combined matrix representing the DAGs of the significant GO terms: GO2GOAdjMatrices = list(GO2GOSparseAdjMatrix, AdjMatrixGO2GOBP, AdjMatrixGO2GOMF, AdjMatrixGO2GOCC)
OutDirectory: String; Default: getwd().
InFileWithExt: Default: ''.
For further details on the input, see ORA.
Output parameters:
Files saved in OutDirectory containing all the information obtained by ORA, the genes-to-GO-terms matrix and the adjacency matrix of the GO terms. For further details on the output, see dbtORA.
Author:
CL