ora | R Documentation |
Over-representation analysis (ORA) examines the genes that meet a selection criterion and determines if there are gene sets which are statistically over-represented in that list. This method differs from other methods provided by ErmineJ in that you must set a gene score threshold for gene selection, or define a “hit list” of genes.
Because ORA requires that you set a distinction between “good” and “bad” genes, ORA is most appropriate when you are very confident about the threshold. This is because changing the threshold can change the results, sometimes dramatically. If you are examining genes which naturally fall into two categories (“on chromosome 2” and “not on chromosome 2”), then ORA is the logical choice. Otherwise, in our opinion the other methods are more appropriate.
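The sensitivity to the threshold can be seen with a small sketch in base R. The gene names and p-values below are made up, `select_hits` is a hypothetical helper, and the strict `<` comparison is an assumption rather than a statement about how ErmineJ treats ties at the threshold.

```r
# Toy p-values for ten hypothetical genes (made-up numbers).
scores <- data.frame(
  pvalue = c(0.0004, 0.0009, 0.002, 0.004, 0.02, 0.04, 0.2, 0.4, 0.6, 0.9),
  row.names = paste0("gene", 1:10)
)

# Hypothetical helper: with small scores being "good" (bigIsBetter = FALSE),
# select the genes whose score falls below the threshold.
select_hits <- function(scores, threshold, column = 1) {
  rownames(scores)[scores[[column]] < threshold]
}

strict <- select_hits(scores, threshold = 0.001)
loose  <- select_hits(scores, threshold = 0.05)

length(strict)  # 2 genes pass the strict threshold
length(loose)   # 6 genes pass the loose threshold
```

The hit list triples in size between the two thresholds, so any gene set statistic computed from it can change as well.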
Technical comment: The probabilities produced by ErmineJ ORA are computed using the hypergeometric distribution, falling back to the binomial approximation as needed.
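As an illustration of the two distributions mentioned above, the following base R sketch computes an over-representation p-value for a single gene set. The counts are hypothetical and the code is not ErmineJ's implementation; it only shows the standard upper-tail hypergeometric test next to its binomial approximation.

```r
# Hypothetical counts, chosen only for illustration.
N <- 10000  # genes in the analysis (the "universe")
K <- 100    # genes annotated to the gene set
n <- 200    # genes in the hit list
k <- 8      # hit-list genes that belong to the gene set

# P(X >= k) under the hypergeometric distribution.
# phyper(q, m, n, k): m = successes in the universe, n = failures, k = draws.
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Binomial approximation: n independent draws with success probability K / N.
p_binom <- pbinom(k - 1, size = n, prob = K / N, lower.tail = FALSE)

# The same hypergeometric tail, written as an explicit sum for comparison.
p_sum <- sum(dhyper(k:min(K, n), K, N - K, n))
```

When the hit list is small relative to the universe, as here, the two p-values are close.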
Method overview taken from: http://erminej.msl.ubc.ca/help/tutorials/running-an-analysis-ora
ora(
  scores = NULL,
  hitlist = NULL,
  scoreColumn = 1,
  bigIsBetter = FALSE,
  logTrans = FALSE,
  annotation = NULL,
  aspects = c("Molecular Function", "Cellular Component", "Biological Process"),
  threshold = 0.001,
  geneReplicates = c("mean", "best"),
  pAdjust = c("FDR", "Bonferroni"),
  geneSetDescription = "Latest_GO",
  customGeneSets = NULL,
  minClassSize = 20,
  maxClassSize = 200,
  output = NULL,
  return = TRUE
)
scores |
A data.frame. Row names must be gene identifiers (eg. probes) and
must be unique; any number of columns may follow. The column used for
scoring is chosen by scoreColumn. |
hitlist |
A vector of gene identifiers. The ORA method accepts a hit list instead of scores. If a hitlist is provided, the logTrans, threshold and bigIsBetter options are ignored. |
scoreColumn |
Integer or character. Which column of the scores data.frame to use for scoring. |
bigIsBetter |
Logical. If TRUE, larger scores are considered better.
|
logTrans |
Logical. Whether the data should be -log10 transformed. Recommended for
p values. |
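What a -log10 transformation does to p-values can be sketched in base R. The values below are made up, and the snippet only shows the transformation itself, not how ora applies it internally.

```r
# Hypothetical p-values for three genes.
pvals <- c(geneA = 0.0001, geneB = 0.01, geneC = 0.5)

# -log10 turns small p-values into large scores, reversing the ordering.
transformed <- -log10(pvals)

# The most significant gene now has the largest value.
names(which.max(transformed))  # "geneA"

# A raw cutoff of 0.001 corresponds to 3 on the transformed scale.
cutoff <- -log10(0.001)
```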
annotation |
Annotation. A file path, a data.frame or a platform short
name (eg. GPL127). If given a platform short name it will be downloaded
from annotation repository of Pavlidis Lab (https://gemma.msl.ubc.ca/annots/).
To get a list of available annotations, use …. If you are providing a custom gene set, you can leave annotation as NULL. |
aspects |
Character vector. Which GO aspects to include in the analysis.
Can be in long form (eg. 'Molecular Function') or short form. |
threshold |
Double. Score threshold (test = ORA only) |
geneReplicates |
What to do when a gene has multiple scores in the input file (due to multiple probes per gene). Can be "mean" or "best". |
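The two options can be illustrated with a base R sketch on made-up probe-level p-values. Treating "best" as the smallest p-value is an assumption that fits bigIsBetter = FALSE; the column names and data are hypothetical, and this is not the code ErmineJ runs.

```r
# Hypothetical probe-level p-values; G1 and G3 have several probes.
probe_scores <- data.frame(
  gene  = c("G1", "G1", "G2", "G3", "G3", "G3"),
  score = c(0.01, 0.03, 0.20, 0.004, 0.05, 0.006)
)

# "mean": average the probe scores of each gene.
gene_mean <- tapply(probe_scores$score, probe_scores$gene, mean)

# "best": keep the best probe per gene, assumed here to be the
# smallest p-value.
gene_best <- tapply(probe_scores$score, probe_scores$gene, min)
```

With a threshold of 0.01, G3 would be selected under "best" (0.004) but not under "mean" (0.02), so the choice can change the hit list.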
pAdjust |
Which multiple test correction method to use. Can be "FDR" or "Bonferroni". |
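The usage above lists "FDR" and "Bonferroni" as choices for pAdjust. How these two corrections differ can be sketched with base R's p.adjust, assuming that "FDR" refers to the Benjamini-Hochberg procedure. The gene set names and p-values are made up.

```r
# Hypothetical raw p-values for four gene sets.
raw_p <- c(setA = 0.001, setB = 0.01, setC = 0.02, setD = 0.5)

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
p_bonf <- p.adjust(raw_p, method = "bonferroni")

# Benjamini-Hochberg false discovery rate (assumed meaning of "FDR").
p_fdr <- p.adjust(raw_p, method = "BH")

# Bonferroni is never less conservative than Benjamini-Hochberg.
all(p_fdr <= p_bonf)  # TRUE
```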
geneSetDescription |
"Latest_GO", a file path to a GO XML or OBO file, or a URL to a GO ontology file that ends with rdf-xml.gz. If you leave annotation as NULL and provide customGeneSets, this argument is
not required and defaults to NULL. Otherwise, it defaults to
"Latest_GO", which downloads the latest available GO XML file. This option does not work
without an internet connection. To get a frozen file
that you can use later, see … |
customGeneSets |
Path to a directory that contains custom gene set files, paths to the custom gene set files themselves, or a named list of character strings. Use this option to create your own gene sets. If you provide a directory, you can specify probes or gene symbols to include in your gene sets. See http://erminej.msl.ubc.ca/help/input-files/gene-sets/ for information about the format of this file. If you provide a list, only gene symbols are accepted. |
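The list form might look like the following sketch. The set names and gene symbols are hypothetical, and the overlap count at the end is only an illustration of what ORA tallies per gene set, not part of the ora interface.

```r
# A named list of character vectors: one element per gene set,
# each holding gene symbols (all names here are made up).
custom_sets <- list(
  set_one = c("GENE1", "GENE2", "GENE3"),
  set_two = c("GENE2", "GENE4")
)

# Basic sanity checks on the structure before passing it on.
stopifnot(
  !is.null(names(custom_sets)),
  all(vapply(custom_sets, is.character, logical(1)))
)

# Hypothetical hit list and the number of hits falling in each set.
hitlist <- c("GENE2", "GENE4", "GENE9")
overlap <- vapply(
  custom_sets,
  function(s) length(intersect(s, hitlist)),
  integer(1)
)
```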
minClassSize |
Minimum gene set (class) size. Smaller gene sets are not analyzed. |
maxClassSize |
Maximum gene set (class) size. Larger gene sets are not analyzed. |
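The effect of the two size limits can be sketched in base R with the defaults from the usage above (20 and 200). The gene sets are made up, and treating both bounds as inclusive is an assumption.

```r
# Hypothetical gene sets of different sizes.
gene_sets <- list(
  tiny   = paste0("g", 1:5),
  medium = paste0("g", 1:50),
  huge   = paste0("g", 1:500)
)

min_class_size <- 20
max_class_size <- 200

# Keep only the sets whose size lies within the limits
# (inclusive bounds are an assumption).
sizes <- lengths(gene_sets)
kept  <- gene_sets[sizes >= min_class_size & sizes <= max_class_size]

names(kept)  # "medium"
```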
output |
Output file name. |
return |
Logical. Whether the results should be returned. Set to FALSE if you only want the output file. |