View source: R/topoicsim_algorithms.R
Algorithm to calculate similarity between GO terms of two genes/genelists.
organism
Organism in which the genes to be scanned reside; this option is necessary to select the correct GO DAG. Options are based on the org.db Bioconductor package; http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb The following options are available: "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine", "canine", "anopheles", "ecsakai", "chicken", "chimp", "malaria", "rhesus", "pig", "xenopus".
ontology
Desired ontology to use for similarity calculations. One of three: "BP" (Biological Process), "MF" (Molecular Function) or "CC" (Cellular Component).
genes1
Gene ID(s) of the first gene (vector).
genes2
Gene ID(s) of the second gene (vector).
custom_genes1
Custom genes added to the first list. Must be a named list in which each name is an arbitrary ID and each value is a vector of GO terms.
custom_genes2
Same as custom_genes1, but added to the second gene list.
verbose
Set to TRUE for more informative/elaborate output.
debug
Verbosity for debugging.
progress_bar
Whether to show the progress of the calculation (default = FALSE).
garbage_collection
Whether to run R garbage collection. This is useful for very large calculations/datasets, as it might decrease RAM usage. This option might, however, increase calculation time slightly.
use_precalculation
Whether to use the precalculated score matrix. This speeds up calculation for the most frequent GO terms. Only available for human and mouse with Entrez/Ensembl IDs. Default is FALSE because this is the safest and most accurate option. Every update of the org.Db libraries makes this matrix outdated, so use at your own risk.
drop
Vector of evidence codes in the GO data structure that you want to skip (see set_go_data).
all_go_pairs
Dataframe of GO term pairs with a column representing the similarity between the two. You can pass the dataframe from previous runs to improve performance (this only works if the previous result contains at least part of the genes of the current run). You can also use it for pre-calculation and for retrieving results quickly.
idtype
ID type of the genes you specified. Default = "ENTREZID". To see other options, enter an empty string.
go_data
Prepared go_data, from the set_go_data function. It is practically the same as in GOSemSim, but with a slightly nicer interface.
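The structure expected by custom_genes1 and custom_genes2 (a named list of GO term vectors) can be sketched in base R as follows. The helper is_valid_custom_genes is hypothetical and not part of GAPGOM; it only illustrates the shape described above, and the "GO:" plus seven digits pattern is an assumption about what a well-formed GO ID looks like.

```r
# The ID "cus1" is arbitrary; the GO terms are the ones used in the
# Examples section of this page.
custom <- list(cus1 = c("GO:0016787", "GO:0042802", "GO:0005524"))

# Hypothetical helper (not part of GAPGOM): checks that x is a fully named
# list whose values are non-empty character vectors that look like GO IDs.
is_valid_custom_genes <- function(x) {
  is.list(x) &&
    !is.null(names(x)) &&
    all(nzchar(names(x))) &&
    all(vapply(x, function(terms) {
      is.character(terms) && length(terms) > 0 &&
        all(grepl("^GO:[0-9]{7}$", terms))
    }, logical(1)))
}

is_valid_custom_genes(custom)
```

A check like this can catch an unnamed list or a malformed term before a long calculation is started.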
This function calculates the topological similarity between two gene vectors, where each gene has its GO terms in the GO DAG structure. The topological similarity is based on edge weights and information content (IC). The output is a matrix whose dimensions depend on the lengths of the two vectors. IntraSet similarity can be calculated by comparing a gene vector to itself and applying mean() to the output. InterSet similarity can be calculated the same way, but between two different gene lists (IntraSet and InterSet similarities only apply to gene sets). [1]
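The mean() step for IntraSet and InterSet similarity can be illustrated with toy matrices. This is a minimal sketch: the similarity values are made up, and the matrices stand in for the gene similarity matrix the function would return.

```r
# Toy gene-by-gene similarity matrix for one gene set compared to itself.
# All values are made up for illustration.
set_a <- c("218", "501", "221")
sim_aa <- matrix(c(1.0, 0.6, 0.8,
                   0.6, 1.0, 0.4,
                   0.8, 0.4, 1.0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(set_a, set_a))

# IntraSet similarity: mean over all entries of the self-comparison matrix.
intraset <- mean(sim_aa)

# Toy matrix for two different gene sets (rows: set_a, columns: set_b).
set_b <- c("216", "220")
sim_ab <- matrix(c(0.5, 0.7,
                   0.3, 0.9,
                   0.2, 0.4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(set_a, set_b))

# InterSet similarity: mean over all entries of the cross-comparison matrix.
interset <- mean(sim_ab)
```

If a real result matrix contains missing values, mean(x, na.rm = TRUE) would be needed to obtain a number instead of NA.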
A list containing the following:
$GeneSim: the similarity between the genes, taken from the mean of all term similarities (single gene mode), or a matrix of gene similarities (gene vector mode). IntraSet similarity can be calculated by comparing a gene vector to itself and applying mean() to the output. InterSet similarity can be calculated the same way, but between two different gene vectors.
$AllGoPairs: all possible GO combinations with their semantic distances (matrix). NAs may be present in the matrix; these are GO pairs that did not occur.
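Because the GO pair matrix may contain NAs, it can help to list only the pairs that were actually computed. The sketch below uses a toy matrix with made-up values in place of a real $AllGoPairs result; the layout (GO IDs as row and column names) is an assumption.

```r
# Toy GO pair matrix; NA marks pairs that did not occur.
# The two filled-in values are made up for illustration.
terms <- c("GO:0016787", "GO:0042802", "GO:0005524")
go_pairs <- matrix(NA_real_, nrow = 3, ncol = 3,
                   dimnames = list(terms, terms))
go_pairs["GO:0016787", "GO:0042802"] <- 0.35
go_pairs["GO:0042802", "GO:0005524"] <- 0.10

# Number of pairs that carry a value.
n_computed <- sum(!is.na(go_pairs))

# Row/column positions of the non-NA entries, turned into a long table.
idx <- which(!is.na(go_pairs), arr.ind = TRUE)
computed <- data.frame(
  term1 = rownames(go_pairs)[idx[, "row"]],
  term2 = colnames(go_pairs)[idx[, "col"]],
  value = go_pairs[idx],
  stringsAsFactors = FALSE
)
```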
[1] Ehsani R, Drablos F: TopoICSim: a new semantic similarity measure based on gene ontology. BMC Bioinformatics 2016, 17(1):296.
# single gene mode
result <- GAPGOM::topo_ic_sim_genes("human", "MF", "218", "501")
# genelist mode
list1 <- c("126133","221","218","216","8854","220","219","160428","224",
           "222","8659","501","64577","223","217","4329","10840","7915","5832")
# ONLY A PART OF THE GENELIST IS USED BECAUSE OF R CHECK TIME CONSTRAINTS
result <- GAPGOM::topo_ic_sim_genes("human", "MF", list1[1:2],
                                    list1[1:2])
# with custom gene
custom <- list(cus1=c("GO:0016787", "GO:0042802", "GO:0005524"))
result <- GAPGOM::topo_ic_sim_genes("human", "MF", "218", "501",
                                    custom_genes1 = custom)