topo_ic_sim_genes: GAPGOM - topo_ic_sim_genes()


View source: R/topoicsim_algorithms.R

Description

Algorithm to calculate similarity between GO terms of two genes/genelists.

Usage

topo_ic_sim_genes(organism, ontology, genes1, genes2,
  custom_genes1 = NULL, custom_genes2 = NULL, verbose = FALSE,
  debug = FALSE, progress_bar = TRUE, garbage_collection = FALSE,
  use_precalculation = FALSE, drop = NULL, all_go_pairs = NULL,
  idtype = "ENTREZID", go_data = NULL)

Arguments

organism

Organism in which the genes to be scanned reside. This option is necessary to select the correct GO DAG. Options are based on the OrgDb Bioconductor packages; http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb The following options are listed: "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine", "canine", "anopheles", "ecsakai", "chicken", "chimp", "malaria", "rhesus", "pig", "xenopus". The examples on this page also use "human".

ontology

Desired ontology to use for similarity calculations. One of three: "BP" (Biological Process), "MF" (Molecular Function) or "CC" (Cellular Component).

genes1

Vector of gene ID(s) for the first gene or gene list.

genes2

Vector of gene ID(s) for the second gene or gene list.

custom_genes1

Custom genes to add to the first gene list. Must be a named list in which each name is an arbitrary gene ID and each value is a vector of GO terms.

custom_genes2

Same as custom_genes1, but added to the second gene list.

verbose

Set to TRUE for more informative/elaborate output.

debug

verbosity for debugging.

progress_bar

Whether to show the progress of the calculation (default = TRUE, as in the Usage signature).

garbage_collection

Whether to run R garbage collection. This is useful for very large calculations/datasets, as it might decrease RAM usage. It might, however, slightly increase calculation time.

use_precalculation

Whether to use the precalculated score matrix. This speeds up calculation for the most frequent GO terms. Only available for human and mouse with Entrez/Ensembl IDs. The default is FALSE because this is the safest and most accurate option. Every update of the org.db libraries makes this matrix outdated, so use at your own risk.

drop

Vector of evidence codes in the GO data structure that you want to skip (see set_go_data).

all_go_pairs

dataframe of GO Term pairs with a column representing similarity between the two. You can add the dataframe from previous runs to improve performance (only works if the last result has at least part of the genes of the current run). You can also use it for pre-calculation and getting the results back in a fast manner.

idtype

ID type of the genes you specified. Default is "ENTREZID". To see the other options, pass an empty string.

go_data

prepared go_data, from the set_go_data function. It is practically the same as in GOSemSim, but with a slightly nicer interface.
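The reuse of results through all_go_pairs, described in the arguments above, can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes that GAPGOM and the human annotation package org.Hs.eg.db are installed (the block does nothing otherwise), and that the $AllGoPairs element of an earlier result can be passed back directly, as the argument description suggests. The variable names first, second and have_pkgs are illustrative only.

```r
# Run only when GAPGOM and the human annotation package are installed.
have_pkgs <- requireNamespace("GAPGOM", quietly = TRUE) &&
  requireNamespace("org.Hs.eg.db", quietly = TRUE)

if (have_pkgs) {
  # First run: computes the GO term pair similarities from scratch.
  first <- GAPGOM::topo_ic_sim_genes("human", "MF", "218", "501",
                                     progress_bar = FALSE)

  # Second run shares genes with the first one, so the previously
  # computed GO pairs are passed back in through all_go_pairs.
  second <- GAPGOM::topo_ic_sim_genes("human", "MF", c("218", "221"), "501",
                                      progress_bar = FALSE,
                                      all_go_pairs = first$AllGoPairs)
}
```

Because the second call overlaps with the first in its genes, it meets the stated condition that the earlier result must cover at least part of the current run.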

Details

This function calculates the topological similarity between two gene vectors, where each gene has its GO terms in the GO DAG structure. The topological similarity is based on edge weights and information content (IC). The output is a matrix whose dimensions depend on the lengths of the two vectors. IntraSet similarity can be calculated by comparing a gene vector to itself and using mean() on the output. InterSet similarity can be calculated the same way, but between two different gene lists (IntraSet and InterSet similarities are only applicable to gene sets). [1]
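The averaging step for IntraSet and InterSet similarity can be illustrated with plain matrices. The sketch below uses hand-made toy matrices as stand-ins for the gene similarity output; the similarity values are invented for illustration and are not real TopoICSim scores. Only the use of mean() comes from the description above.

```r
# Toy stand-in for the gene similarity matrix of a gene vector compared
# to itself. The values are invented for illustration.
genes <- c("218", "501", "221")
intra <- matrix(c(1.0, 0.6, 0.4,
                  0.6, 1.0, 0.5,
                  0.4, 0.5, 1.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(genes, genes))

# IntraSet similarity: mean over all entries of the self-comparison matrix.
intraset_sim <- mean(intra)

# Toy stand-in for the comparison of two different gene vectors.
inter <- matrix(c(0.2, 0.3,
                  0.4, 0.1,
                  0.5, 0.3),
                nrow = 3, byrow = TRUE,
                dimnames = list(genes, c("4329", "7915")))

# InterSet similarity: mean over all entries of the cross-comparison matrix.
interset_sim <- mean(inter)
```

Note that mean() over the self-comparison matrix includes the diagonal, which is what the description above implies.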

Value

List containing the following:

$GeneSim: For single genes, the similarity between the two genes, taken from the mean of all term similarities. For gene vectors, a matrix of gene similarities. IntraSet similarity can be calculated by comparing a gene vector to itself and using mean() on the output. InterSet similarity can be calculated the same way, but between two different gene vectors.

$AllGoPairs: All possible GO combinations with their semantic distances (matrix). NAs might be present in the matrix; these are GO pairs that did not occur.
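Since the GO pair matrix may contain NAs, it can help to separate the pairs that were actually computed from those that were not. The following sketch uses a hand-made toy matrix in place of a real $AllGoPairs result; the values and the exact layout (GO terms as row and column names) are assumptions made for illustration.

```r
# Toy stand-in for a GO pair matrix; NA marks pairs that did not occur.
go <- c("GO:0016787", "GO:0042802", "GO:0005524")
all_go_pairs <- matrix(NA_real_, nrow = 3, ncol = 3,
                       dimnames = list(go, go))
all_go_pairs["GO:0016787", "GO:0042802"] <- 0.35
all_go_pairs["GO:0042802", "GO:0005524"] <- 0.80

# Row/column positions of the pairs that carry a value.
computed <- which(!is.na(all_go_pairs), arr.ind = TRUE)

# Long-format table of the computed pairs only.
pairs <- data.frame(
  term1 = rownames(all_go_pairs)[computed[, "row"]],
  term2 = colnames(all_go_pairs)[computed[, "col"]],
  value = all_go_pairs[computed]
)

# Number of GO pairs without a value.
n_missing <- sum(is.na(all_go_pairs))
```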

References

[1] Ehsani R, Drablos F: TopoICSim: a new semantic similarity measure based on gene ontology. BMC Bioinformatics 2016, 17(1):296.

Examples

# single gene mode
result <- GAPGOM::topo_ic_sim_genes("human", "MF", "218", "501")

# genelist mode
list1 <- c("126133","221","218","216","8854","220","219","160428","224",
"222","8659","501","64577","223","217","4329","10840","7915","5832")
# ONLY A PART OF THE GENELIST IS USED BECAUSE OF R CHECK TIME CONSTRAINTS
result <- GAPGOM::topo_ic_sim_genes("human", "MF", list1[1:2],
                                    list1[1:2])

# with custom gene
custom <- list(cus1=c("GO:0016787", "GO:0042802", "GO:0005524"))
result <- GAPGOM::topo_ic_sim_genes("human", "MF", "218", "501", 
                                    custom_genes1 = custom)

GAPGOM documentation built on Nov. 8, 2020, 8:08 p.m.