Annotates hits to genes, performs a random walk with restarts on a network of protein complexes, and then scores each gene in the network for its association with the phenotype of interest.
runComplexID(Hits, phenoSim, promoterRange = 1e+05, eps = 1e-10,
  alpha = 0.8, upstream = 0, downstream = 0, geneBody = T,
  promoters = T, promoterTissues = "all", utr = T, eqtl = T,
  eqtlTissues = "all", enhancers = T, enhancerTissues = "all",
  loopDist = 0, non_proteins = F, geneScoring = sum, useAllTSS = T)
Hits
GRanges object with two metadata columns, or a matrix or data frame with at least 5 columns.
phenoSim
Matrix or data frame with two columns. The first column contains phenotype names that match the phenotypes found in Hits. The second column contains the similarity between the phenotype in that row and the phenotype of interest (values between 0 and 1), with higher values denoting higher similarity.
promoterRange
Single integer greater than or equal to zero. By default 1e+05. How many bases upstream of a gene's TSS to search for a promoter region for that gene.
eps
Single numeric, must be greater than zero. By default 1e-10. L1 norm threshold on the difference between the current and previous iterations of the random walk at which the walk terminates.
alpha
Single numeric in the range (0,1]. By default 0.8. The weight given to the vector of initialized values for the random walk; a higher alpha gives more weight to the initialized values.
upstream
Single integer. By default 0. How far upstream of a transcription start site a hit can be for it to be annotated to that gene. A NULL value is equivalent to a value of zero (no upstream sites will be annotated to a gene unless they lie in a promoter region, see the promoterRange parameter).
downstream
Single integer. By default 0. How far downstream of a transcription start site a hit can be for it to be annotated to that gene. A NULL value is equivalent to a value of zero (no downstream sites will be annotated to a gene).
geneBody
TRUE or FALSE. By default TRUE. If TRUE, then hits will be annotated to the bodies (exons and introns) of protein coding genes. If FALSE, hits will not be annotated to those regions.
promoters
TRUE or FALSE. By default TRUE. If TRUE, then hits will be annotated to promoter regions. If FALSE, hits will not be annotated to promoter regions.
promoterTissues
Character vector. By default "all". If "all", then promoters from all tissues will be included in the annotation; otherwise, only promoter regions from the tissues specified by promoterTissues will be used for annotation.
utr
TRUE or FALSE. By default TRUE. If TRUE, then hits will be looked for in the 3' and 5' UTRs of genes; otherwise they will not.
eqtl
TRUE or FALSE. By default TRUE. If TRUE, then hits may be mapped to eQTL loci, and the genes affected by those eQTLs are designated as associated with those hits.
eqtlTissues
Character vector. By default "all". If "all", then eQTLs from all tissues will be included in the annotation; otherwise, only eQTL sites from the tissues specified by eqtlTissues will be used for annotation.
enhancers
TRUE or FALSE. By default TRUE. If TRUE, then hits may be mapped to enhancer loci and linked to genes via looping structures and promoters.
enhancerTissues
Character vector. By default "all". If "all", then enhancers from all tissues will be included in the annotation; otherwise, only enhancer regions from the tissues specified by enhancerTissues will be used for annotation.
loopDist
Single integer. By default 0. The maximum allowable distance that an enhancer or promoter can be from a looping region to be annotated to it.
non_proteins
TRUE or FALSE. By default FALSE. If TRUE, then hits may be mapped to non-protein regions; if FALSE, that annotation will not be used.
geneScoring
A function that takes a vector and outputs a single number. By default the "sum" function. This function determines the score of a gene from the scores of the complexes that it belongs to. Its input is a vector of numerical values representing the scores of those complexes, as determined by the RWPCN algorithm. Its output should be a single numerical value.
useAllTSS
TRUE or FALSE. By default TRUE. If TRUE, then all unique transcription start sites will be considered when looking at upstream regions of a gene (for promoters and upstream regions). If FALSE, it will use a single start site for a gene, namely the start of the gene.
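The geneScoring argument is easier to picture with a small example. The sketch below applies three candidate functions to made-up complex scores. The gene names, the score values, and the helper score_genes are hypothetical and not part of the package; only the contract (a numeric vector in, a single number out) comes from the argument description.

```r
# Hypothetical complex scores per gene (made-up values, for illustration only).
complex_scores <- list(
  GENE_A = c(0.10, 0.25, 0.05),  # GENE_A belongs to three complexes
  GENE_B = c(0.30)               # GENE_B belongs to one complex
)

# Hypothetical helper: collapse each gene's complex scores to one number
# using the supplied scoring function, then order the highest scores first.
score_genes <- function(scores_by_gene, geneScoring = sum) {
  out <- vapply(scores_by_gene, function(s) {
    value <- geneScoring(s)
    # Documented contract: the function returns a single numeric value.
    stopifnot(is.numeric(value), length(value) == 1)
    value
  }, numeric(1))
  sort(out, decreasing = TRUE)
}

by_sum  <- score_genes(complex_scores, geneScoring = sum)
by_max  <- score_genes(complex_scores, geneScoring = max)
by_mean <- score_genes(complex_scores, geneScoring = mean)
```

In this toy data, sum ranks GENE_A first because it accumulates score over three complexes, while max and mean rank GENE_B first. Which behaviour is preferable depends on whether membership in many complexes should itself raise a gene's score.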
Annotates Hits to genes using a built-in annotation database. Protein coding genes, non-protein coding genes, and UTR annotations come from the ENSEMBL version 89 annotation of GRCh37. Promoter and enhancer regions are from ENCODE annotation version 3, and eQTLs are from the GTEx Portal, version 6.
After the associated genes for each endophenotype are identified, it performs a Random Walk with Restarts on a pre-constructed protein complex network as in the RWPCN method.
The protein complex network was constructed in a similar way as in the RWPCN method. For the PPI we used STRING with a score cutoff of 700. Protein IDs in STRING were mapped to approved HUGO names using ENSEMBL and HGNC.
Protein complexes were retrieved from CORUM. Any complex with no genes in the PPI was removed, along with 5 of the largest complexes (more than 70 subunits).
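The filtering steps described above can be sketched on toy data. The edge list, the complex memberships, and the column names below are made up for illustration. Treating the cutoff as inclusive (>= 700) is also an assumption, since the text does not say whether edges scoring exactly 700 were kept.

```r
# Toy PPI edge list with made-up STRING-style combined scores.
ppi <- data.frame(
  protein1 = c("A", "A", "B", "C"),
  protein2 = c("B", "C", "C", "D"),
  combined_score = c(900, 650, 700, 950),
  stringsAsFactors = FALSE
)

# Keep edges at or above the cutoff (inclusive comparison is an assumption).
cutoff <- 700
ppi_kept <- ppi[ppi$combined_score >= cutoff, ]
ppi_genes <- unique(c(ppi_kept$protein1, ppi_kept$protein2))

# Toy complexes: one with no PPI members and one with more than 70 subunits.
complexes <- list(
  cplx1 = c("A", "B"),
  cplx2 = c("X", "Y"),                    # no member in the PPI
  cplx3 = c("C", "D", "Z"),
  cplx_big = c("A", paste0("S", 1:75))    # 76 subunits
)

# Drop complexes with no genes in the PPI and complexes above the size limit.
max_subunits <- 70
has_ppi_gene <- vapply(complexes, function(m) any(m %in% ppi_genes), logical(1))
small_enough <- lengths(complexes) <= max_subunits
complexes_kept <- complexes[has_ppi_gene & small_enough]
```

Here cplx2 is dropped because none of its members appear in the filtered PPI, and cplx_big is dropped for exceeding 70 subunits.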
A random walk with restarts is initialized and performed as in RWPCN. All genes in the PPI and complexes are then scored according to the weights in the complex network.
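To make the roles of alpha and eps concrete, here is a minimal random walk with restarts on a toy four-node network. It is a sketch under stated assumptions, not the package's implementation: the update rule p_next = (1 - alpha) * W p + alpha * p0 with a column-normalized adjacency matrix W is a common formulation chosen to match the description of alpha as the weight on the initialized values. The function rwr, the toy network, and the max_iter safeguard are hypothetical.

```r
# Hypothetical helper: random walk with restarts on an adjacency matrix.
rwr <- function(adj, p0, alpha = 0.8, eps = 1e-10, max_iter = 10000) {
  stopifnot(alpha > 0, alpha <= 1, eps > 0)
  # Column-normalize so each column sums to 1 (assumes no isolated nodes).
  W <- sweep(adj, 2, colSums(adj), "/")
  p0 <- p0 / sum(p0)
  p <- p0
  for (i in seq_len(max_iter)) {
    # alpha weights the initial vector, (1 - alpha) weights the network step.
    p_next <- as.vector((1 - alpha) * (W %*% p) + alpha * p0)
    # Terminate when the L1 distance between iterations drops below eps.
    converged <- sum(abs(p_next - p)) < eps
    p <- p_next
    if (converged) break
  }
  names(p) <- rownames(adj)
  p
}

# Toy undirected network of four nodes (made up for illustration).
nodes <- c("c1", "c2", "c3", "c4")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))

# Start the walk entirely on c1.
p <- rwr(adj, p0 = c(1, 0, 0, 0))
```

With alpha = 0.8 most of the probability mass stays on the seed node c1, and the walk converges within a few iterations because each step shrinks the difference between iterations by a factor of (1 - alpha).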
A list with two objects: a data frame called "scores" and a GRanges object called "missingHits".
The data frame "scores" has seven columns giving the score of each gene, which reflects how important that gene is to the query phenotype, as well as other information about the gene. It is ordered with the highest scoring genes first.
The first column contains the HUGO gene names, the second the names of the complexes that the gene is part of, the third the score for the gene, the fourth whether or not the gene was in the PPI and/or a complex, the fifth whether or not the gene is a protein coding gene, the sixth the features of the gene that have a hit in them, and the seventh the number of hits that were annotated to the gene.
The GRanges object "missingHits" lists all of the input hits that were not mapped to any gene.
data("hits")
data("hits.pheno")
test <- runComplexID(Hits = hits, phenoSim = hits.pheno, promoterRange = 10000, upstream = 1000, downstream = 1000, utr = TRUE)