RunCRE_HSAStringDB: This function runs a causal relation engine by computing the...


View source: R/ProcessData.R

Description

This function runs a causal relation engine by computing the Quaternary Dot Product Scoring Statistic, the Ternary Dot Product Scoring Statistic or the Enrichment test over the Homo sapiens STRINGdb causal network (version 10, provided under the Creative Commons license: https://creativecommons.org/licenses/by/3.0/). Users can also supply a different causal network through the relations and entities arguments.

Usage

RunCRE_HSAStringDB(gene_expression_data, method = "Quaternary", 
                    fc.thresh = log2(1.3), pval.thresh = 0.05, 
                    only.significant.pvalues = FALSE, 
                    significance.level = 0.05,
                    epsilon = 1e-16, progressBar = TRUE, 
                    relations = NULL, entities = NULL)

Arguments

gene_expression_data

A data frame for gene expression data. The gene_expression_data data frame must have three columns entrez, fc and pvalue. entrez denotes the entrez id of a given gene, fc denotes the fold change of a gene, and pvalue denotes the p-value. The entrez column must be of type integer or character, and the fc and pvalue columns must be numeric values.
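As a minimal sketch of the expected input, the data frame below is built by hand and checked against the requirements above. The Entrez ids and values are made up for illustration and do not come from the package.

```r
# Hypothetical values, for illustration only.
gene_expression_data <- data.frame(
  entrez = c("1017", "1869", "1871"),
  fc     = c(1.2, -0.8, 0.1),
  pvalue = c(0.001, 0.03, 0.6),
  stringsAsFactors = FALSE
)

# Check the column names and types described above.
stopifnot(
  all(c("entrez", "fc", "pvalue") %in% names(gene_expression_data)),
  is.integer(gene_expression_data$entrez) ||
    is.character(gene_expression_data$entrez),
  is.numeric(gene_expression_data$fc),
  is.numeric(gene_expression_data$pvalue)
)
```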

method

Choose one of Quaternary, Ternary or Enrichment. Default is Quaternary.

fc.thresh

Threshold for fold change in the gene_expression_data data frame. Any row in gene_expression_data with an absolute value of fc smaller than fc.thresh will be ignored. Default value is fc.thresh = log2(1.3).

pval.thresh

Threshold for p-values in the gene_expression_data data frame. All rows in gene_expression_data with p-values greater than pval.thresh will be ignored. Default value is pval.thresh = 0.05.
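The combined effect of fc.thresh and pval.thresh can be sketched in base R as follows. This only illustrates the filtering rule as described above, on made-up data; it is not the package's internal code.

```r
# Hypothetical input; values are made up for illustration.
expr <- data.frame(
  entrez = c("1", "2", "3", "4"),
  fc     = c(1.0, -0.9, 0.2, 0.7),
  pvalue = c(0.01, 0.04, 0.001, 0.2),
  stringsAsFactors = FALSE
)

fc.thresh   <- log2(1.3)  # default fold change threshold
pval.thresh <- 0.05       # default p-value threshold

# Keep rows that pass both thresholds:
# |fc| is not smaller than fc.thresh and pvalue is not greater than pval.thresh.
kept <- expr[abs(expr$fc) >= fc.thresh & expr$pvalue <= pval.thresh, ]
kept$entrez
```

Here the third row is dropped because its absolute fold change is below the threshold, and the fourth because its p-value is above it.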

only.significant.pvalues

If only.significant.pvalues = TRUE, p-values are computed only for statistically significant regulators, and the p-values that are not computed are set to -1. The default value is only.significant.pvalues = FALSE.

significance.level

When only.significant.pvalues = TRUE, only p-values which are less than or equal to significance.level are computed. The default value is significance.level = 0.05.
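Because uncomputed p-values are reported as -1, it can help to separate them out before ranking or plotting. The sketch below uses a hypothetical stand-in for the returned data frame: the column names uid, symbol and pvalue follow the Examples section, while the values are invented.

```r
# Hypothetical stand-in for the returned data frame; values are made up.
results <- data.frame(
  uid    = c(11L, 12L, 13L),
  symbol = c("A", "B", "C"),
  pvalue = c(0.04, -1, 0.002),
  stringsAsFactors = FALSE
)

# Treat -1 as "not computed" so it is not mistaken for a small p-value.
results$pvalue[results$pvalue == -1] <- NA

# Keep only regulators with a computed p-value, smallest first.
computed <- results[!is.na(results$pvalue), ]
computed <- computed[order(computed$pvalue), ]
```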

epsilon

Threshold for probabilities of matrices. Default value is epsilon = 1e-16.

progressBar

Whether to display a progress bar showing the percentage of regulators in the network whose p-values have been computed. Default value is progressBar = TRUE.

relations

A data frame containing pairs of connected entities in a causal network and the type of causal relation between them. The data frame must have three columns named srcuid, trguid and mode, in that order. srcuid is the source entity, trguid is the target entity and mode is the type of relation between srcuid and trguid. The relation must be one of +1 for upregulation, -1 for downregulation or 0 for regulation without a specified direction. All three columns must be of type integer. Default value is relations = NULL.
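A minimal sketch of a relations data frame that satisfies these constraints is shown below. The uids and edges are hypothetical; the L suffix is what makes the values integers rather than doubles in R.

```r
# Hypothetical three-edge network, for illustration only.
relations <- data.frame(
  srcuid = c(1L, 1L, 2L),
  trguid = c(3L, 4L, 4L),
  mode   = c(1L, -1L, 0L)  # +1 up, -1 down, 0 direction not specified
)

# Check column names, order, types and allowed mode values.
stopifnot(
  identical(names(relations), c("srcuid", "trguid", "mode")),
  all(vapply(relations, is.integer, logical(1))),
  all(relations$mode %in% c(-1L, 0L, 1L))
)
```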

entities

A data frame of mappings for all entities present in the relations data frame. entities must contain four columns: uid, id, symbol and type, in that order. uid must be of type integer, and id, symbol and type must be of type character. uid includes every source and target node in the network (i.e. in relations), id is the id of uid (e.g. the entrez id of an mRNA), symbol is the symbol of id and type is the type of entity of id (e.g. mRNA, protein, drug or compound). Default value is entities = NULL.
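The sketch below builds a small entities data frame alongside a matching relations data frame and checks that every uid used in relations is mapped. All ids, symbols and edges are hypothetical placeholders.

```r
# Hypothetical two-edge network: two protein regulators acting on one mRNA.
relations <- data.frame(
  srcuid = c(1L, 2L),
  trguid = c(3L, 3L),
  mode   = c(1L, -1L)
)

# Hypothetical entity mappings; ids and symbols are placeholders.
entities <- data.frame(
  uid    = 1:3,
  id     = c("P1", "P2", "1017"),
  symbol = c("REG1", "REG2", "GENE1"),
  type   = c("protein", "protein", "mRNA"),
  stringsAsFactors = FALSE
)

# Every source and target uid in relations should appear in entities.
missing_uids <- setdiff(c(relations$srcuid, relations$trguid), entities$uid)
stopifnot(length(missing_uids) == 0)

# With QuaternaryProd installed, both data frames would be passed as:
# RunCRE_HSAStringDB(gene_expression_data,
#                    relations = relations, entities = entities)
```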

Value

This function returns a data frame containing parameters concerning the method used. The p-values of the regulators are also computed, and the data frame is sorted in increasing order of the p-values of the goodness of fit score for the given regulators. The column names of the data frame are:

Author(s)

Carl Tony Fakhry, Ping Chen and Kourosh Zarringhalam

References

Carl Tony Fakhry, Parul Choudhary, Alex Gutteridge, Ben Sidders, Ping Chen, Daniel Ziemek, and Kourosh Zarringhalam. Interpreting transcriptional changes using causal graphs: new methods and their practical utility on public networks. BMC Bioinformatics, 17:318, 2016. ISSN 1471-2105. doi: 10.1186/s12859-016-1181-8.

A. Franceschini et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research, 41(Database issue):D808-D815, 2013. doi: 10.1093/nar/gks1094.

Examples

# Get gene expression data
e2f3 <- system.file("extdata", "e2f3_sig.txt", package = "QuaternaryProd")
e2f3 <- read.table(e2f3, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Rename column names appropriately and remove duplicated entrez ids
names(e2f3) <- c("entrez", "pvalue", "fc")
e2f3 <- e2f3[!duplicated(e2f3$entrez),]

# Run the Enrichment test for statistically significant
# regulators in the STRINGdb network
enrichment_results <- RunCRE_HSAStringDB(e2f3, method = "Enrichment",
                             fc.thresh = log2(1.3), pval.thresh = 0.05,
                             only.significant.pvalues = TRUE)
enrichment_results[1:4, c("uid","symbol","regulation","pvalue")]

QuaternaryProd documentation built on Nov. 8, 2020, 8:23 p.m.