| SEMgsa | R Documentation |
Gene Set Analysis (GSA) via a self-contained test for the group
effect on signaling (directed) pathways based on SEM. The core of the
methodology is implemented in the RICF algorithm of SEMrun(). From the
RICF output, the function recovers node-specific group effect p-values and
Brown's combined permutation p-values of node activation and inhibition.
SEMgsa(g = list(), data, group, method = "BH", alpha = 0.05, n_rep = 1000, ...)
g |
A list of pathways to be tested. |
data |
A matrix or data.frame. Rows correspond to subjects, and columns to graph nodes (variables). |
group |
A binary vector. This vector must be as long as the number of subjects. Each vector element must be 1 for cases and 0 for control subjects. |
method |
Multiple testing correction method. One of the values
available in p.adjust (default = "BH"). |
alpha |
Gene set test significance level (default = 0.05). |
n_rep |
Number of randomization replicates (default = 1000). |
... |
Currently ignored. |
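The layout expected for data and group can be checked before calling SEMgsa(). The sketch below uses simulated values and hypothetical object names (toy_data, toy_group, inputs_ok); it only mirrors the argument descriptions above and does not call the package.

```r
# Toy inputs (simulated; object names are illustrative, not part of SEMgraph)
set.seed(1)
n_subjects <- 20
toy_data <- matrix(rnorm(n_subjects * 4), nrow = n_subjects,
                   dimnames = list(NULL, c("g1", "g2", "g3", "g4")))
toy_group <- rep(c(0, 1), each = n_subjects / 2)  # 0 = control, 1 = case

# Rows are subjects, columns are graph nodes; group has one 0/1 entry per subject
inputs_ok <- (is.matrix(toy_data) || is.data.frame(toy_data)) &&
  length(toy_group) == nrow(toy_data) &&
  all(toy_group %in% c(0, 1))
inputs_ok
```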
To gain more biological insight into the functional roles of predefined subsets of genes, the node perturbation obtained from RICF fitting is combined with the up- or down-regulation of genes from a reference interactome to obtain the overall pathway perturbation as follows:
The node perturbation is defined as activated when the minimum among the p-values is positive; if negative, the status is inhibited.
Up- or down-regulation of genes is computed from the weighted adjacency matrix of each pathway as the column sum of weights (-1, 0, 1) over each source node. If the overall sum of node weights is below 1, the pathway is flagged as down-regulated, otherwise as up-regulated.
Combining these two quantities defines the direction (up or down) of gene perturbation. An up- or down-regulated gene status associated with node inhibition indicates a decrease in activation (or an increase in inhibition) in cases with respect to the control group. Conversely, an up- or down-regulated gene status associated with node activation indicates an increase in activation (or a decrease in inhibition) in cases with respect to the control group.
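The regulation rule and its combination with node status can be illustrated on a toy pathway. This is a minimal sketch of a literal reading of the text, not the package's internal code: the matrix orientation (rows as source nodes, columns as target nodes), the object names, and the hard-coded node_status value are all assumptions made for illustration.

```r
# Toy weighted adjacency matrix (assumed orientation: rows = sources, columns = targets)
# Weights: 1 = activation edge, -1 = inhibition edge, 0 = no edge
A <- matrix(0, 3, 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
A["a", "b"] <- 1
A["a", "c"] <- -1
A["b", "c"] <- -1

node_weights <- colSums(A)    # column sum of edge weights per node
overall <- sum(node_weights)  # overall sum of node weights
regulation <- if (overall < 1) "down" else "up"

# Hypothetical node status, standing in for the p-value comparison described above
node_status <- "inhibition"

# Interpretation in cases relative to controls, following the text
effect <- if (node_status == "inhibition") {
  "decrease in activation (or increase in inhibition)"
} else {
  "increase in activation (or decrease in inhibition)"
}
```

Here the overall sum is negative, so the toy pathway is flagged as down-regulated.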
A list of 2 objects:
"gsa", a data.frame reporting the following information for each pathway in the input list:
"No.nodes", pathway size (number of nodes);
"No.DEGs", number of differentially expressed genes (DEGs) within
the pathway, after multiple testing correction with one of the methods
available in p.adjust;
"pert", pathway perturbation status (see details);
"pNA", Brown's combined p-value of pathway node activation;
"pNI", Brown's combined p-value of pathway node inhibition;
"PVAL", Bonferroni combined p-value of pNA and pNI, i.e., 2 * min(pNA, pNI);
"ADJP", Bonferroni adjusted p-value of pathway perturbation, i.e., min(No.pathways * PVAL, 1).
"DEG", a list with the DEG names for each pathway.
Mario Grassi mario.grassi@unipv.it
Grassi, M., Tarantino, B. (2022). SEMgsa: topology-based pathway enrichment analysis with structural equation models. BMC Bioinformatics, 17 Aug, 23, 344. <https://doi.org/10.1186/s12859-022-04884-8>
## Not run:
# Nonparanormal(npn) transformation
als.npn <- transformData(alsData$exprs)$data
# Selection of FTD-ALS pathways from KEGG pathways
paths.name <- c("MAPK signaling pathway",
                "Protein processing in endoplasmic reticulum",
                "Endocytosis",
                "Wnt signaling pathway",
                "Neurotrophin signaling pathway",
                "Amyotrophic lateral sclerosis")
j <- which(names(kegg.pathways) %in% paths.name)
GSA <- SEMgsa(kegg.pathways[j], als.npn, alsData$group,
              method = "bonferroni", alpha = 0.05,
              n_rep = 1000)
GSA$gsa
GSA$DEG
## End(Not run)
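Once a result is available, the pathways that pass the significance level can be selected from the "gsa" table. The sketch below works on a mock data.frame with invented values, since the real output requires the package and its data. The column names follow the Value section; storing pathway names as row names is an assumption.

```r
# Mock of the "gsa" element (values invented for illustration)
gsa_mock <- data.frame(
  No.nodes = c(120, 85, 60),
  No.DEGs  = c(14, 3, 0),
  PVAL     = c(0.002, 0.040, 0.700),
  ADJP     = c(0.006, 0.120, 1.000),
  row.names = c("pathway A", "pathway B", "pathway C")
)

# Keep the pathways whose adjusted p-value is below the significance level
alpha <- 0.05
significant <- gsa_mock[gsa_mock$ADJP < alpha, ]
rownames(significant)
```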