Working with the StarBioTrek package

knitr::opts_chunk$set(dpi = 300)
knitr::opts_chunk$set(cache=FALSE)
# Load the installed package (devtools::load_all(".") only works inside the package source tree)
library(StarBioTrek)

Introduction

Motivation: New technologies have made it possible to identify marker gene signatures. However, gene expression-based signatures have some limitations: they do not consider the metabolic role of the genes and are affected by genetic heterogeneity across patient cohorts. Considering the activity of entire pathways rather than the expression levels of individual genes can be a way to overcome these limitations [@ref12]. StarBioTrek provides methods to measure pathway activity and cross-talk among pathways, also integrating network information and TCGA data. New measures are under development.

Installation

To install the package, use the code below.

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("StarBioTrek")

Get data: Get pathway and network data

SELECT_path_species: Select the pathway database and species of interest

The user can select the pathway database and species of interest using functions implemented in graphite [@ref1].

library(graphite)
sel<-pathwayDatabases()
knitr::kable(sel, digits = 2,
             caption = "List of pathway databases and species", row.names = FALSE)

GetData: Searching pathway data for download

The user can easily search pathway data and the genes they contain using the GetData function. It can download pathways for several databases and species using the following parameters:

species="hsapiens"
pathwaydb="kegg"
path<-GetData(species,pathwaydb)

GetPathData: Get genes inside pathways

The user can identify the genes inside the pathways of interest:

pathway_ALLGENE<-GetPathData(path_ALL=path[1:3])

GetPathNet: Get interacting genes inside pathways

GetPathNet generates a list of interacting genes for each pathway

pathway_net<-GetPathNet(path_ALL=path[1:3])

ConvertedIDgenes: Convert gene IDs into gene symbols

The user can convert the gene IDs into gene symbols:

pathway<-ConvertedIDgenes(path_ALL=path[1:10])

getNETdata: Searching network data for download

You can easily search network data from GeneMania using the getNETdata function [@ref2]. The network category can be selected with the network parameter, using one of the following codes: PHint, COloc, GENint, PATH, SHpd (shared protein domains).

The species can be selected with the organismID parameter, using one of the following values: Arabidopsis_thaliana, Caenorhabditis_elegans, Danio_rerio, Drosophila_melanogaster, Escherichia_coli, Homo_sapiens, Mus_musculus, Rattus_norvegicus, Saccharomyces_cerevisiae.

By default the organism is Homo sapiens (Homo_sapiens). The example shows the shared protein domain network for Saccharomyces_cerevisiae. For more information see the SpidermiR package.

organismID="Saccharomyces_cerevisiae"
netw<-getNETdata(network="SHpd",organismID)

Integration data: Integration between pathway and network data

pathnet: Network of interacting genes for each pathway according to a network type (PHint,COloc,GENint,PATH,SHpd)

The function pathnet creates a network of interacting genes (downloaded from GeneMania) for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function getNETdata. The output is a network of genes belonging to the same pathway.

lista_net<-pathnet(genes.by.pathway=pathway[1:5],data=netw)
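
The idea behind this step can be sketched in base R under simplifying assumptions: the network is a two-column data frame of gene pairs and each pathway is a character vector of gene symbols. The helper `edges_within_pathways` and the toy data are hypothetical and are not part of StarBioTrek; the package's own implementation and data structures may differ.

```r
# Toy network: each row is one gene-gene interaction (hypothetical data).
toy_net <- data.frame(
  gene_a = c("A", "A", "B", "D", "E"),
  gene_b = c("B", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# Toy pathways: a named list of gene symbols (hypothetical data).
toy_pathways <- list(
  path1 = c("A", "B", "C"),
  path2 = c("D", "E", "F")
)

# Hypothetical helper: for each pathway, keep only the edges whose two
# genes both belong to that pathway.
edges_within_pathways <- function(net, pathways) {
  lapply(pathways, function(genes) {
    keep <- net$gene_a %in% genes & net$gene_b %in% genes
    net[keep, , drop = FALSE]
  })
}

toy_path_net <- edges_within_pathways(toy_net, toy_pathways)
```

In this toy example the edge between B and D is dropped from both pathways, because its two genes belong to different pathways.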

listpathnet: List of interacting genes for each pathway (list of genes) according to a network type (PHint,COloc,GENint,PATH,SHpd)

The function listpathnet creates a list of interacting genes for each pathway. Interacting genes are genes that belong to the same pathway and are connected in the network chosen by the user through the parameters of the function getNETdata. The output is a list of the genes that belong to the same pathway and have an interaction in the network.

list_path<-listpathnet(lista_net=lista_net,pathway=pathway[1:5])

Pathway summary indexes: Score for each pathway

GE_matrix: grouping gene expression profiles in pathways

Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix with the gene expression levels grouped by pathways.

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function GE_matrix creates a profile of gene expression levels for each pathway given by the user:

list_path_gene<-GE_matrix(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])
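
As a rough illustration of the grouping step, the sketch below splits a toy expression matrix into one sub-matrix per pathway using base R. The helper `group_by_pathway` and the toy data are hypothetical; how GE_matrix handles genes that are missing from the matrix is an assumption here (they are simply dropped).

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)

# Toy pathways: a named list of gene symbols (hypothetical data).
toy_pathways <- list(
  path1 = c("A", "B", "X"),  # "X" is not measured, so it is dropped below
  path2 = c("C", "D")
)

# Hypothetical helper: for each pathway, keep the rows of the matrix
# whose gene belongs to the pathway.
group_by_pathway <- function(expr, pathways) {
  lapply(pathways, function(genes) {
    expr[intersect(genes, rownames(expr)), , drop = FALSE]
  })
}

toy_grouped <- group_by_pathway(toy_expr, toy_pathways)
```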

GE_matrix_mean: Mean gene expression for the genes in each pathway

Get human KEGG pathway data and a gene expression matrix in order to obtain a matrix PXG (pathways in the columns and genes in the rows) with the mean gene expression of only those genes contained in the pathways given as input by the user.

list_path_plot<-GE_matrix_mean(DataMatrix=tumo[,1:2],genes.by.pathway=pathway[1:10])

average: Average of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function average creates an average matrix (SXP: S are the samples and P the pathways) of gene expression for each pathway:

score_mean<-average(pathwayexpsubset=list_path_gene)
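
The averaging step can be illustrated with a minimal base R sketch. The toy data are hypothetical, and the orientation of the result (samples in rows, pathways in columns) is a choice made for this sketch, not a statement about the object returned by average.

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)

toy_pathways <- list(path1 = c("A", "B"), path2 = c("C", "D"))

# For each pathway, average the expression of its genes within each sample.
# sapply() binds the per-pathway vectors column-wise, giving a
# samples x pathways matrix.
pathway_means <- sapply(toy_pathways, function(genes) {
  colMeans(toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE])
})
```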

stdv: Standard deviations of genes for each pathway starting from a matrix of gene expression

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data) the function stdv creates a standard deviation matrix of gene expression for each pathway:

score_st_dev<-stdv(gslist=list_path_gene)
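
A per-sample standard deviation for each pathway can be sketched the same way in base R. The toy data are hypothetical; note that a pathway with a single measured gene has no defined standard deviation, and how stdv treats that case is not covered here.

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    9, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)

toy_pathways <- list(
  path1 = c("A", "B"),
  path2 = c("C", "D"),
  path3 = "A"          # single-gene pathway: sd() returns NA
)

# For each pathway, compute the standard deviation of its genes
# within each sample (column-wise sd of the pathway sub-matrix).
pathway_sds <- sapply(toy_pathways, function(genes) {
  sub <- toy_expr[intersect(genes, rownames(toy_expr)), , drop = FALSE]
  apply(sub, 2, sd)
})
```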

Pathway cross-talk indexes: Score for pairwise pathways

eucdistcrtlk: Euclidean distance for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function eucdistcrtlk creates a Euclidean distance matrix of gene expression for each pair of pathways.

score_euc_distance<-eucdistcrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
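
To make the notion of a Euclidean distance between pathways concrete, the sketch below summarises each pathway by its mean expression per sample and computes the distance between these profiles for every pair of pathways. This is a simplification chosen for illustration, with hypothetical toy data; the way eucdistcrtlk aggregates gene-level values, and the shape of its output, may differ.

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8,
    2, 3),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D", "E"), c("s1", "s2"))
)

toy_pathways <- list(path1 = c("A", "B"), path2 = c("C", "D"), path3 = "E")

# One profile per pathway: mean expression of its genes in each sample.
profiles <- sapply(toy_pathways, function(genes) {
  colMeans(toy_expr[genes, , drop = FALSE])
})

# All unordered pairs of pathways, one pair per column.
pairs <- combn(names(toy_pathways), 2)

# Euclidean distance between the two profiles of each pair.
euc <- apply(pairs, 2, function(p) {
  sqrt(sum((profiles[, p[1]] - profiles[, p[2]])^2))
})
names(euc) <- paste(pairs[1, ], pairs[2, ], sep = "_vs_")
```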

dsscorecrtlk: Discriminating score for cross-talk measure

Starting from a matrix of gene expression (rows are genes and columns are samples, TCGA data), the function dsscorecrtlk creates a discriminating score matrix for each pair of pathways as a measure of cross-talk. The discriminating score is given by |M1-M2|/(S1+S2), where M1 and M2 are the means and S1 and S2 the standard deviations of the expression levels of the genes in pathway 1 and pathway 2, respectively.

cross_talk_st_dv<-dsscorecrtlk(dataFilt=tumo[,1:2],pathway_exp=pathway[1:10])
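
The formula can be checked by hand with a small base R sketch. The function `discriminating_score` and the toy data are hypothetical and only illustrate the definition |M1-M2|/(S1+S2) for one pair of pathways, computed separately in each sample; they are not the package's implementation.

```r
# Toy expression matrix: genes in rows, samples in columns (hypothetical data).
toy_expr <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    9, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("s1", "s2"))
)

path1 <- c("A", "B")
path2 <- c("C", "D")

# Hypothetical helper: discriminating score between two sets of
# expression values, |M1 - M2| / (S1 + S2).
discriminating_score <- function(x1, x2) {
  abs(mean(x1) - mean(x2)) / (sd(x1) + sd(x2))
}

# One score per sample for the pair (path1, path2).
ds <- sapply(colnames(toy_expr), function(s) {
  discriminating_score(toy_expr[path1, s], toy_expr[path2, s])
})
```

For sample s1 the means are 2 and 7 and the standard deviations are sqrt(2) and sqrt(8), so the score is 5 / (3 * sqrt(2)).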

Selection of pathway cross-talk

svm_classification: SVM classification

Given the substantial difference in the activity of many pathways between two classes (e.g. normal and cancer), we examined how well the classes can be separated based on their pairwise pathway profiles. This function is used to find the interacting pathways that are altered in a particular pathology in terms of Area Under the Curve (AUC). AUC is estimated by k-fold cross-validation (k=10). The function randomly selects a fraction of the TCGA data (e.g. nf=60, i.e. 60% of the original dataset) to form the training set and assigns the remaining samples to the testing set (40% of the original dataset). For each pair of pathways, the user can obtain a score matrix using the methods described above (e.g. dev_std_crtlk) and can focus on the pairs of pathways able to differentiate a particular subtype from the normal type.

nf <- 60
res_class<-svm_classification(TCGA_matrix=score_euc_dista[1:30,],nfs=nf,
normal=colnames(norm[,1:10]),tumour=colnames(tumo[,1:10]))
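
The train/test split and the AUC estimate can be illustrated with base R alone. The sketch below deliberately swaps the SVM for a logistic regression (glm) so that it runs without extra packages, uses simulated scores for a single pair of pathways, and performs one 60/40 split rather than repeated cross-validation. All names and data are hypothetical.

```r
set.seed(1)

# Simulated cross-talk score of one pathway pair for 50 normal and
# 50 tumour samples (hypothetical data).
n <- 50
dat <- data.frame(
  label = rep(c(0, 1), each = n),               # 0 = normal, 1 = tumour
  score = c(rnorm(n, mean = 0), rnorm(n, mean = 1))
)

# Random split: nf percent of the samples for training, the rest for testing.
nf <- 60
train_idx <- sample(nrow(dat), size = round(nrow(dat) * nf / 100))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Stand-in classifier: logistic regression instead of an SVM.
fit  <- glm(label ~ score, data = train, family = binomial())
pred <- predict(fit, newdata = test, type = "response")

# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc <- function(pred, label) {
  r <- rank(pred)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

test_auc <- auc(pred, test$label)
```

Repeating the split and averaging the resulting AUC values would give a more stable estimate for each pair of pathways.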

IPPI: Driver genes for each pathway

The function IPPI, using pathways and networks data, calculates the driver genes for each pathway. Please see Cava et al. BMC Genomics 2017.

DRIVER_SP<-IPPI(pathax=pathway_matrix[,1:3],netwa=netw_IPPI[1:50000,])

Visualization: Gene interactions and pathways

StarBioTrek provides several functions that prepare data for the visualization of gene-gene interactions and pathway cross-talk with the qgraph package [@ref3]. The function plotcrosstalk prepares the data:

formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)
library(qgraph)
qgraph(formatplot[[1]], minimum = 0.25, cut = 0.6, vsize = 5, groups = formatplot[[2]], legend = TRUE, borders = FALSE,layoutScale=c(0.8,0.8))
qgraph(formatplot[[1]],groups=formatplot[[2]], layout="spring", diag = FALSE,
cut = 0.6,legend.cex = 0.5,vsize = 6,layoutScale=c(0.8,0.8))

A circle plot can be generated using the function circleplot [@ref4]. A score can be assigned to each gene.

formatplot<-plotcrosstalk(pathway_plot=pathway[1:6],gs_expre=tumo)
score<-runif(length(formatplot[[2]]), min=-10, max=+10)
circleplot(preplot=formatplot,scoregene=score)
library(png)
library(grid)
img <- readPNG("circleplot.png")
grid.raster(img)

Session Information


sessionInfo()

References




StarBioTrek documentation built on Nov. 8, 2020, 8:02 p.m.