transmod - Transcriptogram and Modularity

An R package that implements a method for selecting the most relevant genes from differential expression profiles, based on gene network knowledge. It does so by computing the transcriptogram of each gene expression profile and analyzing the differentially expressed modules.

Main reference: DIAS JÚNIOR, J.F.S.; ALVES, R.; COMMES, T. A module-based approach for evaluating differential genome-wide expression profiles. In: 5th Brazilian Conference on Intelligent Systems (BRACIS). Recife, PE, Brazil, October 9-12, 2016.

Package installation

This package has been developed and tested only on R version 3.3.1.

Since transmod is still under development, it is not yet available on CRAN.

transmod depends on other packages, which must be installed first:

# devtools
install.packages("devtools")

# GeneSelector
source("https://bioconductor.org/biocLite.R")
biocLite("GeneSelector")
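
Before installing, it may help to check the running R version and which of the packages above are already present. The following is only a sketch using base R; the variable names `needed` and `not_installed` are illustrative and not part of transmod.

```r
# R version of the current session, for comparison with the tested 3.3.1.
print(getRversion())

# Packages named in the installation steps above.
needed <- c("devtools", "GeneSelector")

# requireNamespace() returns FALSE (quietly) when a package is not installed.
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!available]

print(not_installed)
```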

Installing the transmod package:

library(devtools)
install_github("joseflaviojr/transmod")

Example usage

Example of selection and analysis of differentially expressed genes.

The transmod package contains sample data, which will be used here:

library(transmod)

seriation <- seriation_dias2016
expression <- expression_gse48173

seriation_dias2016 is a list of human genes ordered according to a gene network. Reference: https://doi.org/10.1109/BRACIS.2016.069. For more details, run ?seriation_dias2016.

expression_gse48173 is an RNA-Seq data set from an experimental study of leukemia. The table contains 72 human samples: 43 Acute Myeloid Leukemia (AML), 12 Acute Lymphoblastic Leukemia (ALL) and 17 healthy (HEA). It covers 21865 genes:

             GSM1185603_HEA GSM1185604_HEA ...
WASH7P              1.28108        0.76803
OR4F5               0.00000        0.00000
LOC100133331        0.61782        0.58245
LOC100288069        3.95892        3.08883
NCRNA00115          1.50934        1.08628
LOC643837           2.03448        1.45142
...

Arranging the expression table according to the seriation:

seriation <- intersect(seriation,rownames(expression))
expression <- expression[seriation,]
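
The effect of these two lines can be seen on a toy example. The gene names, sample names and values below are made up for illustration; only `intersect` and row indexing by name are used, as in the snippet above.

```r
# Hypothetical gene order (a "seriation") and a small expression table.
toy_seriation <- c("G3", "G1", "G9", "G2")
toy_expression <- matrix(
  c(1.0, 2.0,
    3.0, 4.0,
    5.0, 6.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3"), c("S1", "S2"))
)

# Keep only the genes present in both, preserving the seriation order.
# G9 is dropped because it has no row in the expression table.
toy_seriation <- intersect(toy_seriation, rownames(toy_expression))

# Reorder the rows to follow the seriation; drop = FALSE keeps a matrix
# even if a single gene remains.
toy_expression <- toy_expression[toy_seriation, , drop = FALSE]

print(rownames(toy_expression))
```

Genes that appear in the seriation but not in the expression table are silently discarded, so it can be worth comparing the lengths before and after the intersection.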

Calculating the transcriptogram of each sample (expression profile). Because this step is implemented in pure R, it can be time-consuming:

tgram <- transcriptogram(expression)
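
To give an intuition for this step: in the transcriptogram literature, each gene's value is generally replaced by an average over a window of neighbouring genes in the ordered list. The sketch below illustrates that idea only. The function `window_average` and its `radius` parameter are hypothetical, and the window size and boundary handling actually used by `transcriptogram` may differ.

```r
# Sketch of a windowed average over genes that are already in seriation order.
# `radius` is a hypothetical parameter: each gene is averaged with up to
# `radius` neighbours on each side; windows are truncated at both ends.
window_average <- function(expr, radius = 2) {
  n <- nrow(expr)
  smoothed <- expr
  for (i in seq_len(n)) {
    lo <- max(1, i - radius)
    hi <- min(n, i + radius)
    # Average each sample (column) over the genes inside the window.
    smoothed[i, ] <- colMeans(expr[lo:hi, , drop = FALSE])
  }
  smoothed
}

# Toy table: five ordered genes, one sample.
toy <- matrix(c(0, 3, 6, 9, 12), ncol = 1,
              dimnames = list(paste0("G", 1:5), "S1"))
toy_tgram <- window_average(toy, radius = 1)
print(toy_tgram)
```

A plain R loop like this one runs once per gene, which is consistent with the step being slow on a table of roughly twenty thousand genes.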

Calculating the differentiation level of each gene between AML and healthy samples:

diffg <- differentiate(tgram, 1:17, 30:72)
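
The two index ranges passed to `differentiate` select sample columns. Rather than hard-coding positions, the groups could be derived from the column names. This is a sketch on made-up names: the `_HEA` suffix appears in the table above, while the `_AML` and `_ALL` suffixes are assumed to follow the same pattern.

```r
# Hypothetical sample names in the "<GSM id>_<group>" style.
sample_names <- c("GSM1_HEA", "GSM2_HEA", "GSM3_ALL", "GSM4_AML", "GSM5_AML")

# grep() returns the positions of the names matching each suffix.
healthy <- grep("_HEA$", sample_names)
aml     <- grep("_AML$", sample_names)

print(healthy)
print(aml)
```

With the real data, the same pattern could be applied to `colnames(expression)`, assuming the transcriptogram keeps the column order of the expression table.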

Calling head(diffg) returns 1.928160 2.060090 2.038235 2.027537 2.030800 2.028469.

The differentiation series is inspected in order to detect modules:

modules <- modularize(diffg)
modules_summary <- summarize_modules(diffg, modules)

The variable modules_summary contains the summary of each module detected:

  module begin end size     mean      max      min
1      1     1  91   91 2.163497 3.028608 1.089711
2      2    92  92    1 1.090482 1.090482 1.090482
3      3    93 270  178 3.339707 5.585274 1.088448
4      4   271 276    6 2.377144 2.428238 2.337269
5      5   277 277    1 2.343614 2.343614 2.343614
6      6   278 392  115 5.962280 7.465978 2.181652
...
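
The columns of this table can be reproduced on toy data. This sketch assumes that `modules` holds one module id per gene and that each module is a contiguous run of positions. The function `summarize_sketch` is hypothetical and is not the implementation of `summarize_modules`.

```r
# Per-module summary of a numeric series, given one module id per position.
summarize_sketch <- function(values, modules) {
  # Positions belonging to each module, keyed by module id.
  idx <- split(seq_along(values), modules)
  per_module <- function(f) unname(sapply(idx, f))
  data.frame(
    module = as.integer(names(idx)),
    begin  = per_module(min),      # first position in the module
    end    = per_module(max),      # last position in the module
    size   = per_module(length),   # number of genes in the module
    mean   = per_module(function(i) mean(values[i])),
    max    = per_module(function(i) max(values[i])),
    min    = per_module(function(i) min(values[i]))
  )
}

toy_values  <- c(1, 3, 2, 8, 10)
toy_modules <- c(1, 1, 1, 2, 2)
toy_summary <- summarize_sketch(toy_values, toy_modules)
print(toy_summary)
```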

To view the levels of differentiation for each gene and the detected modules, run:

palette(c("black","gray"))
plot(diffg, col=modules, type="h", main="Differentially Expressed Modules", xlab="Genes", ylab="Differentiation Level")

Result:

Image - Differentially Expressed Modules

Selecting the 100 most relevant genes among the modules:

selection_index <- select_from_modules(diffg, modules, select=100)
selection <- seriation[selection_index]
cat(selection, sep="\n")

Result:

COPE
LYRM2
HDHD2
BCL7B
CSNK1E
REL
POP5
EFHC1
KIAA0408
AURKAIP1
...
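
To pass the selection to another tool, it can be convenient to save it as a plain-text file with one gene symbol per line. A minimal sketch with base R follows; `selection_example` stands in for the `selection` vector, and the temporary file is used only to keep the example self-contained.

```r
# Stand-in for the `selection` vector (first genes of the result above).
selection_example <- c("COPE", "LYRM2", "HDHD2")

# Write one gene symbol per line.
out_file <- tempfile(fileext = ".txt")
writeLines(selection_example, out_file)

# Read the file back to confirm its contents.
print(readLines(out_file))
```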

The selected genes can be submitted to enrichment tools to find related scientific knowledge. For example, submitting the list above to the online tool Enrichr returns the most significantly enriched terms from its databases and ontologies.

joseflaviojr/transmod documentation built on May 9, 2019, 8:34 a.m.