transmod - Transcriptogram and Modularity

transmod is an R package that implements a method for identifying the most relevant genes among differential expression profiles, based on gene network knowledge. It does this by computing the transcriptogram of the gene expression profiles and analyzing the differentially expressed modules.

Main reference: DIAS JÚNIOR, J.F.S.; ALVES, R.; COMMES, T. A module-based approach for evaluating differential genome-wide expression profiles. In: 5th Brazilian Conference on Intelligent Systems (BRACIS). Recife, PE, Brazil, October 9-12, 2016.

This package was developed and tested only on R version 3.3.1.

Since transmod is still under development, it is not yet available on the CRAN repository.

Other packages must be installed for transmod to be fully functional:

# devtools
install.packages("devtools")

# GeneSelector
source("https://bioconductor.org/biocLite.R")
biocLite("GeneSelector")

Installing the transmod package:

library(devtools)
install_github("joseflaviojr/transmod")
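
After the installation steps, it can be useful to confirm that everything is in place. The following is a minimal sketch using only base R; the variable names `needed`, `is_installed` and `missing_pkgs` are illustrative and not part of transmod:

```r
# Packages mentioned in the installation steps above.
needed <- c("devtools", "GeneSelector", "transmod")

# requireNamespace() returns TRUE when the package can be loaded, FALSE otherwise.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```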

Example of selection and analysis of differentially expressed genes.

The transmod package contains sample data, which will be used here:

library(transmod)

seriation <- seriation_dias2016
expression <- expression_gse48173

seriation_dias2016 is a list of human genes ordered according to a gene network. Reference: https://doi.org/10.1109/BRACIS.2016.069. For more details, run ?seriation_dias2016.

expression_gse48173 is an RNA-Seq data set from an experimental study of leukemia. The table contains 72 human samples: 43 Acute Myeloid Leukemia (AML), 12 Acute Lymphoblastic Leukemia (ALL) and 17 Healthy (HEA). The samples cover 21865 genes:

             GSM1185603_HEA GSM1185604_HEA ...
WASH7P              1.28108        0.76803
OR4F5               0.00000        0.00000
LOC100133331        0.61782        0.58245
LOC100288069        3.95892        3.08883
NCRNA00115          1.50934        1.08628
LOC643837           2.03448        1.45142
...

Arranging the expression table according to the seriation:

seriation <- intersect(seriation,rownames(expression))
expression <- expression[seriation,]
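
The effect of these two lines can be seen on a small example. This sketch uses made-up gene names and values (all `toy_*` objects are hypothetical): genes in the seriation that have no expression row are dropped, and the remaining rows are reordered to follow the seriation.

```r
# Toy seriation: geneX has no expression data and will be dropped.
toy_seriation <- c("geneC", "geneA", "geneX", "geneB")

# Toy expression table: 3 genes (rows) by 2 samples (columns), filled column-wise.
toy_expression <- matrix(
  c(1, 2, 3,
    4, 5, 6),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# intersect() keeps the order of its first argument, i.e. the seriation order.
kept <- intersect(toy_seriation, rownames(toy_expression))

# Index rows by name so the table follows the seriation order.
toy_ordered <- toy_expression[kept, , drop = FALSE]
print(toy_ordered)
```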

Calculating the transcriptogram of each sample (expression profile). Because this function is implemented in pure R, it can take a long time to run:

tgram <- transcriptogram(expression)
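
For intuition only: a transcriptogram is commonly described as a smoothed expression profile obtained by averaging each gene with its neighbors in the seriated order. The sketch below illustrates that idea with a centered moving average. It is an assumption about the general technique, not the implementation used by `transcriptogram()`; the helper `window_mean` and its `radius` parameter are hypothetical.

```r
# Hypothetical helper: centered moving average along the gene order.
# Windows are truncated at both ends of the series.
window_mean <- function(x, radius = 1) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- max(1, i - radius)
    hi <- min(n, i + radius)
    mean(x[lo:hi])
  }, numeric(1))
}

# Toy expression table already arranged by seriation: 5 genes by 2 samples.
toy <- matrix(
  c(1, 2, 3, 4, 5,
    10, 0, 10, 0, 10),
  nrow = 5,
  dimnames = list(paste0("g", 1:5), c("s1", "s2"))
)

# Smooth each sample (column) independently.
toy_tgram <- apply(toy, 2, window_mean, radius = 1)
rownames(toy_tgram) <- rownames(toy)
print(toy_tgram)
```

The per-gene loop shows why a pure R implementation can be slow on more than 20000 genes and 72 samples.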

Calculating the differentiation level of each gene between AML and healthy samples:

diffg <- differentiate(tgram, 1:17, 30:72)
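
The hard-coded ranges above depend on the column order of the table. As a sketch, sample groups could instead be derived from the column names. The `_HEA` suffix appears in the sample table above; the `_AML` and `_ALL` suffixes and the toy column names below are assumptions made for illustration:

```r
# Hypothetical column names following the "<sample id>_<group>" pattern.
toy_cols <- c("S1_HEA", "S2_HEA", "S3_ALL", "S4_AML", "S5_AML")

# Column indexes of each group, found by suffix.
hea_idx <- grep("_HEA$", toy_cols)
aml_idx <- grep("_AML$", toy_cols)

print(hea_idx)
print(aml_idx)
```

With the real data, the same pattern would be applied to `colnames(expression)` and the resulting index vectors passed to `differentiate()` in place of the literal ranges.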

Calling head(diffg) returns 1.928160 2.060090 2.038235 2.027537 2.030800 2.028469.

The differentiation series is inspected in order to detect modules:

modules <- modularize(diffg)
modules_summary <- summarize_modules(diffg, modules)

The variable modules_summary contains the summary of each module detected:

  module begin end size     mean      max      min
1      1     1  91   91 2.163497 3.028608 1.089711
2      2    92  92    1 1.090482 1.090482 1.090482
3      3    93 270  178 3.339707 5.585274 1.088448
4      4   271 276    6 2.377144 2.428238 2.337269
5      5   277 277    1 2.343614 2.343614 2.343614
6      6   278 392  115 5.962280 7.465978 2.181652
...
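
The columns of this summary can be reproduced on a toy example with base R. This sketch assumes that the module assignment is a vector holding one module number per gene, which is an inference from how `modules` is used here and not a documented fact; all `toy_*` names are hypothetical.

```r
# Toy differentiation levels and an assumed per-gene module assignment.
toy_diff    <- c(2.1, 2.3, 1.0, 3.5, 3.2, 3.9)
toy_modules <- c(1, 1, 2, 3, 3, 3)

# Gene positions grouped by module.
idx <- split(seq_along(toy_diff), toy_modules)

toy_summary <- data.frame(
  module = as.integer(names(idx)),
  begin  = vapply(idx, min, integer(1)),   # first gene position in the module
  end    = vapply(idx, max, integer(1)),   # last gene position in the module
  size   = lengths(idx),                   # number of genes in the module
  mean   = vapply(idx, function(i) mean(toy_diff[i]), numeric(1)),
  max    = vapply(idx, function(i) max(toy_diff[i]), numeric(1)),
  min    = vapply(idx, function(i) min(toy_diff[i]), numeric(1)),
  row.names = NULL
)
print(toy_summary)
```

As in the real summary, a module of size 1 has identical mean, max and min values.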

To view the levels of differentiation for each gene and the detected modules, run:

palette(c("black","gray"))
plot(diffg, col=modules, type="h", main="Differentially Expressed Modules", xlab="Genes", ylab="Differentiation Level")

Result:

Image - Differentially Expressed Modules

Selecting the 100 most relevant genes among the modules:

selection_index <- select_from_modules(diffg, modules, select=100)
selection <- seriation[selection_index]
cat(selection, sep="\n")

Result:

COPE
LYRM2
HDHD2
BCL7B
CSNK1E
REL
POP5
EFHC1
KIAA0408
AURKAIP1
...
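
Because the selection is a plain character vector of gene symbols, it can also be saved to a text file, one symbol per line, for use outside R. A minimal sketch using the first three symbols above as a stand-in for the full selection; the temporary file path is only for illustration:

```r
# Stand-in for the real `selection` vector.
toy_selection <- c("COPE", "LYRM2", "HDHD2")

# Write one gene symbol per line.
out_file <- tempfile(fileext = ".txt")
writeLines(toy_selection, out_file)

# Read the file back to confirm its contents.
read_back <- readLines(out_file)
print(read_back)
```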

The selected genes can be submitted to enrichment tools to find related scientific data. For example, submitting the list above to the online tool Enrichr returns the following most significant terms for each database/ontology:

OMIM Disease = Leukemia, Cataract
Jensen DISEASES = Acute Promyelocytic Leukemia (AML subtype), Corneal disease
MSigDB Computational = MODULE_13 (related to leukemia and B lymphoma)
LINCS L1000 Chem Pert up = CPC011 HT29 6H-idarubicin hcl-10.0 (idarubicin is related to leukemia treatment)
dbGaP = Keratoconus (There are studies that relate leukemia and cataract: 1, 2, 3, 4)
GO Biological Process = positive regulation of transcription from RNA polymerase II promoter (this class contains 10 term members directly related to leukemia: B-cell lymphoma/leukemia 11A (BCL11A), B-cell lymphoma/leukemia 11B (BCL11B), Hepatic leukemia factor (HLF), Friend leukemia integration 1 transcription factor (FLI1), Pre-B-cell leukemia transcription factor 2 (PBX2), Pre-B-cell leukemia transcription factor 3 (PBX3), T-cell leukemia homeobox protein 1 (TLX1), T-cell acute lymphocytic leukemia protein 1 (TAL1), Leukemia inhibitory factor (LIF) and T-cell leukemia homeobox protein 2 (TLX2))
GO Cellular Component = spindle pole centrosome (There are studies that relate leukemia and centrosome: 1, 2, 3, 4)
GO Molecular Function = RNA binding (1, 2, 3, 4, 5)
Human Phenotype Ontology = Pectus excavatum (1, 2), Zonular cataract, Congenital cataract
Jensen TISSUES = Corpus callosum (hemorrhagic complications are common in patients with leukemia: 1, 2, 3), Bone marrow

José Flávio de Souza Dias Júnior (Researcher/Coordinator) - joseflaviojr@gmail.com
Ronnie Alves (Researcher) - alvesrco@gmail.com
Thérèse Commes (Researcher) - therese.commes@gmail.com
Andréa do Socorro Bolhosa Sarmento (Scientific Initiation/Scholarship Student) - andreassarmento@yahoo.com.br

joseflaviojr/transmod documentation built on May 9, 2019, 8:34 a.m.
