A statistical tool to identify differentially expressed genes from two transcriptomes of an individual. The method iDEG takes RNA-Seq data as input and outputs, for each gene, a probability and an effect size of differential expression.
```r
## install iDEG from GitHub (requires the devtools package)
devtools::install_github("QikeLi/iDEG", build_vignettes = TRUE)
```
To identify differentially expressed genes (DEGs), iDEG requires RNA-Seq data from two transcriptomes. These two transcriptomes should be derived from the same subject, e.g., a transcriptome of a disease sample and a transcriptome of the same patient's healthy sample. Further, the two transcriptomes should be represented as two numeric vectors in R. Each element of a vector is the expression level of one gene, and the gene names are assigned as the names of the vector.
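To make the expected input format concrete, the sketch below builds two toy named numeric vectors and checks that they describe the same genes in the same order. The gene names and values are invented for illustration, and `check_paired_input()` is a hypothetical helper, not part of iDEG.

```r
## Toy input: two transcriptomes from the same (hypothetical) subject.
## Gene names and expression values are invented for illustration only.
baseline_toy <- c(GENE1 = 120, GENE2 = 0, GENE3 = 45, GENE4 = 300)
case_toy     <- c(GENE1 = 410, GENE2 = 3, GENE3 = 40, GENE4 = 95)

## Hypothetical helper (not part of iDEG): stop with an error unless both
## vectors are numeric, named, and list the same genes in the same order.
check_paired_input <- function(baseline, case) {
  stopifnot(is.numeric(baseline), is.numeric(case))
  stopifnot(!is.null(names(baseline)), !is.null(names(case)))
  stopifnot(length(baseline) == length(case))
  stopifnot(identical(names(baseline), names(case)))
  invisible(TRUE)
}

check_paired_input(baseline_toy, case_toy)
```

Running such a check before calling iDEG catches mismatched gene orderings early, since the two vectors are compared element by element.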
Let's display the first few rows of the RNA-Seq data provided by the iDEG package.
```r
## load package iDEG
library(iDEG)
## display the first 6 rows of an RNA-Seq dataset
data(exp_tnbc_A2C9)
head(exp_tnbc_A2C9)
```
Next, remove the genes that were not detected, i.e., genes with an expression level of at most 5 in both transcriptomes:
```r
## remove genes that were not detected (expression <= 5 in both transcriptomes)
exp_tnbc_A2C9 <- exp_tnbc_A2C9[!apply(exp_tnbc_A2C9, 1,
                                      function(x) x[1] <= 5 & x[2] <= 5), ]
```
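The same filter can also be written without `apply()`. The sketch below is a vectorized equivalent, assuming the data frame has the two columns `Healthy_Sample` and `Tumor_Sample` used later in this vignette; it keeps a gene if at least one sample exceeds the threshold of 5. The helper name `filter_undetected()` and the toy data are hypothetical.

```r
## Hypothetical helper: drop genes whose expression is <= threshold in BOTH
## samples. By De Morgan's law, !(h <= t & c <= t) is (h > t | c > t).
filter_undetected <- function(df, threshold = 5) {
  detected <- df$Healthy_Sample > threshold | df$Tumor_Sample > threshold
  df[detected, , drop = FALSE]
}

## Made-up example data (not from the iDEG package)
toy <- data.frame(Healthy_Sample = c(0, 12, 3, 8),
                  Tumor_Sample   = c(2, 1, 40, 9),
                  row.names = c("GENE1", "GENE2", "GENE3", "GENE4"))
filter_undetected(toy)
```

Only `GENE1` is removed here, because it is the only gene at or below 5 in both samples. Comparing columns directly avoids converting each row to a matrix, which `apply()` does internally.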
Each column of `exp_tnbc_A2C9` corresponds to a transcriptome. We save each column as a numeric vector and assign the gene names (the row names of the data frame) as the names of the vector.
```r
## extract each transcriptome from the data frame to a numeric vector
baseline_trans <- as.vector(exp_tnbc_A2C9$Healthy_Sample)
names(baseline_trans) <- rownames(exp_tnbc_A2C9)
case_trans <- as.vector(exp_tnbc_A2C9$Tumor_Sample)
names(case_trans) <- rownames(exp_tnbc_A2C9)
```
Besides providing RNA-Seq data of two transcriptomes to iDEG, one needs to choose the distribution used to model the RNA-Seq data under study (`dataDistribution`), decide whether to assume constant dispersion across genes (`constDisp`), and decide whether normalization is needed (`normalization`). In addition, one needs to specify the degrees of freedom (`df`) for fitting the marginal distribution of the summary statistics derived from all genes ^[See the iDEG manuscript].
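For readers less familiar with the negative binomial (`'NB'`) model, the sketch below uses base R's `rnbinom()` to show what the dispersion controls: with mean μ and size r, the variance is μ + μ²/r, so a smaller size (larger dispersion 1/r) means more variability than a Poisson model would allow. This is a generic illustration of the NB distribution, not iDEG's internal fitting procedure; `nb_variance_table()` is a hypothetical helper.

```r
## Hypothetical helper: compare the theoretical NB variance mu + mu^2 / size
## with the sample variance of simulated counts, for several size values.
nb_variance_table <- function(mu = 100, sizes = c(1, 10, 1000), n = 1e5) {
  data.frame(
    size            = sizes,
    theoretical_var = mu + mu^2 / sizes,
    sample_var      = sapply(sizes, function(s) var(rnbinom(n, size = s, mu = mu)))
  )
}

set.seed(1)
nb_variance_table()
```

Under a constant-dispersion assumption, every gene would share one size value; setting `constDisp = FALSE` drops that assumption.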
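```r
## run iDEG with a negative binomial model, dispersion not assumed constant
## across genes, normalization enabled, and df = 8 for the marginal fit
res <- iDEG(baseline = baseline_trans,
            case = case_trans,
            dataDistribution = 'NB',
            constDisp = FALSE,
            normalization = TRUE,
            df = 8,
            pct = .0001)
```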
First, let us display the 10 genes with the smallest local false discovery rate, i.e., the genes most likely to be differentially expressed.
```r
## sort genes by local fdr (ascending) and show the first 10
knitr::kable(head(res$result[order(res$result$local_fdr, decreasing = FALSE), ], 10),
             digits = 40)
```
Row names of this table are gene names. The first two columns are the input data of the two transcriptomes under comparison. The third column contains the local false discovery rates, which are the probabilities of genes not being differentially expressed. The fourth column is the summary statistic, i.e., the effect size reflecting the magnitude of differential expression.
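Because the local false discovery rate is the probability that a gene is not differentially expressed, a list of DEGs can be obtained by keeping genes whose `local_fdr` falls below a chosen cutoff. The sketch below assumes a cutoff of 0.05, which is an illustrative choice rather than an iDEG default; `select_deg()` and the toy table standing in for `res$result` are hypothetical.

```r
## Hypothetical helper: keep genes whose local fdr is at or below a cutoff,
## ordered from most to least likely to be differentially expressed.
select_deg <- function(result, cutoff = 0.05) {
  hits <- result[result$local_fdr <= cutoff, , drop = FALSE]
  hits[order(hits$local_fdr), , drop = FALSE]
}

## Made-up stand-in for res$result (values are invented)
toy_result <- data.frame(local_fdr = c(0.30, 0.001, 0.04, 0.9),
                         row.names = c("GENE1", "GENE2", "GENE3", "GENE4"))
select_deg(toy_result)
```

With the real output, the same call would be `select_deg(res$result)`.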