knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
This is an example for analyzing an RNA-seq dataset of Pseudomonas aeruginosa PA14 strain. The package currently only supports processed RNA-seq expression data: expression values of each sample at the gene level.
Load in required libraries.
library("ADAGEpath") library("DT") library("readr") library("plyr")
Before any analysis, we need to specify the ADAGE model and the data compendium we want to use.
model <- eADAGEmodel compendium <- PAcompendium probe_dist <- probedistribution
We use dataset E-GEOD-64056 as an example here. Let's first download the zip file that stores processed expression values for each sample and unzip it.
download.file("https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-64056/E-GEOD-64056.processed.1.zip", destfile = "download/E-GEOD-64056.processed.1.zip") unzip(zipfile = "download/E-GEOD-64056.processed.1.zip", exdir = "download/E-GEOD-64056.processed.1")
This dataset contains 6 RNA-seq samples and 2 CHIP-seq samples. We next read in all the RNA-seq samples and combine them into one data.frame.
# RNAseq files end with "rpg.txt" RNAseq_files <- list.files("download/E-GEOD-64056.processed.1/", pattern = "*.rpg.txt") RNAseq_file_paths <- file.path("download/E-GEOD-64056.processed.1/", RNAseq_files) RNAseq_samples <- lapply(RNAseq_file_paths, function(x) readr::read_tsv(x, col_names = FALSE)) RNAseq_data <- plyr::join_all(RNAseq_samples, by = "X1") colnames(RNAseq_data) <- c("geneID", RNAseq_files) DT::datatable(RNAseq_data)
The gene IDs in this dataset are in the format of "PA14_XXXXX,symbol". We need to clean them to only contain "PA14_XXXXX".
RNAseq_data$geneID <- sapply(RNAseq_data$geneID, function(x) unlist(strsplit(x, ","))[1])
Now the RNAseq_data
is in the right format that can be processed by the
load_dataset
function. This processing step will take a while, because it
needs to first map PA14 gene IDs to PAO1 gene IDs. Then it needs to impute
the expression of missing genes. Finally it normalizes RNAseq expression
values to comparable ranges with microarray expression values using
TDM.
data_raw <- load_dataset(input = RNAseq_data, isProcessed = TRUE, isRNAseq = TRUE, model = model, compendium = compendium, quantile_ref = probe_dist, norm01 = FALSE)
ADAGE only accepts expression values in the (0,1) range. We linearly transform expression values to be between 0 and 1 using the Pa compendium as the reference.
data_normed <- zeroone_norm(input_data = data_raw, use_ref = TRUE, ref_data = compendium)
Now let's specify the phenotypes for each sample. It needs to be a character vector and has the same sample order as the expression data loaded above.
data_pheno <- c("phoB", "phoB", "tctD", "tctD", "wt", "wt")
Now this RNAseq dataset is ready for ADAGE signature analysis. Please refer to this vignette on how to perform ADAGE signature analysis.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.