In greenelab/ADAGEpath: ADAGE-based Signature Analysis

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

This is an example for analyzing an RNA-seq dataset of Pseudomonas aeruginosa PA14 strain. The package currently only supports processed RNA-seq expression data: expression values of each sample at the gene level.

Data preparation

Load in required libraries.

library("ADAGEpath")
library("DT")
library("readr")
library("plyr")

Before any analysis, we need to specify the ADAGE model and the data compendium we want to use.

model <- eADAGEmodel
compendium <- PAcompendium
probe_dist <- probedistribution

We use dataset E-GEOD-64056 as an example here. Let's first download the zip file that stores processed expression values for each sample and unzip it.

download.file("https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-64056/E-GEOD-64056.processed.1.zip",
              destfile = "download/E-GEOD-64056.processed.1.zip")
unzip(zipfile = "download/E-GEOD-64056.processed.1.zip",
      exdir = "download/E-GEOD-64056.processed.1")

This dataset contains 6 RNA-seq samples and 2 CHIP-seq samples. We next read in all the RNA-seq samples and combine them into one data.frame.

# RNAseq files end with "rpg.txt"
RNAseq_files <- list.files("download/E-GEOD-64056.processed.1/",
                             pattern = "*.rpg.txt")
RNAseq_file_paths <- file.path("download/E-GEOD-64056.processed.1/", RNAseq_files)
RNAseq_samples <- lapply(RNAseq_file_paths,
                         function(x) readr::read_tsv(x, col_names = FALSE))
RNAseq_data <-  plyr::join_all(RNAseq_samples, by = "X1")
colnames(RNAseq_data) <- c("geneID", RNAseq_files)
DT::datatable(RNAseq_data)

The gene IDs in this dataset are in the format of "PA14_XXXXX,symbol". We need to clean them to only contain "PA14_XXXXX".

RNAseq_data$geneID <- sapply(RNAseq_data$geneID,
                             function(x) unlist(strsplit(x, ","))[1])

Now the RNAseq_data is in the right format that can be processed by the load_dataset function. This processing step will take a while, because it needs to first map PA14 gene IDs to PAO1 gene IDs. Then it needs to impute the expression of missing genes. Finally it normalizes RNAseq expression values to comparable ranges with microarray expression values using TDM.

data_raw <- load_dataset(input = RNAseq_data, isProcessed = TRUE,
                         isRNAseq = TRUE, model = model,
                         compendium = compendium, quantile_ref = probe_dist,
                         norm01 = FALSE)

ADAGE only accepts expression values in the (0,1) range. We linearly transform expression values to be between 0 and 1 using the Pa compendium as the reference.

data_normed <- zeroone_norm(input_data = data_raw, use_ref = TRUE,
                            ref_data = compendium)

Now let's specify the phenotypes for each sample. It needs to be a character vector and has the same sample order as the expression data loaded above.

data_pheno <- c("phoB", "phoB", "tctD", "tctD", "wt", "wt")

Now this RNAseq dataset is ready for ADAGE signature analysis. Please refer to this vignette on how to perform ADAGE signature analysis.

greenelab/ADAGEpath documentation built on May 25, 2022, 7:11 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

greenelab/ADAGEpath
ADAGE-based Signature Analysis

In greenelab/ADAGEpath: ADAGE-based Signature Analysis

Data preparation

R Package Documentation

Browse R Packages

We want your feedback!

greenelab/ADAGEpath ADAGE-based Signature Analysis

In greenelab/ADAGEpath: ADAGE-based Signature Analysis

Data preparation

R Package Documentation

Browse R Packages

We want your feedback!

greenelab/ADAGEpath
ADAGE-based Signature Analysis