Example: finding E. coli outstanding genomic zones in response to nickel stress

  collapse = TRUE,
  comment = "#>"

We illustrate the GenomicOZone package with an example. The Escherichia coli transcriptome data set [@gault2016data] contains transcriptome profiles of E. coli K-12 under nickel stress, with three samples under exposure to nickel and three normal samples.

To run this example, two other packages GEOquery [@davis2007geoquery] for data download and readxl for Excel file reading are required and should have been installed, in addition to GenomicOZone.


The following code chunk downloads a data transcriptome set, prepares genome annotation, and creates parameters before outstanding zone analysis. Specifically, an Escherichia coli transcriptome data set of series number GSE76167 is downloaded from GEO database [@edgar2002gene]. We extracted the RPKM values from six samples including three stressed samples and three wild-type samples. The genome annotation is prepared as a GRanges [@lawrence2013software] object.

# Obtain data matrix from GSE76167 supplementary file
data <- read_excel("./GSE76167/GSE76167_GeneFPKM_AllSamples.xlsx")

# Adjust the input data
data.info <- data[,1:5]
data <- data[,-c(1:5)]
data <- data[,substr(colnames(data), 1, 4) == "FPKM"]
data <- data.matrix(data[,c(1,5,6,3,4,2)])
colnames(data) <- c(paste(rep("WT",3), "_", c(1,2,3), sep=""), 
                    paste(rep("Ni",3), "_", c(1,2,3), sep=""))
rownames(data) <- data.info$tracking_id

# Obtain genes
data.genes <- data.info$gene_short_name
data.genes[data.genes == "-"] <- data.info$tracking_id[data.genes == "-"]

# Create colData
colData <- data.frame(Sample_name = colnames(data),
                      Condition = factor(rep(c("WT", "Ni"), each = 3), 
                                         levels = c("WT", "Ni")))

# Create design
design <- ~ Condition

# Create rowData.GRanges
pattern <- "(.[^\\:]*)\\:([0-9]+)\\-([0-9]+)"
matched <- regexec(pattern, as.character(data.info$locus))
values <- regmatches(as.character(data.info$locus), matched)
data.gene.coor <-  data.frame(chr = as.character(sapply(values, function(x){x[[2]]})),
                              start = as.numeric(sapply(values, function(x){x[[3]]})),
                              end = as.numeric(sapply(values, function(x){x[[4]]})))
rownames(data.gene.coor) <- as.character(data.info$tracking_id)
rowData.GRanges <- GRanges(seqnames = data.gene.coor$chr,
                           IRanges(start = data.gene.coor$start, 
                                   end = data.gene.coor$end),
                           Gene.name = data.genes)
names(rowData.GRanges) <- data.info$tracking_id

chr.size <- 4646332
names(chr.size) <- "NC_007779"
seqlevels(rowData.GRanges) <- names(chr.size)
seqlengths(rowData.GRanges) <- chr.size

With the formatted data, parameters, and annotation, we run the outstanding zone analysis as below:

# Create an input object also checking for data format, consistency, and completeness
GOZ.ds <- GOZDataSet(data = data,
                     colData = colData,
                     design = design,
                     rowData.GRanges = rowData.GRanges)

# Run the outstanding zone analysis
GOZ.ds <- GenomicOZone(GOZ.ds)

The following four auxiliary functions extract gene annotation, zone annotation, outstanding zone annotation, and zone expression matrix, respectively:

# Extract gene/zone GRanges object
Gene.GRanges <- extract_genes(GOZ.ds)

Zone.GRanges <- extract_zones(GOZ.ds)

# min.effect.size = 0.36 is chosen from the 
#     minumum of top 5% effect size values
OZone.GRanges <- extract_outstanding_zones(
                              alpha = 0.05, 
                              min.effect.size = 0.36)

Zone.exp.mat <- extract_zone_expression(GOZ.ds)

Three types of plot can be generated from the returned object by GenomicOZone(), including genome-wide overview, within-chromosome heatmap, and within-zone expression. The plots are generated using R package ggplot2 [@Wickham2016ggplot2] and ggbio [@yin2012ggbio]. The value of min.effect.size = 0.36 is chosen from the minumum of top 5% effect size values. The effect size for ANOVA is calculated by R package sjstats [@sjstats2019].

# Genome-wide overview
plot_genome(GOZ.ds, plot.file = "E_coli_genome.pdf",
            plot.width = 15, plot.height = 4,
            alpha = 0.05, min.effect.size = 0.36)
# Within-chromosome heatmap
plot_chromosomes(GOZ.ds, plot.file = "E_coli_chromosome.pdf",
                 plot.width = 20, plot.height = 4,
                 alpha = 0.05, min.effect.size = 0.36)
# Within-zone expression
plot_zones(GOZ.ds, plot.file = "E_coli_zone.pdf", 
           plot.all.zones = FALSE,
           alpha = 0.05, min.effect.size = 0.36)


Try the GenomicOZone package in your browser

Any scripts or data that you put into this service are public.

GenomicOZone documentation built on Nov. 8, 2020, 6:01 p.m.