knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
We illustrate the GenomicOZone package with an example. The Escherichia coli transcriptome data set [@gault2016data] contains transcriptome profiles of E. coli K-12 under nickel stress, with three samples under exposure to nickel and three normal samples.
To run this example, two other packages GEOquery
[@davis2007geoquery] for data download and readxl
for Excel file reading are required and should have been installed, in addition to GenomicOZone
.
require(GenomicOZone) require(GEOquery) require(readxl)
The following code chunk downloads a data transcriptome set, prepares genome annotation, and creates parameters before outstanding zone analysis. Specifically, an Escherichia coli transcriptome data set of series number GSE76167 is downloaded from GEO database [@edgar2002gene]. We extracted the RPKM values from six samples including three stressed samples and three wild-type samples. The genome annotation is prepared as a GRanges
[@lawrence2013software] object.
# Obtain data matrix from GSE76167 supplementary file invisible(getGEOSuppFiles("GSE76167")) data <- read_excel("./GSE76167/GSE76167_GeneFPKM_AllSamples.xlsx") # Adjust the input data data.info <- data[,1:5] data <- data[,-c(1:5)] data <- data[,substr(colnames(data), 1, 4) == "FPKM"] data <- data.matrix(data[,c(1,5,6,3,4,2)]) colnames(data) <- c(paste(rep("WT",3), "_", c(1,2,3), sep=""), paste(rep("Ni",3), "_", c(1,2,3), sep="")) rownames(data) <- data.info$tracking_id # Obtain genes data.genes <- data.info$gene_short_name data.genes[data.genes == "-"] <- data.info$tracking_id[data.genes == "-"] # Create colData colData <- data.frame(Sample_name = colnames(data), Condition = factor(rep(c("WT", "Ni"), each = 3), levels = c("WT", "Ni"))) # Create design design <- ~ Condition # Create rowData.GRanges pattern <- "(.[^\\:]*)\\:([0-9]+)\\-([0-9]+)" matched <- regexec(pattern, as.character(data.info$locus)) values <- regmatches(as.character(data.info$locus), matched) data.gene.coor <- data.frame(chr = as.character(sapply(values, function(x){x[[2]]})), start = as.numeric(sapply(values, function(x){x[[3]]})), end = as.numeric(sapply(values, function(x){x[[4]]}))) rownames(data.gene.coor) <- as.character(data.info$tracking_id) rowData.GRanges <- GRanges(seqnames = data.gene.coor$chr, IRanges(start = data.gene.coor$start, end = data.gene.coor$end), Gene.name = data.genes) names(rowData.GRanges) <- data.info$tracking_id chr.size <- 4646332 names(chr.size) <- "NC_007779" seqlevels(rowData.GRanges) <- names(chr.size) seqlengths(rowData.GRanges) <- chr.size
With the formatted data, parameters, and annotation, we run the outstanding zone analysis as below:
# Create an input object also checking for data format, consistency, and completeness GOZ.ds <- GOZDataSet(data = data, colData = colData, design = design, rowData.GRanges = rowData.GRanges) # Run the outstanding zone analysis GOZ.ds <- GenomicOZone(GOZ.ds)
The following four auxiliary functions extract gene annotation, zone annotation, outstanding zone annotation, and zone expression matrix, respectively:
# Extract gene/zone GRanges object Gene.GRanges <- extract_genes(GOZ.ds) head(Gene.GRanges) Zone.GRanges <- extract_zones(GOZ.ds) head(Zone.GRanges) # min.effect.size = 0.36 is chosen from the # minumum of top 5% effect size values OZone.GRanges <- extract_outstanding_zones( GOZ.ds, alpha = 0.05, min.effect.size = 0.36) head(OZone.GRanges) Zone.exp.mat <- extract_zone_expression(GOZ.ds) head(Zone.exp.mat)
Three types of plot can be generated from the returned object by GenomicOZone()
, including genome-wide overview, within-chromosome heatmap, and within-zone expression. The plots are generated using R package ggplot2
[@Wickham2016ggplot2] and ggbio
[@yin2012ggbio]. The value of min.effect.size = 0.36
is chosen from the minumum of top 5% effect size values. The effect size for ANOVA is calculated by R package sjstats
[@sjstats2019].
# Genome-wide overview plot_genome(GOZ.ds, plot.file = "E_coli_genome.pdf", plot.width = 15, plot.height = 4, alpha = 0.05, min.effect.size = 0.36) knitr::include_graphics("E_coli_genome.pdf")
# Within-chromosome heatmap plot_chromosomes(GOZ.ds, plot.file = "E_coli_chromosome.pdf", plot.width = 20, plot.height = 4, alpha = 0.05, min.effect.size = 0.36) knitr::include_graphics("E_coli_chromosome.pdf")
# Within-zone expression plot_zones(GOZ.ds, plot.file = "E_coli_zone.pdf", plot.all.zones = FALSE, alpha = 0.05, min.effect.size = 0.36) knitr::include_graphics("E_coli_zone.pdf")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.