Home

/

GitHub

/

vjcitn/edx_adv_bioc

/

In vjcitn/edx_adv_bioc: course development, edx advanced bioconductor

Week 1: Visualization for genomic data science

Related resources: Griffith lab course

Overview: base graphics, ggplot2, and graphic file formats

Displaying structure in multivariate data

Setting up the airway dataset

library(airway)
data(gse)
gse # uses gencode gene names
rownames(gse) = gsub("\\..*", "", rownames(gse)) # ENS
library(org.Hs.eg.db)
ss = select(org.Hs.eg.db, keys=rownames(gse), 
   keytype="ENSEMBL", columns=c("GENENAME", "SYMBOL"))
dr = which(duplicated(ss[,1]) | is.na(ss[,1]))
ss = ss[-dr,]
rownames(ss) = ss[,1]
ok = intersect(rownames(ss), rownames(gse))
gse = gse[ok,]
rownames(gse) = ss[rownames(gse),]$SYMBOL
gse = gse[-which(is.na(rownames(gse))),]

Visualizing one-dimensional data

We will work with a reduction of the information in the experiment, summing columns of the assay matrix. We want to present the information for rapid uptake in context. We could just use a table

data.frame(colsum=colSums(assay(gse)), donor=gse$donor, trt=gse$condition)

but the numbers are unwieldy, and patterns are hard to extract from the text.

Questions

What is the interpretation of colSums(assay(gse))?
Execute [when we go live the results of code chunks in problems is suppressed]

plot(colSums(assay(gse)))
text(1:8, colSums(assay(gse)), 
  labels=paste0(gse$donor, "\n", 
             gse$names, "\n", 
             substr(gse$condition,1,3)), cex=.6)

Which of the following is not true?

Sample SRR1039517 had the most reads
Depth of sequencing was fairly uniform across the samples
The largest increase in number of reads with dexamethasone treatment was for cell N080611
The difference in average read counts for cells N052611 and N061011 was less than 120000 reads
Try

plot(colSums(assay(gse)), 
    pch=" ", xlim=c(0,9), axes=FALSE, xlab=" ")
text(colSums(assay(gse)), 
    label=paste0(gse$donor, "\n", gse$condition), 
    col=as.numeric(gse$donor), font=2)
axis(2)

Two of the plotted data elements have defects. What graphical parameters can be used to achieve a better rendering?

Easiest answer: ylim

plot(colSums(assay(gse)), 
    pch=" ", xlim=c(0,9), axes=FALSE, xlab=" ", ylim=c(1.25e7,3.2e7))
text(colSums(assay(gse)), 
    label=paste0(gse$donor, "\n", gse$condition), 
    col=as.numeric(gse$donor), font=2)
axis(2)

Sketching densities

Here we'll work with some RNA-seq data developed in the Cancer Genome Atlas.

suppressMessages({
library(curatedTCGAData)
ex = curatedTCGAData(c("PRAD", "ACC"), assay="RNAseq2GeneNorm", 
  dry.run=FALSE)
})
ex
cs = lapply(experiments(ex), function(x) colSums(assay(x)))

plot(density(cs[[2]]))
lines(density(cs[[1]]), lty=2)
legend(2.3e7, 4e-7, legend=c("type 1", "type 2"))

Question: What should you substitute for 'type 1' in the legend so that the tumor type on which the solid density trace is based appears in the legend? Answer should be either 'PRAD' or 'ACC'.

Question: Which axis records the depth of sequencing for the various experiments? x or y?

Visualizing categorical data

Networks for linked data

suppressPackageStartupMessages({
library(graph)
library(SingleR)
})
mon = MonacoImmuneData()
labs = unique(c(mon$label.main, mon$label.fine))
myg = new("graphNEL", nodes=labs, edgemode="directed")
myg2 = addEdge(mon$label.main, mon$label.fine, myg)
myg2

Given a set of nodes, we want to extract a sliceGraph = function(g, n) { a = lapply(n, function(x) adj(g, x)) nn = unique(c(n, unlist(a))) subGraph(nn, g) } myg3 = sliceGraph(myg2, unique(mon$label.main)[1:4]) library(igraph) ig = igraph.from.graphNEL(myg3) plot(ig, vertex.size=0, label.cex=.05, label.dist=1) ```

Multivariate visualization

Week 2: Interactive apps with Bioconductor and shiny

Week 3: Memory-sparing solutions for large data with Bioconductor

Week 4: Using python resources with Bioconductor

vjcitn/edx_adv_bioc documentation built on March 17, 2020, 3:58 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

vjcitn/edx_adv_bioc
course development, edx advanced bioconductor

In vjcitn/edx_adv_bioc: course development, edx advanced bioconductor

Week 1: Visualization for genomic data science

Overview: base graphics, ggplot2, and graphic file formats

Displaying structure in multivariate data

Setting up the airway dataset

Visualizing one-dimensional data

Sketching densities

Visualizing categorical data

Networks for linked data

Multivariate visualization

Week 2: Interactive apps with Bioconductor and shiny

Week 3: Memory-sparing solutions for large data with Bioconductor

Week 4: Using python resources with Bioconductor

R Package Documentation

Browse R Packages

We want your feedback!

vjcitn/edx_adv_bioc course development, edx advanced bioconductor

In vjcitn/edx_adv_bioc: course development, edx advanced bioconductor

Week 1: Visualization for genomic data science

Overview: base graphics, ggplot2, and graphic file formats

Displaying structure in multivariate data

Setting up the airway dataset

Visualizing one-dimensional data

Sketching densities

Visualizing categorical data

Networks for linked data

Multivariate visualization

Week 2: Interactive apps with Bioconductor and shiny

Week 3: Memory-sparing solutions for large data with Bioconductor

Week 4: Using python resources with Bioconductor

R Package Documentation

Browse R Packages

We want your feedback!

vjcitn/edx_adv_bioc
course development, edx advanced bioconductor