These notes were created during the course and serve as a transcript of the topics covered.
Workflow
GenomicAlignments::summarizeOverlaps()
View from the Linux command line...
zcat *fastq.gz | less
samtools view -h *bam
... or within R / Bioconductor: fastq files
library(ShortRead)
strm = FastqStreamer("bigdata/SRR1039508_1.fastq.gz", 100000)
fq = yield(strm)
fq
sread(fq)
quality(fq)

x = rnorm(1000)
y = x + rnorm(1000, sd=.5)
df = data.frame(x=x, y=y)
plot(y ~ x, df)
fit = lm(y ~ x, df)
class(fit)
methods(class=class(fit))
methods("anova")
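The calls to class() and methods() above rely on S3 dispatch: a generic such as summary() looks for a function named after the generic and the object's class. A minimal sketch of that mechanism follows, using only base R. The class name 'temperature' and its constructor are hypothetical, made up for illustration, and are not part of the course material.

```r
# Constructor for a hypothetical S3 class 'temperature' (illustration only)
temperature <- function(celsius) {
    structure(list(celsius = celsius), class = "temperature")
}

# Method for the generic summary(), found by the naming rule generic.class
summary.temperature <- function(object, ...) {
    c(min = min(object$celsius), max = max(object$celsius))
}

# Method for the generic print(), used when the object is auto-printed
print.temperature <- function(x, ...) {
    cat("temperature with", length(x$celsius), "reading(s)\n")
    invisible(x)
}

temp <- temperature(c(18.5, 21.0, 19.2))
class(temp)                     # the class attribute drives dispatch
summary(temp)                   # dispatches to summary.temperature()
methods(class = class(temp))    # lists the methods defined above
```

This is the same pattern that makes plot(fit) call plot.lm() for an object of class 'lm'.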
Help!
?log
?plot       # generic 'plot'
?plot.lm    # plot for objects of class 'lm'
Extensive use of 'S4' classes
fit (from lm()) is an example of an S3 class
sread(fq) returned a DNAStringSet, an example of an S4 class

library(ShortRead)
strm = FastqStreamer("bigdata/SRR1039508_1.fastq.gz", 100000)
fq = yield(strm)                # 'ShortReadQ' S4 class
class(fq)                       # introspection
methods(class=class(fq))
reads = sread(fq)               # accessor -- get the reads
reads                           # 'DNAStringSet' S4 class
methods(class=class(reads))
gc = letterFrequency(reads, "GC", as.prob=TRUE)
hist(gc)
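To see what distinguishes an S4 class from the S3 case, it may help to define a toy one. The sketch below uses only the 'methods' package that ships with R. The class 'Reads', the accessor sequences(), and the generic gcContent() are hypothetical names invented for this illustration; they are not the ShortRead or Biostrings implementations.

```r
library(methods)

# A hypothetical S4 class with one formally declared, typed slot
setClass("Reads", slots = c(sequences = "character"))

# Accessor generic and method, playing the role sread() plays for 'ShortReadQ'
setGeneric("sequences", function(x, ...) standardGeneric("sequences"))
setMethod("sequences", "Reads", function(x, ...) x@sequences)

# A generic with a method registered for class 'Reads'
setGeneric("gcContent", function(x, ...) standardGeneric("gcContent"))
setMethod("gcContent", "Reads", function(x, ...) {
    seqs <- sequences(x)
    gc <- nchar(gsub("[^GC]", "", seqs))    # count G and C per sequence
    gc / nchar(seqs)                        # proportion, like as.prob=TRUE
})

reads <- new("Reads", sequences = c("ACGT", "GGCC", "ATAT"))
isS4(reads)                        # TRUE for S4 instances
gcContent(reads)                   # method chosen from the class of 'reads'
showMethods("gcContent")           # introspection, as with methods()
```

Unlike S3, the slots are declared up front and methods are registered explicitly with setMethod() instead of being found by a naming convention.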
Help!
?DNAStringSet                           # class, and often frequently used methods
?letterFrequency                        # generic
methods("letterFrequency")
?"letterFrequency,XStringSet-method"
Key software packages...
import() to import BED, WIG, GFF, GTF, ... files
... and classes
assays()
rowRanges() for annotations on rows
colData() for column annotations
Annotation
org.* packages
TxDb.* packages
BSgenome.* packages
Strategies for working with big data
FastqStreamer(), Rsamtools::BamFile(..., yieldSize=1000000); GenomicFiles::reduceByYield() (see examples on ?reduceByYield)
All material on the course materials page
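The streaming strategy listed under 'Strategies for working with big data' can be sketched as a loop that yields one chunk at a time and keeps only running totals in memory. The example below assumes the ShortRead package is installed. The FASTQ file is a tiny made-up one written to a temporary path so the sketch does not depend on the course's bigdata directory, and the chunk size of 2 is chosen only to force several iterations.

```r
library(ShortRead)

# Write a tiny, made-up FASTQ file (assumption: illustration data only)
fl <- tempfile(fileext = ".fastq")
reads <- c("ACGTACGT", "GGGGCCCC", "ATATATAT", "GCGCATAT", "TTTTGGGG")
writeLines(as.vector(rbind(
    paste0("@read", seq_along(reads)),    # record identifier
    reads,                                # sequence
    "+",                                  # separator line
    strrep("I", nchar(reads))             # base qualities
)), fl)

strm <- FastqStreamer(fl, n = 2)          # at most 2 records per yield()
n_reads <- 0L
n_gc <- 0
n_bases <- 0
repeat {
    fq <- yield(strm)
    if (length(fq) == 0L) break           # nothing left to read
    seqs <- sread(fq)
    n_reads <- n_reads + length(fq)
    n_gc <- n_gc + sum(letterFrequency(seqs, "GC"))
    n_bases <- n_bases + sum(width(seqs))
}
close(strm)

n_reads                                   # records seen across all chunks
n_gc / n_bases                            # overall GC proportion
```

On real data the chunk size would be much larger, for example the 100000 used earlier in these notes; GenomicFiles::reduceByYield() packages the same yield, map, and reduce steps into a single call.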