library(knitr) library(Cairo) opts_chunk$set(warning=FALSE, fig.width=6, fig.height=6, dev="CairoPNG", stop=TRUE)
path <- download.epic2.demo.dataset()
Create samplesheet
library(meffil) samplesheet <- meffil.read.samplesheet(base=path, pattern="SampleSheet")
Parameters.
qc.file <- "epic2-demo/qc-report.html" author <- "Illumina, et al." study <- "EPIC demo dataset" number.pcs <- 2 norm.file <- "epic2-demo/normalization-report.html"
Generate QC objects.
qc.objects <- meffil.qc(samplesheet, cell.type.reference="blood gse35069 complete", verbose=T) ## If you get an error relating to a function from ## the 'preprocessCore' R package, ## then you may need to reinstall it as follows: ## BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE)
QC report.
qc.summary <- meffil.qc.summary(qc.objects, verbose=T) meffil.qc.report( qc.summary, output.file=qc.file, author=author, study=study)
Normalization dataset.
norm.objects <- meffil.normalize.quantiles( qc.objects, number.pcs=number.pcs, verbose=T) beta.meffil <- meffil.normalize.samples( norm.objects, just.beta=T, cpglist.remove=qc.summary$bad.cpgs$name, verbose=T)
Compute principal components of the normalized methylation matrix.
pcs <- meffil.methylation.pcs( beta.meffil, sites=meffil.get.autosomal.sites("epic"), verbose=T)
Normalization report.
parameters <- meffil.normalization.parameters(norm.objects) parameters$batch.threshold <- 0.01 norm.summary <- meffil.normalization.summary( norm.objects=norm.objects, pcs=pcs, parameters=parameters, verbose=T) meffil.normalization.report( norm.summary, output.file=norm.file, author=author, study=study)
#require(RCurl) #clock <- read.csv(textConnection(getURL("https://labs.genetics.ucla.edu/horvath/dnamage/AdditionalFile3.csv")), comment.char="#", stringsAsFactors=F) clock <- read.csv("AdditionalFile3.csv", comment.char="#", stringsAsFactors=F) length(setdiff(clock$CpGmarker, rownames(beta.meffil))) nrow(clock) length(intersect(clock$CpGmarker, rownames(beta.meffil)))/nrow(clock)
Some of the 'clock' sites are missing.
71 CpG sites were used in the Hannum et al. age predictor:
Hannum G, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013 Jan 24;49(2):359-67.
The list can be obtained from here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3780611/bin/NIHMS418935-supplement-02.xlsx
hannum.sites <- readLines("hannum.txt") length(hannum.sites) length(setdiff(hannum.sites, rownames(beta.meffil))) length(intersect(hannum.sites, rownames(beta.meffil)))/length(hannum.sites)
Some but not all of the sites are missing.
Several thousand probes appear in duplicate across the microarray.
By default, the meffil.normalize.samples
function collapses
duplicated probes by taking their median.
It is possible to a different summarizing function by setting the
dup.fun
argument.
It is also possible to keep the duplicate probes by
setting dup.fun
to NULL
as follows:
beta.dup <- meffil.normalize.samples( norm.objects, just.beta=T, cpglist.remove=qc.summary$bad.cpgs$name, dup.fun=NULL, verbose=T)
We can then collapse the duplicate probes as follows:
beta.nodup <- meffil.collapse.dups(beta.dup)
This function also has a dup.fun
argument allowing the user
to specify a different summarizing function.
For example, cg06373096 appears
r length(grep("cg06373096",rownames(beta.dup)))
times.
quantile(beta.nodup["cg06373096",]-beta.meffil["cg06373096",],na.rm=T) quantile(colMedians(beta.dup[grepl("cg06373096",rownames(beta.dup)),],na.rm=T)-beta.meffil["cg06373096",],na.rm=T)
Here are the correlations between these duplicates:
cor(t(beta.dup[grep("cg06373096",rownames(beta.dup)),])) colMeans(cor(t(beta.dup[grep("cg06373096",rownames(beta.dup)),])))
How do matrices differ when we collapse
duplicates (beta.meffil
and beta.dup
)
or not (beta.dup
)?
identical(rownames(beta.nodup), rownames(beta.meffil)) all(rownames(beta.nodup) %in% sub("_.*", "", rownames(beta.dup))) all(sub("_.*", "", rownames(beta.dup)) %in% rownames(beta.nodup))
dim(beta.meffil) dim(beta.nodup) dim(beta.dup)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.