library(knitr)
library(Cairo)
opts_chunk$set(warning=FALSE, fig.width=6, fig.height=6, dev="CairoPNG", stop=TRUE)

Normalize an EPIC v2 dataset

Download example data set


path <- download.epic2.demo.dataset()

Normalize dataset

Create samplesheet

library(meffil)
samplesheet <- meffil.read.samplesheet(base=path, pattern="SampleSheet")

Parameters.

qc.file <- "epic2-demo/qc-report.html"
author <- "Illumina, et al."
study <- "EPIC demo dataset"
number.pcs <- 2
norm.file <- "epic2-demo/normalization-report.html"

Generate QC objects.

qc.objects <- meffil.qc(samplesheet, cell.type.reference="blood gse35069 complete", verbose=T)
## If you get an error relating to a function from
## the 'preprocessCore' R package, 
## then you may need to reinstall it as follows: 
## BiocManager::install("preprocessCore", configure.args="--disable-threading", force = TRUE)

QC report.

qc.summary <- meffil.qc.summary(qc.objects, verbose=T)
meffil.qc.report(
    qc.summary,
    output.file=qc.file,
    author=author,
    study=study)

Normalization dataset.

norm.objects <- meffil.normalize.quantiles(
    qc.objects,
    number.pcs=number.pcs,
    verbose=T)
beta.meffil <- meffil.normalize.samples(
    norm.objects,
    just.beta=T, 
    cpglist.remove=qc.summary$bad.cpgs$name,
    verbose=T)

Compute principal components of the normalized methylation matrix.

pcs <- meffil.methylation.pcs(
    beta.meffil,
    sites=meffil.get.autosomal.sites("epic"),
    verbose=T)

Normalization report.

parameters <- meffil.normalization.parameters(norm.objects)
parameters$batch.threshold <- 0.01
norm.summary <- meffil.normalization.summary(
    norm.objects=norm.objects,
    pcs=pcs,
    parameters=parameters,
    verbose=T)
meffil.normalization.report(
    norm.summary,
    output.file=norm.file,
    author=author,
    study=study)

Horvath's clock and the EPIC v2 microarray

#require(RCurl)
#clock <- read.csv(textConnection(getURL("https://labs.genetics.ucla.edu/horvath/dnamage/AdditionalFile3.csv")), comment.char="#", stringsAsFactors=F)
clock <- read.csv("AdditionalFile3.csv", comment.char="#", stringsAsFactors=F)
length(setdiff(clock$CpGmarker, rownames(beta.meffil)))
nrow(clock)
length(intersect(clock$CpGmarker, rownames(beta.meffil)))/nrow(clock)

Some of the 'clock' sites are missing.

Hannum predictor CpG sites and the EPIC microarray

71 CpG sites were used in the Hannum et al. age predictor:

Hannum G, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013 Jan 24;49(2):359-67.

The list can be obtained from here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3780611/bin/NIHMS418935-supplement-02.xlsx

hannum.sites <- readLines("hannum.txt")
length(hannum.sites)
length(setdiff(hannum.sites, rownames(beta.meffil)))
length(intersect(hannum.sites, rownames(beta.meffil)))/length(hannum.sites)

Some but not all of the sites are missing.

Duplicated probes

Several thousand probes appear in duplicate across the microarray. By default, the meffil.normalize.samples function collapses duplicated probes by taking their median. It is possible to a different summarizing function by setting the dup.fun argument. It is also possible to keep the duplicate probes by setting dup.fun to NULL as follows:

beta.dup <- meffil.normalize.samples(
    norm.objects,
    just.beta=T, 
    cpglist.remove=qc.summary$bad.cpgs$name,
    dup.fun=NULL,
    verbose=T)

We can then collapse the duplicate probes as follows:

beta.nodup <- meffil.collapse.dups(beta.dup)

This function also has a dup.fun argument allowing the user to specify a different summarizing function.

For example, cg06373096 appears r length(grep("cg06373096",rownames(beta.dup))) times.

quantile(beta.nodup["cg06373096",]-beta.meffil["cg06373096",],na.rm=T)
quantile(colMedians(beta.dup[grepl("cg06373096",rownames(beta.dup)),],na.rm=T)-beta.meffil["cg06373096",],na.rm=T)

Here are the correlations between these duplicates:

cor(t(beta.dup[grep("cg06373096",rownames(beta.dup)),]))
colMeans(cor(t(beta.dup[grep("cg06373096",rownames(beta.dup)),])))

How do matrices differ when we collapse duplicates (beta.meffil and beta.dup) or not (beta.dup)?

identical(rownames(beta.nodup), rownames(beta.meffil))
all(rownames(beta.nodup) %in% sub("_.*", "", rownames(beta.dup)))
all(sub("_.*", "", rownames(beta.dup)) %in% rownames(beta.nodup))
dim(beta.meffil)
dim(beta.nodup)
dim(beta.dup)


perishky/meffil documentation built on March 20, 2024, 1:56 a.m.