Introduction

The r Biocpkg("wateRmelon") provides a set of tools for importing, quality control and normalizing Illumina DNA methylation microarray data. All of the functions described in this vignette are fully compatible with the r Biocpkg("minfi") package, including any workflows or pipelines that involves it. Additionally wateRmelon extends the r Biocpkg("methylumi") package to allow for MethyLumiSet representations of EPIC array data. If you are using many hundreds of arrays, you could consider using the related r Biocpkg("wateRmelon").

In addition to our own functions we provide implementations for a variety of age-prediction and cell-type estimations which work for both 450k and EPIC data.

Citing the wateRmelon R package

Installation

wateRmelon is designed to extend the r Biocpkg("methylumi") and r Biocpkg("minfi") R packages so there is quite the number of dependencies which need to be installed. This can be handled by simply running:

install.packages('BiocManager')
BiocManager::install('wateRmelon')
library(wateRmelon)
library(wateRmelon)
data(melon)

Alternatively, wateRmelon can be installed directly from our github

install.packages('devtools')
devtools::install_github('schalkwyk/wateRmelon')

Quick Start

The package contains a small subset of 450k array data that can be used to explore all the funtions quickly. The melon data set is a MethyLumiSet containing 12 samples (columns) and 3364 features (rows)

data(melon)
dim(melon)
# Quality filter using default thresholds
melon.pf <- pfilter(melon)

# Normalize using one of the many available methods
melon.dasen.pf <- dasen(melon.pf)

# Extract Betas for downstream analysis
norm_betas <- betas(melon.dasen.pf)

Data Import

The IDAT reading provided in r Biocpkg("methylumi") has been updated to handle EPIC arrays, this is provided within the readEPIC function. Ideally, you can then read in a phenotype information that correspond to the idat files you have loaded in.

Alternatively you can read in data using r Biocpkg("minfi") and work from an RGChannelSet or MethylSet object, using the read_metharray functions.

mlumi <- readEPIC('path/to/idats')

Quality Control

Quality control is a vital part of performing an EWAS, it is important that we are able to easily identify samples that are outliers and could lead to false positive results. There are many ways to identify outliers. Our preferred method is to use the outlyx function which provides a robust and reproducible multivariate method that we feel works well for most data-types.

Outlier Detection

outliers <- outlyx(melon, plot=TRUE)
print(outliers)
# remove outliers with melon[,!outliers$out]

Similarly it is important to check the quality control probes to measure how well the experiment performs. For this we provide bscon to quickly check the bisulfite conversion probes for each sample to estimate the percentage of DNA that has been successfully converted. Values above 90% are ideally but values above 80% are also acceptable.

bsc <- bscon(melon)
hist(bsc, xlab = c(0, 100))

Probe Filtering

Poorly performing probes can be identified from the bead counts and detection p-values using the pfilter function

melon.pf <- pfilter(melon)

Phenotype Predition

wateRmelon now provides functions for three popular methods for generate phenotypic variables. Whether these are needed for quality control or identifying sample mismatch. Or could even been provided as a missing covariate for analyses

Epigenetic Ages

wateRmelon provides 5 epigenetic age predictors: Horvath, Horvath Skin & Blood, Hannum, PhenoAge and Lin! As an extra measure we provide the number of missing probes for each clock and this can be used to supply your own list of coefficients to calculate different linear age predictions. !!However!! any normalisation methods that the original age predictors use to predict ages so there will be slight differences compared to the original methods.

agep(melon, method='all')

agep(melon, method='horvath')

Sex

Sexes of samples can be determined like so:

estimateSex(betas(melon), do_plot=TRUE)

Cell-type proportions

wateRmelon is able to compute cell counts for bulk blood (currently only bulk blood).

estimateCellCounts.wateRmelon(melon, referencePlatform = "IlluminaHumanMethylation450k")
estimateCellCounts.wateRmelon(melon, referencePlatform = "IlluminaHumanMethylationEPIC") # change reference

Normalization & Preprocessing

Norm Method | Short Description | Extra Details ------- | ----- | -------- dasen | Best performing normalisation method according to Pidsley 2013 | nasen | Simpler implemenation of dasen that does not correct for sentrix positions | Can be used on EPIC data without any consequences. adjusted_dasen | dasen with our interpolatedXY method | offset_fit = FALSE performs adjusted_nasen instead adjusted_funnorm |funnorm with our interpolatedXY method | Only Available to RGChannelSet SWAN | Popular method for normalisation|

As such they can be used as such:

dasen.melon <- dasen(melon) # Use whichever method you would like to use. 

Normalization Violence

Normalization aims to align the data across samples to make it ready for analysis. The degree of normalization across samples is variable, where samples undergo a more dramatic transformation to resemble the rest of the data. This can be an indicator that a sample has addition-al noise or is an outlier. The qual function provides an estimate of vio-lence that has been introduced through preprocessing and normalization.

das <- dasen(melon)
qu <- qual(betas(melon), betas(das))
plot(qu[,1], qu[,2])

Performance Metrics

TODO...

Genomic Imprinting

dmrse_row(melon.pf)
dmrse_row(melon.dasen.pf) # Slightly better standard errores

SNP Genotypes

Not available for minfi objects

genki(melon.pf)
genki(melon.dasen.pf)

XCI

seabi(melon.pf, sex=pData(melon.pf)$sex, X=fData(melon.pf)$CHR=='X')
seabi(melon.dasen.pf, sex=pData(melon.dasen.pf)$sex, X=fData(melon.dasen.pf)$CHR=='X')

Statistical Analysis

Although we cannot predict what type of analysis you are expecting to perform we have a couple of recommendations that should be considered before you perform statistical testing. Firstly we recommend a final sweep of the normalized data using pwod

bet <- betas(melon)
pwod_bet <- pwod(bet)

# Statistical Analysis using pwod_bet

Session info {.unnumbered}

sessionInfo()


schalkwyk/wateRmelon documentation built on April 15, 2024, 12:06 p.m.