Estimate cellular composition from blood DNA methylation data

knitr::opts_chunk$set(echo = TRUE)

This vignette demonstrates how to use the provided reference DNA methylation data for purified blood cell types to estimate cellular proportions. The reference data were generated with the Illumina Infinium MethylationEPIC BeadChip and include CD4 T cells, CD8 T cells, B cells, monocytes and granulocytes. The output is a matrix with one row per sample and one column per cell type, where each value is the estimated proportion of that cell type in that sample. The estimates are calculated using the Houseman reference-based algorithm; for details of the mathematics behind this algorithm, I refer you to the original manuscript. Although the reference data were generated with the Illumina Infinium MethylationEPIC BeadChip, the function will also accept data generated with the Illumina Infinium Methylation450K BeadChip array.
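
To give some intuition for the reference-based approach, the toy sketch below treats a bulk methylation profile as a mixture of cell-type reference profiles and recovers the mixing proportions by least squares with a non-negativity constraint. It is only an illustration under stated assumptions: the data are simulated, the variable names (`refProfiles`, `bulk`, `trueProps`) are made up for this example, and it is not the package's own implementation.

```r
library(quadprog)

set.seed(1)
nProbes <- 50
cellTypes <- c("Bcells", "CD4Tcells", "Monocytes")

## Simulated reference profiles: one methylation value per probe and cell type
refProfiles <- matrix(runif(nProbes * length(cellTypes)), nrow = nProbes,
                      dimnames = list(NULL, cellTypes))

## Simulated bulk sample: a noise-free mixture with known proportions
trueProps <- c(Bcells = 0.1, CD4Tcells = 0.6, Monocytes = 0.3)
bulk <- as.vector(refProfiles %*% trueProps)

## Minimise ||bulk - refProfiles %*% w||^2 subject to w >= 0.
## solve.QP minimises -d'w + 0.5 * w'Dw subject to t(Amat) %*% w >= bvec.
fit <- solve.QP(Dmat = crossprod(refProfiles),
                dvec = as.vector(crossprod(refProfiles, bulk)),
                Amat = diag(length(cellTypes)),
                bvec = rep(0, length(cellTypes)))

estProps <- setNames(fit$solution, cellTypes)
print(round(estProps, 3))
```

Because the simulated mixture contains no noise, the estimated proportions should match `trueProps` up to numerical precision. With real data the fit is approximate, and the quality of the estimates depends on how well the selected probes discriminate between cell types.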

Install the package

We recommend installing the package from Bioconductor. Once installed, it needs to be loaded in each R session.

library(EPIC.DNAm.FACS.sorted.BloodCellTypes)

Set up your R environment

The code below loads the required R libraries. It assumes they are already installed; if they are not, install them before proceeding with this vignette.

library(minfi)
library(genefilter)
library(quadprog)
library(IlluminaHumanMethylationEPICmanifest)
library(IlluminaHumanMethylation450kmanifest) # only if your data were profiled with the 450K array.

Estimate cellular composition

To estimate cellular composition, you need to load your data from idat files into an RGChannelSet object. The raw data are required because your samples are normalised together with the reference data. The code below does this.

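## placeholder: replace with your own list of samples, typically SentrixID_SentrixPosition, for each of which you have two idat files
samplesToLoad <- c("SentrixID_SentrixPosition")
setwd("path/to/idat/files") ## placeholder: change to the directory where the idat files are located
RGSet <- read.metharray(samplesToLoad) ## load from idat files into an RGChannelSet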
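
As an alternative to typing the sample list by hand, the basenames can be derived from the idat file names. The sketch below assumes the usual naming scheme in which each sample has one file ending in `_Grn.idat` and one ending in `_Red.idat`; `listIdatBasenames` is a hypothetical helper written for this example and is not part of the package or of minfi.

```r
## Hypothetical helper: return the basenames (with directory path) of all
## samples in idatDir that have both a green and a red idat file.
listIdatBasenames <- function(idatDir) {
  grn <- list.files(idatDir, pattern = "_Grn\\.idat$")
  red <- list.files(idatDir, pattern = "_Red\\.idat$")
  ## keep only samples for which both channel files are present
  complete <- intersect(sub("_Grn\\.idat$", "", grn),
                        sub("_Red\\.idat$", "", red))
  file.path(idatDir, sort(complete))
}
```

The result could then be passed on as `samplesToLoad <- listIdatBasenames("path/to/idat/files")`, where the path is a placeholder. Because the returned basenames include the directory, changing the working directory first should not be necessary.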

To estimate cellular composition for these samples run:

## to estimate cellular proportions for data generated with the EPIC array
counts <- estimateCellCountsEPIC(RGSet, EPIC = TRUE, cellTypes = c("Bcells", "CD4Tcells", "CD8Tcells", "Monocytes", "Granulocytes"))

You can adjust the 'cellTypes' argument to estimate only a subset of cell types, for example if your samples are PBMCs and you don't want to estimate the proportion of granulocytes.

By default, the package assumes that you are supplying DNA methylation data profiled with the EPIC array. If your data are from the 450K array, change the 'EPIC' argument to FALSE as shown here:

## to estimate cellular proportions for data generated with the 450K array
counts <- estimateCellCountsEPIC(RGSet, EPIC = FALSE, cellTypes = c("Bcells", "CD4Tcells", "CD8Tcells", "Monocytes", "Granulocytes"))
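
Whichever array was used, it can be worth sanity-checking the returned matrix before using it downstream. The sketch below uses a mock matrix with made-up values in place of real output, and a hypothetical helper `checkCellCounts` that is not part of the package. It assumes that the estimates are proportions between 0 and 1 that are not necessarily forced to sum to exactly 1, so it flags only samples whose total is far from 1; the tolerance of 0.1 is an arbitrary choice for illustration.

```r
## Mock output with made-up values, standing in for the matrix returned by
## estimateCellCountsEPIC(): one row per sample, one column per cell type.
counts <- matrix(
  c(0.05, 0.15, 0.08, 0.07, 0.62,
    0.04, 0.12, 0.10, 0.09, 0.66),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("sample1", "sample2"),
                  c("Bcells", "CD4Tcells", "CD8Tcells", "Monocytes", "Granulocytes"))
)

## Hypothetical helper: per-sample total, plus flags for estimates outside
## [0, 1] and for totals that differ from 1 by more than tol.
checkCellCounts <- function(counts, tol = 0.1) {
  total <- rowSums(counts)
  data.frame(
    total = total,
    outOfRange = apply(counts, 1, function(x) any(x < 0 | x > 1)),
    totalFarFromOne = abs(total - 1) > tol
  )
}

qc <- checkCellCounts(counts)
print(qc)
```

Samples flagged by either column may deserve a closer look, for example at their array quality or at whether the chosen cell types match the tissue that was profiled.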


ejh243/EPICBloodCompCalc documentation built on May 24, 2019, 2:48 p.m.