estimateCellCountsWithError: Estimate cellular composition and associated error
In ds420/CETYGO: Quantifying the accuracy of cellular deconvolution

View source: R/estimateCellCountsWithError.R

Description

This is an adaptation of estimateCellCounts() from the minfi R package that calculates the CETYGO score alongside the cellular deconvolution. The arguments are as described in the minfi package. The estimation of cellular composition implements the Houseman et al. (2012) regression calibration algorithm, applied to the Illumina 450k microarray, for deconvoluting heterogeneous tissue sources such as blood. For example, this function takes an RGChannelSet from a DNA methylation (DNAm) study of blood and returns the relative proportions of CD4+ and CD8+ T cells, natural killer cells, monocytes, granulocytes, and B cells in each sample.
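The projection step and the accompanying error score can be sketched in base R. This is a minimal illustration, not the package's code. `project_with_error()` is a hypothetical helper, and the sketch rests on several assumptions:

* The reference matrix holds the mean beta value of each sorted cell type at the selected probes.
* Proportions are fitted by least squares, constrained to be non-negative and to sum to at most one. This follows the constrained projection described by Houseman et al. (2012), but the sketch solves it with `stats::constrOptim` rather than a quadratic-programming solver.
* CETYGO is taken to be the root mean squared error between the observed DNAm profile and the profile predicted from the estimated proportions.

```r
# Hypothetical helper (not part of CETYGO or minfi): estimate cell proportions
# for ONE bulk sample and compute a CETYGO-style error score.
#   ref:   probes x cell types matrix of mean DNAm (beta values) in sorted cells
#   mixed: numeric vector of bulk DNAm beta values at the same probes
project_with_error <- function(ref, mixed) {
  k <- ncol(ref)
  # Residual sum of squares for candidate proportions w, and its gradient
  rss <- function(w) sum((mixed - ref %*% w)^2)
  rss_grad <- function(w) as.vector(-2 * crossprod(ref, mixed - ref %*% w))
  # Linear constraints in constrOptim form, ui %*% w - ci >= 0:
  #   w_j >= 0 for every cell type, and -sum(w) >= -1 (i.e. sum(w) <= 1)
  ui <- rbind(diag(k), rep(-1, k))
  ci <- c(rep(0, k), -1)
  # Start strictly inside the feasible region, as constrOptim requires
  fit <- stats::constrOptim(theta = rep(1 / (k + 1), k), f = rss,
                            grad = rss_grad, ui = ui, ci = ci)
  w <- setNames(fit$par, colnames(ref))
  # CETYGO (assumed definition): RMSE between observed and predicted DNAm
  predicted <- as.vector(ref %*% w)
  list(proportions = w, cetygo = sqrt(mean((mixed - predicted)^2)))
}

# Toy example with made-up reference profiles for three hypothetical cell types
ref <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.5, 0.3,
                0.1, 0.9, 0.2, 0.7, 0.4, 0.6,
                0.5, 0.5, 0.1, 0.9, 0.8, 0.2),
              nrow = 6, dimnames = list(NULL, c("CellA", "CellB", "CellC")))
mixed <- as.vector(ref %*% c(0.5, 0.3, 0.1))
res <- project_with_error(ref, mixed)
round(res$proportions, 2)
res$cetygo
```

The toy mixture is built exactly from the reference profiles, so the recovered proportions should be close to 0.5, 0.3, and 0.1, and the error should be close to zero. A sample that the reference panel explains poorly would produce a larger error.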

Usage

estimateCellCountsWithError(
  rgSet,
  compositeCellType = "Blood",
  processMethod = "auto",
  probeSelect = "auto",
  cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran"),
  referencePlatform = c("IlluminaHumanMethylation450k", "IlluminaHumanMethylationEPIC",
    "IlluminaHumanMethylation27k"),
  returnAll = FALSE,
  meanPlot = FALSE,
  verbose = TRUE,
  ...
)

Arguments

`rgSet`	An input RGChannelSet object with raw DNA methylation data for the samples that require cell composition to be estimated.
`compositeCellType`	Which composite cell type is being deconvoluted. Should be one of "Blood", "CordBlood", or "DLPFC". See details.
`processMethod`	How should the user and reference data be processed together? Default input "auto" will use preprocessQuantile for Blood and DLPFC and preprocessNoob otherwise, in line with the existing literature. To override it, set it to the name of a preprocessing function given as a character string, for example "preprocessFunnorm".
`probeSelect`	How should probes be selected to distinguish cell types? Options include "both", which selects an equal number (50) of probes (with F-stat p-value < 1E-8) with the greatest magnitude of effect from the hyper- and hypo-methylated sides, and "any", which selects the 100 probes (with F-stat p-value < 1E-8) with the greatest magnitude of difference regardless of direction of effect. Default input "auto" will use "any" for cord blood and "both" otherwise, in line with previous versions of this function and/or our recommendations. Please see the references for more details.
`cellTypes`	Which cell types from the reference object should be used for the deconvolution? See details.
`referencePlatform`	The platform for the reference dataset; if the input rgSet belongs to another platform, it will be converted using convertArray.
`returnAll`	Should the composition table and the normalized user-supplied data be returned?
`meanPlot`	Whether to plot the average DNA methylation across the cell-type-discriminating probes within the mixed and sorted samples.
`verbose`	Should the function be verbose?
`...`	Passed to preprocessQuantile
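The two probeSelect strategies, and the range check that meanPlot supports visually, can be sketched in base R. This is an illustrative sketch only. `select_probes()` and `means_within_range()` are hypothetical helpers.

The probe statistics are assumed to be a data frame for one cell type with two columns:

* `p_value`, the F-statistic p-value.
* `diff`, that cell type's mean DNAm minus the mean of the other cell types.

The package's own selection code may differ in detail.

```r
# Hypothetical helper: choose cell-type-discriminating probes for ONE cell type.
#   stats:  data.frame with rownames = probe IDs and columns
#           p_value (F-statistic p-value) and diff (mean DNAm difference)
#   method: "both" (n_per_side hyper + n_per_side hypo) or "any" (2 * n_per_side)
select_probes <- function(stats, method = c("both", "any"),
                          n_per_side = 50, p_cutoff = 1e-8) {
  method <- match.arg(method)
  sig <- stats[stats$p_value < p_cutoff, , drop = FALSE]
  if (method == "both") {
    hyper <- sig[sig$diff > 0, , drop = FALSE]
    hypo  <- sig[sig$diff < 0, , drop = FALSE]
    # Largest positive and most negative differences, n_per_side from each side
    c(head(rownames(hyper)[order(hyper$diff, decreasing = TRUE)], n_per_side),
      head(rownames(hypo)[order(hypo$diff)], n_per_side))
  } else {
    # Largest absolute differences, regardless of direction
    head(rownames(sig)[order(abs(sig$diff), decreasing = TRUE)], 2 * n_per_side)
  }
}

# Hypothetical helper mirroring the meanPlot check: is each bulk sample's mean
# DNAm over the selected probes inside the range spanned by the sorted samples?
means_within_range <- function(bulk, sorted, probes) {
  bulk_means   <- colMeans(bulk[probes, , drop = FALSE])
  sorted_means <- colMeans(sorted[probes, , drop = FALSE])
  bulk_means >= min(sorted_means) & bulk_means <= max(sorted_means)
}

# Toy per-probe statistics (made up); cg4 fails the p-value cutoff
stats <- data.frame(p_value = c(1e-10, 1e-12, 1e-9, 0.01, 1e-11, 1e-10),
                    diff    = c(0.40, -0.35, 0.10, 0.50, -0.05, 0.25),
                    row.names = paste0("cg", 1:6))
select_probes(stats, "both", n_per_side = 2)  # "cg1" "cg6" "cg2" "cg5"
select_probes(stats, "any",  n_per_side = 2)  # "cg1" "cg2" "cg6" "cg3"
```

With "both", the selection is balanced between hyper- and hypomethylated probes. With "any", it can be dominated by one direction of effect.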

Details

The function currently supports cell composition estimation for blood, cord blood, and the frontal cortex, through compositeCellType values of "Blood", "CordBlood", and "DLPFC", respectively. Packages containing the appropriate reference data ("FlowSorted.Blood.450k", "FlowSorted.DLPFC.450k", "FlowSorted.CordBlood.450k") should be installed before running the function for the first time. Each tissue supports the estimation of different cell types, selected via the cellTypes argument. For blood, these are "Bcell", "CD4T", "CD8T", "Eos", "Gran", "Mono", "Neu", and "NK" (though the default value for cellTypes is often sufficient). For cord blood, these are "Bcell", "CD4T", "CD8T", "Gran", "Mono", "Neu", and "nRBC". For frontal cortex, these are "NeuN_neg" and "NeuN_pos". See the documentation of the individual reference packages for more details.
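The per-tissue options above can be collected into a small lookup table, together with the "auto" rules for processMethod and probeSelect given under Arguments. This base-R sketch is illustrative only. `tissue_settings` and `resolve_settings()` are hypothetical names, and the package may validate its arguments differently.

```r
# Hypothetical lookup of the options documented for each compositeCellType
tissue_settings <- list(
  Blood = list(package = "FlowSorted.Blood.450k",
               cellTypes = c("Bcell", "CD4T", "CD8T", "Eos", "Gran", "Mono", "Neu", "NK"),
               processMethod = "preprocessQuantile", probeSelect = "both"),
  CordBlood = list(package = "FlowSorted.CordBlood.450k",
                   cellTypes = c("Bcell", "CD4T", "CD8T", "Gran", "Mono", "Neu", "nRBC"),
                   processMethod = "preprocessNoob", probeSelect = "any"),
  DLPFC = list(package = "FlowSorted.DLPFC.450k",
               cellTypes = c("NeuN_neg", "NeuN_pos"),
               processMethod = "preprocessQuantile", probeSelect = "both")
)

# Hypothetical helper: check cellTypes against the tissue's supported set and
# replace any "auto" value with the tissue-specific default
resolve_settings <- function(compositeCellType, cellTypes,
                             processMethod = "auto", probeSelect = "auto") {
  tissue <- tissue_settings[[compositeCellType]]
  if (is.null(tissue)) stop("unsupported compositeCellType: ", compositeCellType)
  bad <- setdiff(cellTypes, tissue$cellTypes)
  if (length(bad) > 0) {
    stop("cell types not available for ", compositeCellType, ": ",
         paste(bad, collapse = ", "))
  }
  if (processMethod == "auto") processMethod <- tissue$processMethod
  if (probeSelect == "auto") probeSelect <- tissue$probeSelect
  list(package = tissue$package, cellTypes = cellTypes,
       processMethod = processMethod, probeSelect = probeSelect)
}

str(resolve_settings("CordBlood", c("CD8T", "nRBC")))
```

Requesting a cell type that the chosen reference does not contain, such as "CD4T" for "DLPFC", stops with an error rather than silently dropping it.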

The meanPlot should be used to check for large batch effects in the data, which reduce the confidence that can be placed in the composition estimates. This plot depicts the average DNA methylation across the cell-type-discriminating probes in both the provided and sorted data. The means from the provided heterogeneous samples should lie within the range of the sorted samples. If the sample means fall outside the range of the sorted means, the cell type estimates will be inflated toward the closest cell type. Note that we quantile normalize the sorted data together with the provided data to reduce these batch effects. DNA methylation of test samples,