README.md
In NeilPearson-Lilly/MethyLiution: MethyLiution: A QC Pipeline For Methylation Data

To facilitate meaningful comparisons across the rapidly growing number of microarray-based methylation experiments, we have developed an ordered set of procedures, deployed as a pipeline, for the rapid, consistent and user-friendly evaluation of data quality in common formats of these experiments. The pipeline can also detect and correct notable common inconsistencies arising from experimental and human error.

Data should be prepared for analysis by simply moving all related .IDAT files into a single directory.
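Before running the pipeline it can be worth confirming that every assay in the directory has both colour-channel files. The sketch below is not part of MethyLiution: `find_unpaired_idats` is a hypothetical helper, and it assumes the standard Illumina naming convention in which each assay has one `_Grn.idat` and one `_Red.idat` file.

```r
# Hypothetical helper (not a MethyLiution function): return the assay IDs
# that are missing either the green or the red channel file.
find_unpaired_idats <- function(idat_files) {
  base <- basename(idat_files)
  # Assay ID = file name minus the colour channel label and extension
  assay_id <- sub("_(Grn|Red)\\.idat$", "", base, ignore.case = TRUE)
  channel <- tolower(sub("^.*_(Grn|Red)\\.idat$", "\\1", base, ignore.case = TRUE))
  has_grn <- tapply(channel == "grn", assay_id, any)
  has_red <- tapply(channel == "red", assay_id, any)
  names(has_grn)[!(has_grn & has_red)]
}

# Toy file names for illustration only
files <- c("200123450001_R01C01_Grn.idat",
           "200123450001_R01C01_Red.idat",
           "200123450001_R02C01_Grn.idat")
unpaired <- find_unpaired_idats(files)
print(unpaired)
```

In practice the file names would come from something like `list.files(input_dir, pattern = "\\.idat$", ignore.case = TRUE)`.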

In addition, a sample sheet/metadata file should be supplied. This must contain at least 4 columns, in the following order:

1. Assay ID (derived from standard IDAT filename, minus colour channel label)
2. Sample ID (an arbitrary ID assigned by the experimenter)
3. Experimental group
4. Sex (either F, M, or U if unknown/unavailable)
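As an illustration of the layout described above, the following sketch builds a small sample sheet in base R and writes it to a CSV file. The column names, the presence of a header row and the CSV format are assumptions made for this example; only the column order comes from the description above. The IDAT file names are made up.

```r
# Made-up IDAT file names following the standard Illumina convention
idat_files <- c("200123450001_R01C01_Grn.idat", "200123450001_R01C01_Red.idat",
                "200123450001_R02C01_Grn.idat", "200123450001_R02C01_Red.idat")

# Assay ID = IDAT file name minus the colour channel label
assay_ids <- unique(sub("_(Grn|Red)\\.idat$", "", basename(idat_files)))

# Column names are illustrative assumptions; the order is what matters
sample_sheet <- data.frame(
  assay_id  = assay_ids,
  sample_id = c("S1", "S2"),
  group     = c("case", "control"),
  sex       = c("F", "U"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the file to the pipeline
stopifnot(ncol(sample_sheet) >= 4,
          !anyDuplicated(sample_sheet$assay_id),
          all(sample_sheet$sex %in% c("F", "M", "U")))

meta_path <- file.path(tempdir(), "meta.csv")
write.csv(sample_sheet, meta_path, row.names = FALSE)
```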

The pipeline is divided into a series of 10 steps, which are run in order:

1. Parse data
2. Filter SNP probes and imprinting genes
3. Check that metadata for samples matches inferences
4. Choose the best to carry forwards to subsequent analyses
5. Detect (but do not remove) outliers
6. Colour balancing for 2-colour microarrays
7. Apply inter-assay normalisation
8. Apply Beta-Mixture Quantile normalisation
9. Remove assays with missing metadata
10. Apply surrogate variable analysis to automatically find hidden variables

In most cases, we recommend running the whole analysis in one pass, like so:

library(MethyLiution)

# Directory containing all IDAT files, and the sample sheet stored alongside them
input_dir <- "C:/path/to/IDAT_files"
sample_sheet <- file.path(input_dir, "meta.csv")
output_dir <- "C:/path/for/output_files"

runPipeline(datadir = input_dir,
            array = "450K",
            metafile = sample_sheet,
            outdir = output_dir)

By default, we do not run step 10 (surrogate variable analysis) unless explicitly requested, on the grounds that this step is likely to require a more flexible and context-specific analysis than can be easily written into a pipeline function.

The list of valid probes (for both EPIC and 450K arrays) required for step 2 is included as an accessible dataset, probe_data. This dataset was derived from the probes listed in the Bioconductor packages IlluminaHumanMethylationEPICanno.ilm10b2.hg19 and IlluminaHumanMethylation450kprobe, filtered to remove probes located within 10 bp of a SNP, in order to avoid misleading results arising from polymorphism-induced hybridisation issues.
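The idea behind this kind of filtering can be sketched in base R as restricting a matrix of methylation values to a whitelist of probe IDs. This is only an illustration on toy data, not the package's implementation: the layout of probe_data is not described here, so `valid_probes` below is a hypothetical character vector standing in for it, and the probe IDs and values are made up.

```r
# Toy beta matrix: rows are probes, columns are assays (values are made up)
betas <- matrix(c(0.1, 0.8, 0.5, 0.2, 0.9, 0.4), nrow = 3,
                dimnames = list(c("cg0001", "cg0002", "cg0003"),
                                c("A1", "A2")))

# Hypothetical whitelist standing in for the probe IDs held in probe_data
valid_probes <- c("cg0001", "cg0003", "cg0099")

# Keep only rows whose probe ID appears in the whitelist;
# drop = FALSE keeps the result a matrix even if one row remains
filtered <- betas[rownames(betas) %in% valid_probes, , drop = FALSE]
print(rownames(filtered))
```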

The GEO series GSE86831, an EPIC microarray dataset, has been used to produce the plots and outputs presented in the vignette.

NeilPearson-Lilly/MethyLiution documentation built on May 21, 2019, 11:29 a.m.
