
Epigene lite


Description

A simplified, interactive extension of part of the exploratory analysis pipeline in EpigenCentral.

Analyze Illumina DNA methylation array data for differentially methylated CpGs using minfi F-tests, and create interactive principal component analysis (PCA) plots.

This package was developed using

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Bioconductor version 3.14

Installation

devtools::install_github("kevinlul/EpigeneLite", build_vignettes = TRUE)
library(EpigeneLite)

To run the Shiny app:

runEpigeneLite()

Overview

ls("package:EpigeneLite")
data(package = "EpigeneLite")

Epigene lite contains four functions for rudimentary epigenetic analysis. It is a simplified version of a small part of the EpigenCentral pipeline, distinguished by being usable as a free-standing package with exported functions and by offering interactive PCA plots in a Shiny app.

read_idat loads Illumina 450k IDAT files specified by a CSV sample sheet into a minfi GenomicRatioSet for use with this package.

read_geo_tsv loads GEO Series Matrix files, which can be downloaded from NCBI’s public GEO website, into a minfi GenomicRatioSet for use with this package, and optionally annotates the samples with a CSV sample sheet.
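A GEO Series Matrix file is plain text: metadata lines start with "!", and the data table sits between the !series_matrix_table_begin and !series_matrix_table_end marker lines, with probe IDs in an ID_REF column. The sketch below is not read_geo_tsv's implementation; parse_series_matrix is a hypothetical helper showing how that table could be pulled into a numeric matrix with base R.

```r
# Hypothetical helper (not part of EpigeneLite): extract the data table of a
# GEO Series Matrix text file into a numeric matrix (CpGs x samples).
parse_series_matrix <- function(path) {
  lines <- readLines(path)
  # The data table sits between these two marker lines; the other lines
  # starting with "!" hold series and sample metadata.
  start <- which(lines == "!series_matrix_table_begin")
  end <- which(lines == "!series_matrix_table_end")
  if (length(start) != 1 || length(end) != 1 || end <= start + 1) {
    stop("No series matrix table found in ", path)
  }
  tbl <- utils::read.delim(
    text = lines[(start + 1):(end - 1)],
    row.names = 1,       # first column is ID_REF (the CpG probe ID)
    check.names = FALSE  # keep the GSM accessions as column names
  )
  as.matrix(tbl)
}
```

Assuming the table holds 450k beta values, the resulting matrix could then be wrapped into a GenomicRatioSet with minfi's makeGenomicRatioSetFromMatrix(m, array = "IlluminaHumanMethylation450k", annotation = "ilmn12.hg19", what = "Beta").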

pca_plot generates PCA plots from the methylation data, using either a specified subset of CpGs or all available CpGs.
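As a rough illustration of what a methylation PCA plot involves, the sketch below runs base R's prcomp on a beta-value matrix (CpGs in rows, samples in columns), optionally restricted to a set of CpGs, and draws PC1 against PC2 with ggplot2. The function names pca_scores and plot_pca are hypothetical, and choices such as not scaling and dropping CpGs with missing values are assumptions, not necessarily what pca_plot does.

```r
# Minimal sketch, not EpigeneLite's pca_plot. 'beta' is assumed to be a numeric
# matrix of beta values with CpGs in rows and samples in columns.
pca_scores <- function(beta, cpgs = NULL, n_pcs = 2) {
  # Optionally restrict to the requested CpGs that are present in the data.
  if (!is.null(cpgs)) {
    beta <- beta[intersect(cpgs, rownames(beta)), , drop = FALSE]
  }
  # prcomp() cannot handle missing values, so drop incomplete CpGs.
  beta <- beta[stats::complete.cases(beta), , drop = FALSE]
  # prcomp() treats rows as observations, so put samples in rows.
  pca <- stats::prcomp(t(beta), center = TRUE, scale. = FALSE)
  scores <- as.data.frame(pca$x[, seq_len(n_pcs), drop = FALSE])
  scores$sample <- rownames(scores)
  # Percentage of total variance explained by each retained PC.
  attr(scores, "percent_var") <- 100 * pca$sdev[seq_len(n_pcs)]^2 / sum(pca$sdev^2)
  scores
}

# Draw PC1 vs PC2 coloured by a per-sample grouping (e.g. case/control),
# given in the same order as the columns of 'beta'. Requires ggplot2.
plot_pca <- function(scores, group) {
  pct <- attr(scores, "percent_var")
  scores$group <- factor(group)
  ggplot2::ggplot(scores, ggplot2::aes(x = PC1, y = PC2, colour = group)) +
    ggplot2::geom_point(size = 3) +
    ggplot2::labs(
      x = sprintf("PC1 (%.1f%%)", pct[1]),
      y = sprintf("PC2 (%.1f%%)", pct[2])
    )
}
```

Keeping the PCA computation separate from the plotting makes it easy to feed the same scores into an interactive front end such as Shiny.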

f_test uses minfi’s dmpFinder to perform an F-test that finds CpGs differentially methylated between the cases and controls specified in the sample sheet when the dataset was loaded.
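With a categorical phenotype, dmpFinder reports an F statistic and p-value for each CpG. To show what that statistic is, the sketch below computes an ordinary one-way ANOVA F-test per CpG in base R, then applies Benjamini-Hochberg correction. The function name ftest_per_cpg is hypothetical; this is not dmpFinder's code, which differs in detail (for example, it can optionally shrink variances).

```r
# Illustrative sketch only (not minfi's or EpigeneLite's code): a per-CpG
# one-way ANOVA F-test of M-values between sample groups, followed by
# Benjamini-Hochberg correction of the p-values.
ftest_per_cpg <- function(mvals, group) {
  group <- factor(group)
  stopifnot(ncol(mvals) == length(group), nlevels(group) >= 2)
  # For each CpG (row), fit m ~ group and read the F statistic and p-value
  # from the ANOVA table (first row = the group term).
  tab <- t(apply(mvals, 1, function(m) {
    fit <- stats::anova(stats::lm(m ~ group))
    c(f = fit[["F value"]][1], pval = fit[["Pr(>F)"]][1])
  }))
  res <- data.frame(tab, row.names = rownames(mvals))
  res$qval <- stats::p.adjust(res$pval, method = "BH")
  # Most significant CpGs first.
  res[order(res$pval), ]
}
```

On real data the comparable minfi call would be dmpFinder(getM(grset), pheno = group, type = "categorical"), where grset is a GenomicRatioSet and group holds each sample's case/control label.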

browseVignettes("EpigeneLite")

Overview poster

The package tree structure, including the example dataset, which is too large to host on GitHub:

+-- EpigeneLite.Rproj
+-- DESCRIPTION
+-- NAMESPACE
+-- README.Rmd
+-- README.md
+-- LICENSE
+-- LICENSE.md
+-- R
|   +-- dmp.R
|   +-- pca.R
|   +-- read.R
|   \-- run.R
+-- man
|   +-- f_test.Rd
|   +-- pca_plot.Rd
|   +-- read_geo_tsv.Rd
|   +-- read_idat.Rd
|   \-- runEpigeneLite.Rd
+-- tests
|   +-- testthat
|   |   \-- test-read.R
|   \-- testthat.R
+-- vignettes
|   \-- GSE55491.Rmd
\-- inst
    +-- CITATION
    +-- extdata
    |   \-- GSE55491
    |       +-- GSE55491_series_matrix.txt
    |       +-- GSM1338100_6057825094_R01C01_Grn.idat
    |       +-- GSM1338100_6057825094_R01C01_Red.idat
    |       +-- GSM1338101_6057825094_R01C02_Grn.idat
    |       +-- GSM1338101_6057825094_R01C02_Red.idat
    |       +-- GSM1338102_6057825094_R02C01_Grn.idat
    |       +-- GSM1338102_6057825094_R02C01_Red.idat
    |       +-- GSM1338103_6057825094_R02C02_Grn.idat
    |       +-- GSM1338103_6057825094_R02C02_Red.idat
    |       +-- GSM1338104_6057825094_R03C01_Grn.idat
    |       +-- GSM1338104_6057825094_R03C01_Red.idat
    |       +-- GSM1338105_6057825094_R03C02_Grn.idat
    |       +-- GSM1338105_6057825094_R03C02_Red.idat
    |       +-- GSM1338106_6057825094_R04C01_Grn.idat
    |       +-- GSM1338106_6057825094_R04C01_Red.idat
    |       +-- GSM1338107_6057825094_R04C02_Grn.idat
    |       +-- GSM1338107_6057825094_R04C02_Red.idat
    |       +-- GSM1338108_6057825094_R05C01_Grn.idat
    |       +-- GSM1338108_6057825094_R05C01_Red.idat
    |       +-- GSM1338109_6057825094_R05C02_Grn.idat
    |       +-- GSM1338109_6057825094_R05C02_Red.idat
    |       +-- GSM1338110_6057825094_R06C01_Grn.idat
    |       +-- GSM1338110_6057825094_R06C01_Red.idat
    |       +-- GSM1338111_6057825094_R06C02_Grn.idat
    |       +-- GSM1338111_6057825094_R06C02_Red.idat
    |       +-- GSM1338112_6057825116_R01C01_Grn.idat
    |       +-- GSM1338112_6057825116_R01C01_Red.idat
    |       +-- GSM1338113_6057825116_R01C02_Grn.idat
    |       +-- GSM1338113_6057825116_R01C02_Red.idat
    |       +-- GSM1338114_6057825116_R02C01_Grn.idat
    |       +-- GSM1338114_6057825116_R02C01_Red.idat
    |       +-- GSM1338115_6057825116_R02C02_Grn.idat
    |       +-- GSM1338115_6057825116_R02C02_Red.idat
    |       +-- GSM1338116_6057825116_R03C01_Grn.idat
    |       +-- GSM1338116_6057825116_R03C01_Red.idat
    |       +-- GSM1338117_6057825116_R03C02_Grn.idat
    |       +-- GSM1338117_6057825116_R03C02_Red.idat
    |       +-- GSM1338118_6057825116_R04C01_Grn.idat
    |       +-- GSM1338118_6057825116_R04C01_Red.idat
    |       +-- GSM1338119_6057825116_R04C02_Grn.idat
    |       +-- GSM1338119_6057825116_R04C02_Red.idat
    |       +-- GSM1338120_6057825116_R05C01_Grn.idat
    |       +-- GSM1338120_6057825116_R05C01_Red.idat
    |       +-- GSM1338121_6057825116_R05C02_Grn.idat
    |       +-- GSM1338121_6057825116_R05C02_Red.idat
    |       +-- GSM1338122_6057825116_R06C01_Grn.idat
    |       +-- GSM1338122_6057825116_R06C01_Red.idat
    |       +-- GSM1338123_6057825116_R06C02_Grn.idat
    |       +-- GSM1338123_6057825116_R06C02_Red.idat
    |       \-- samplesheet.rss-GSE55491.csv
    \-- shiny-scripts
        \-- app.R

Limitations

Due to time constraints, I was not able to add Mann-Whitney U-tests, nor, as a result, a feature to visualize the genomic positions where the U-test results agree with the F-test. This is a future point of expansion, likely using the DMRcate Bioconductor package. The Shiny app could also be developed further to make the PCA plots more interactive, though Shiny scales exceptionally poorly here because it also has to run analyses at this scale.
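The planned agreement check between the two tests could look roughly like the sketch below: run a Mann-Whitney U-test (R's two-sample wilcox.test) and an F-test on each CpG, correct both sets of p-values, and keep the CpGs that both call significant. This is a hypothetical illustration of the unimplemented feature; agreeing_cpgs, the BH correction, and the 0.05 cutoff are all assumptions.

```r
# Hypothetical sketch of the planned feature (not part of EpigeneLite):
# per-CpG Mann-Whitney U-tests alongside per-CpG F-tests, returning the CpGs
# that both call significant after BH correction.
agreeing_cpgs <- function(mvals, group, alpha = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, ncol(mvals) == length(group))
  # Mann-Whitney U-test (two-sample Wilcoxon rank-sum) per CpG; the normal
  # approximation (exact = FALSE) keeps this fast on many CpGs.
  p_u <- apply(mvals, 1, function(x) {
    stats::wilcox.test(x ~ group, exact = FALSE)$p.value
  })
  # Classic one-way ANOVA F-test per CpG (equal variances assumed).
  p_f <- apply(mvals, 1, function(x) {
    stats::oneway.test(x ~ group, var.equal = TRUE)$p.value
  })
  q_u <- stats::p.adjust(p_u, method = "BH")
  q_f <- stats::p.adjust(p_f, method = "BH")
  rownames(mvals)[q_u < alpha & q_f < alpha]
}
```

The returned CpG IDs could then be mapped to genomic coordinates for visualization.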

To run the tests correctly, a dataset must first be downloaded from NCBI GEO; the test files explain this in more detail. The example dataset shown in the tree above is too large for GitHub and inefficient to store in Git.

Contributions

This package is authored by Kevin Lu.

This package is based on existing work in EpigenCentral by the Centre for Computational Medicine at SickKids. In general, all functions in this package were inspired by EpigenCentral, and some were rewritten from complex pipeline scripts into reusable R package functions.

For read_idat, I chain several minfi functions together to load and normalize the data. This is inspired by the initial steps of the EpigenCentral pipeline and effectively combines several of them into one function.
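A chain of minfi calls of this kind might look like the sketch below. It is not read_idat itself: load_idat_sketch is a hypothetical name, it assumes minfi and the 450k manifest/annotation packages are installed and that the sample sheet is in the format read.metharray.sheet expects, and the use of preprocessQuantile for normalization is an assumption chosen for illustration.

```r
# Minimal sketch, not EpigeneLite's read_idat. Assumes minfi and the 450k
# manifest/annotation packages are installed and that base_dir contains a
# CSV sample sheet in the format read.metharray.sheet() expects, plus the
# matching *_Grn.idat / *_Red.idat files. preprocessQuantile() is an assumed
# normalization choice.
load_idat_sketch <- function(base_dir) {
  if (!requireNamespace("minfi", quietly = TRUE)) {
    stop("The Bioconductor package 'minfi' is required.")
  }
  # 1. Parse the sample sheet; minfi adds a Basename column that points
  #    at each sample's pair of IDAT files.
  targets <- minfi::read.metharray.sheet(base_dir)
  # 2. Read the raw red/green intensities into an RGChannelSet.
  rg_set <- minfi::read.metharray.exp(targets = targets)
  # 3. Normalize; preprocessQuantile() returns a GenomicRatioSet.
  minfi::preprocessQuantile(rg_set)
}
```

Calling minfi's getBeta() or getM() on the returned GenomicRatioSet gives the beta-value or M-value matrix used by downstream steps such as PCA and F-tests.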

For read_geo_tsv, I rewrote a similar feature of EpigenCentral so that it could be used as a standalone function, and updated it to work with current versions of R and its package dependencies.

For pca_plot, I use ggplot2 to generate the graphic, basing it on a similar pipeline step of EpigenCentral but rewriting it to be usable as a standalone function.

For f_test, I use the same methodology described in the EpigenCentral paper; the function is a simplified version of the corresponding algorithm and pipeline step that can stand alone as its own function.

The Shiny app and documentation are entirely my own work. The dataset was suggested by my lab at SickKids, and the sample sheet comes from an internal collection of public datasets curated by Andrei Turinsky.

References

Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA (2014). “Minfi: A flexible and comprehensive Bioconductor package for the analysis of Infinium DNA Methylation microarrays.” Bioinformatics, 30(10), 1363-1369. doi: 10.1093/bioinformatics/btu049 (URL: https://doi.org/10.1093/bioinformatics/btu049).

Collado-Torres L (2021). biocthis: Automate package and project setup for Bioconductor packages. R package version 1.4.0. doi: 10.18129/B9.bioc.biocthis (URL: https://doi.org/10.18129/B9.bioc.biocthis), https://github.com/lcolladotor/biocthis, <URL: http://www.bioconductor.org/packages/biocthis>.

Douglas A, Roos D, Mancini F, Couto A, Lusseau D (2021). An Introduction to R. Retrieved from https://intro2r.com/

Du P, Kibbe WA, Lin SM (2008). “lumi: a pipeline for processing Illumina microarray.” Bioinformatics, 24(13), 1547-1548.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Hansen KD (2016). IlluminaHumanMethylation450kanno.ilmn12.hg19: Annotation for Illumina’s 450k methylation arrays. R package version 0.6.0.

Hansen KD, Aryee M (2012). IlluminaHumanMethylation450kmanifest: Annotation for Illumina’s 450k methylation arrays. R package version 0.4.0.

Morgan M (2021). BiocManager: Access the Bioconductor Project Package Repository. R package version 1.30.16. https://CRAN.R-project.org/package=BiocManager

Huber W, Carey VJ, Gentleman R, …, Morgan M (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods, 12, 115.

Prickett AR, Ishida M, Böhm S, Frost JM, et al. (2015). Genome-wide methylation analysis in Silver-Russell syndrome patients. Hum Genet, 134(3), 317-332. PMID: 25563730. URL: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55491

R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

RStudio Inc. (2021). Shiny - Tutorial. URL: https://shiny.rstudio.com/tutorial/

Turinsky AL, Choufani S, Lu K, Liu D, Mashouri P, Min D, Weksberg R, Brudno M (2020). EpigenCentral: Portal for DNA methylation data analysis and classification in rare diseases. Hum Mutat, published online 2020 Jul 5 (Epub ahead of print). doi: 10.1002/humu.24076. PMID: 32623772.

Wickham H, Bryan J (2021). R Packages: organize, test, document and share your code. Retrieved from https://r-pkgs.org/index.html

Acknowledgements

This package was developed as part of an assessment for 2021 BCB410H: Applied Bioinformatics, University of Toronto, Toronto, Canada.

Licence

Copyright © 2021 Kevin Lu. Available under the GNU AGPLv3 or later; see LICENSE.md for more details.



kevinlul/EpigeneLite documentation built on Dec. 21, 2021, 6:35 a.m.