Bedgraph files generated by BS pipelines often come in various flavors. Critical downstream step requires aggregation of these files into methylation/coverage matrices. This step of data aggregation is done by
Methrix, including many other useful downstream functions.
For a short and quick documentation, see the Bioconductor vignette
A exemplary complete data analysis with steps from reading in to annotation and differential methylation calling can be find in our WGBS best practices workflow.
Mayakonda A, Schönung M, Hey J, Batra RN, Feuerstein-Akgoz C, Köhler K, Lipka DB, Sotillo R, Plass C, Lutsik P, Toth R. Methrix: an R/bioconductor package for systematic aggregation and analysis of bisulfite sequencing data. Bioinformatics. 2020 Dec 21:btaa1048. doi: 10.1093/bioinformatics/btaa1048. Epub ahead of print. PMID: 33346800.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") #Installing stable version from BioConductor BiocManager::install("methrix") #Installing developmental version from GitHub BiocManager::install("CompEpigen/methrix")
Ideally one should use the newest versions of R and BioC versions. In case of the older versions (for e.g, R < 4.0), installing from BioConductor might lead to installing an older version of the package. In that case installing from GitHub might be easier since it is more merciful with regards to versions.
SummarizedExperimentwith custom methods for CpG extraction, sub-setting, and filtering
Usage is simple and involves generating a methrix object using read_bedgraphs() command which can then be passed to all downstream analyses.
Below is the two step procedure to import WGBS bedGraph files.
> hg19_cpgs = methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19") -Extracting CpGs -Done. Extracted 29,891,155 CpGs from 298 contigs. There were 50 or more warnings (use warnings() to see the first 50)
The example data of the methrix package is used.
#Example bedgraph files > bdg_files = list.files(path = system.file('extdata', package = 'methrix'), pattern = "*bdg\\.gz$", full.names = TRUE) > meth = methrix::read_bedgraphs(files = bdg_files, ref_cpgs = hg19_cpgs, chr_idx = 1, start_idx = 2, M_idx = 3, U_idx = 4, stranded = TRUE, collapse_strands = TRUE) ---------------------------- -Preset: Custom --Missing beta and coverage info. Estimating them from M and U values -CpGs raw: 29,891,155 (total reference CpGs) -CpGs retained: 28,217,448(reference CpGs from contigs of interest) -CpGs stranded: 56,434,896(reference CpGs from both strands) ---------------------------- -Processing: C1.bedGraph.gz --CpGs missing: 56,434,219 (from known reference CpGs) -Processing: C2.bedGraph.gz --CpGs missing: 56,434,207 (from known reference CpGs) -Processing: N1.bedGraph.gz --CpGs missing: 56,434,194 (from known reference CpGs) -Processing: N2.bedGraph.gz --CpGs missing: 56,434,195 (from known reference CpGs) -Finished in: 00:02:00 elapsed (00:02:23 cpu) > meth An object of class methrix n_CpGs: 28,217,448 n_samples: 4 is_h5: FALSE Reference: hg19
What can be done on
methrix object? Following are the key functions
#reading and writing: read_bedgraphs() #Reads in bedgraph files into methrix write_bedgraphs() #Writes bedGraphs from methrix object write_bigwigs() #Writes bigWigs from methrix object #operations order_by_sd() #Orders methrix object by SD region_filter() #Filters matrices by region mask_methrix() #Masks lowly covered CpGs coverage_filter() #Filters methrix object based on coverage subset_methrix() #Subsets methrix object based on given conditions. remove_uncovered() #Removes loci that are uncovered across all samples remove_snps() #Removes loci overlapping with possible SNPs #Visualization and QC methrix_report() #Creates a detailed interative html summary report from methrix object methrix_pca() #Principal Component Analysis plot_pca() #Plots the result of PCA plot_coverage() #Plots coverage statistics plot_density() #Plots the density distribution of the beta values plot_violin() #Plots the distribution of the beta values on a violin plot plot_stats() #Plot descriptive statistics get_stats() #Estimate descriptive statistics of the object #Other methrix2bsseq() #Convert methrix to bsseq object
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.