methcp: User’s Guide

knitr::opts_chunk$set(echo = TRUE, cache = FALSE)

Introduction

methcp is a differentially methylated region (DMR) detecting method for whole-genome bisulfite sequencing (WGBS) data. It is applicable for a wide range of experimental designs. In this document, we provide examples for two-group comparisons and time-course analysis.

methcp identifies DMRs based on change point detection, which naturally segments the genome and provides region-level differential analysis. We direct the interested reader to our paper here

Load packages.

library(bsseq)
library(MethCP)

Two-group comparison

In this section, we use the CpG methylation data from an Arabidopsis dataset available from GEO with accession number GSM954584. We take a subset of chromosome 1 and 2 from each of the samples and perform differential analysis between the wild-type plants and the H2A.Z mutant plants, which we refer to as treatment and control in the rest of the document.

Read data

We use a well-developed Bioconductor package bsseq to load the store the raw data. Below is an example of how to read raw counts using bsseq.

We provide a helper function createBsseqObject to create a bsseq object when the data for each sample is stored in a separate text file.

For more operations regarding the bsseq object, or to create a bsseq object customized to your file format, please refer to their User's Guide.

# The dataset is consist of 6 samples. 3 samples are H2A.Z mutant 
# plants, and 3 samples are controls.
sample_names <- c(
    paste0("control", seq_len(3)), 
    paste0("treatment", seq_len(3))
)

# Get the vector of file path and names 
raw_files <- system.file(
    "extdata", paste0(sample_names, ".txt"), package = "MethCP")

# load the data
bs_object <- createBsseqObject(
    files = raw_files, sample_names = sample_names, 
    chr_col = 'Chr', pos_col = 'Pos', m_col = "M", cov_col = 'Cov')

The bsseq object.

bs_object

Header of the raw file for one of the samples.

dt <- read.table(
            raw_files[1], stringsAsFactors = FALSE, header = TRUE)
head(dt)

Calculate the statistics

We calculate the per-cytosine statistics using two different test DSS and methylKit using the function calcLociStat. This function returns a MethCP object that is used for the future segmentation step. We allow parallelized computing when there are multiple chromosomes in the dataset.

# the sample names of the two groups to compare. They should be subsets of the 
# sample names provided when creating the `bsseq` objects.
group1 <- paste0("control", seq_len(3))
group2 <- paste0("treatment", seq_len(3))

# Below we calculate the per-cytosine statistics using two different 
# test `DSS` and `methylKit`. The users may pick one of the two for their
#  application.
# obj_DSS <- calcLociStat(bs_object, group1, group2, test = "DSS")
obj_methylKit <- calcLociStat(
    bs_object, group1, group2, test = "methylKit")
obj_methylKit

In cases the user wants to use their pre-calculated test statistics for experiments other than two-group comparison and time course data, we use the calculated statistics and create a MethCP object.

data <- data.frame(
    chr = rep("Chr01", 5),
    pos = c(2, 5, 9, 10, 18),
    effect.size = c(1,-1, NA, 9, Inf),
    pvals = c(0, 0.1, 0.9, NA, 0.02))
obj <- MethCPFromStat(
    data, test.name="myTest",
    pvals.field = "pvals",
    effect.size.field="effect.size",
    seqnames.field="chr",
    pos.field="pos"
)
obj

Segmentation

segmentMethCP performs segmentation on a MethCP object. We allow parallelized computing when there are multiple chromosomes in the dataset. Different from calcLociStat function in the previous section, we do not put any constraint on the number of cores used. Please see the documentation for adjusting the parameters used in the segmentation.

# obj_DSS <- segmentMethCP(
#     obj_DSS, bs_object, region.test = "weighted-coverage")

obj_methylKit <- segmentMethCP(
    obj_methylKit, bs_object, region.test = "fisher")
obj_methylKit

Significant regions

Use function getSigRegion on a MethCP object to get the list of DMRs.

# region_DSS <- getSigRegion(obj_DSS)
# head(region_DSS)
region_methylKit <- getSigRegion(obj_methylKit)
head(region_methylKit)

A time-course example

MethCP is flexible for a wide variety of experimental designs. We apply MethCP on an Arabidopsis thaliana seed germination dataset available from the GEO with accession number https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE94712 The data is generated from two replicates of dry seed and germinating seeds of wild-type plants (Col-0) and ros1 dml2 dml3 (rdd) triple demethylase mutant plants at 0-4 days after imbibition for 4 days (DAI). For cytosine-based statistics, we fit linear models on the methylation ratios and test the differences between the time coefficient of condition Col-0 and condition rdd.

Read the meta data.

meta_file <- system.file(
    "extdata", "meta_data.txt", package = "MethCP")
meta <- read.table(meta_file, sep = "\t", header = TRUE)

head(meta)

Read the counts data.

# Get the vector of file path and names 
raw_files <- system.file(
    "extdata", paste0(meta$SampleName, ".tsv"), package = "MethCP")

# read files
bs_object <- createBsseqObject(
    files = raw_files, sample_names = meta$SampleName, 
    chr_col = 1, pos_col = 2, m_col = 4, cov_col = 5, header = TRUE)

Apply coverage filter to make sure each loci has total coverage (summed across samples) more than 3 for each condition.

groups <- split(seq_len(nrow(meta)), meta$Condition)
coverages <- as.data.frame(getCoverage(bs_object, type = "Cov"))
filter <- rowSums(coverages[, meta$SampleName[groups[[1]]]] != 0) >= 3 &
    rowSums(coverages[, meta$SampleName[groups[[2]]]] != 0) >= 3
bs_object <- bs_object[filter, ]

Calculate the statistics. A dataframe of the meta data will be passed to function calcLociStatTimeCourse. Note that there must be columns named Condition, Time and SampleName in the dataframe.

obj <- calcLociStatTimeCourse(bs_object, meta)
obj

Segmentation.

obj <- segmentMethCP(obj, bs_object, region.test = "stouffer")

Get the DMRs.

regions <- getSigRegion(obj)
head(regions)

Session info

sessionInfo() 


Try the MethCP package in your browser

Any scripts or data that you put into this service are public.

MethCP documentation built on Nov. 8, 2020, 8:24 p.m.