knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(MethSig)

Setup

setRepositories(ind = 1:2)
install.packages("devtools")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GenomeInfoDbData")
devtools::install_github("HengPan2007/MethSig", build_vignettes = T)
library(MethSig)

The latest R version (R-4.0.2) is recommended. If an old version is used, users may need to install dependencies manually. For example, if R-3.6.3 is used, mnormt need to be installed first.

install.packages("https://cran.r-project.org/src/contrib/Archive/mnormt/mnormt_1.5-7.tar.gz", repos=NULL)

Introduction

MethSig is used to infer DNA methylation (DNAme) drivers from bisulfite converted data of cancer cohorts.

Prerequisite

1. Data alignment and processing

Bisulfite converted data is aligned and processed as described in the published book chapter (Pan et al., Cancer Systems Biology, 2018).

2. Covariate matrix preparation

A matrix of covariates used in the beta regression model is needed. The matrix needs to include a column of hugo gene symbol and at least one column of a covariate. An example can be loaded by invisible(matCV). This example matrix includes Hugo symbol (Hugo), average promoter DHcR level in control samples (DHcR_Normal), average promoter PDR level in control samples (PDR_Normal), average gene expression level in control samples (GEXP_Normal) and DNA replication time (Reptime). Users can define their own covariates.

head(matCV)

3. Files containing information of hypermethylated cytosines (HCs)

A tab-separated values input file (without a header line) contains details of differentially methylated cytosines (DMCs) with following columns (V1 to V11): chr, pos, numC in control, numC + numT in control, numC in tumor, numC + numT in tumor, CpG methylation ratio (tumor methylation / control methylation), chi-squared test p-value, adjusted p-value, significance, hyper or hypo in tumor. Details of generating this type of files were described in Pan et al., Cancer Systems Biology, 2018. An example file can be found in extdata/DMC.SRR2069925.txt. Notably, files need to be named as DMC.sample_name.txt.

tmp <- read.table(system.file("extdata", "DMC.SRR2069925.txt", package = "MethSig"), sep="\t")
head(tmp)

4. Files containing information of proportion of discordant reads (PDR) at CpGs

A tab-separated values input file contains details of single CpG PDR with following columns: chr, start, strand, ConMethReadCount, ConUMethReadCount, DisReadCount, NAReadCount. Details of PDR were described in Landau et al., Cancer Cell, 2014. extdata/pdrCall_from_Bismark.py can be used to call PDR of an individual CpG from Bismark (Krueger et al., Bioinformatics, 2011) output files starting with CpG_OB or CpG_OT. An example file can be found in extdata/PDR.SRR2069925.txt. Notably, files need to be named as PDR.sample_name.txt.

tmp <- read.table(system.file("extdata", "PDR.SRR2069925.txt", package = "MethSig"), header=T, sep="\t")
head(tmp)

Input matrix generation

1. Promoter DHcR calculation

Promoter (defined as ± 2kb window centered on RefSeq transcription start site) hypermethylation is measured using differentially hypermethylated cytosine ratio (DHcR), defined as the ratio of hypermethylated cytosines (HCs) to the total number of promoter CpGs profiled. HCs of each sample are defined as CpGs at which DNAme is statistically higher than the average DNAme of control samples (false discovery rate = 20%, Chi-squared test). Only CpGs with read depth greater than 10 are included in the analysis. RRBS data of normal samples is used as control. An implemented function makeHG19Promoters can be used to provide promoter annotation of hg19 RefSeq genes. Users can define their own annotation.

dhcr <- promoterDHcR(file_name = system.file("extdata", "DMC.SRR2069925.txt", package = "MethSig"),
                     pro = makeHG19Promoters())
head(dhcr)

2. Promoter PDR calculation

If all the CpGs on a specific read are methylated, or all the CpGs on a read are unmethylated, the read is classified as concordant; otherwise it is classified as discordant. At each CpG, the PDR is equal to the number of discordant reads that cover that location divided by the total number of reads that cover that location. The promoter PDR is given by averaging the values of individual CpGs, as calculated for all CpGs within the promoter of interest with equal or greater than 10 reads covering at least 4 CpGs.

pdr <- promoterPDR(file_name = system.file("extdata", "PDR.SRR2069925.txt", package = "MethSig"),
                   pro = makeHG19Promoters())
head(pdr)

3. Input matrix generation

As mentioned in the Prerequisite section, users need to put DMC.sample_name.txt and PDR.sample_name.txt files in the input_dir folder. Also, a user defined covariate matrix is needed.

ds <- makeInputMatrix(names_list = as.list("SRR2069925"),
                matCV = invisible(matCV),
                pro = makeHG19Promoters(),
                input_dir = system.file("extdata", "", package = "MethSig"))
head(ds)
head(invisible(inputMatExample))

Sample-specific hypermethylation inference

Expected promoter DHcR of tumor samples is estimated by beta regression model and expected DHcR is tested against observed DHcR to infer hypermethylation status.

pval <- pvalueBetaReg(formula = as.formula("DHcR_Tumor_Beta~DHcR_Normal+PDR_Normal+GEXP_Normal+Reptime+PDR_Tumor+Depth_Tumor+CpGs_Tumor"),
                      data = invisible(inputMat))
head(pval)
head(invisible(pvalByGenePt))

Cohort-prevalent hypermethylation inference

Wilkinson p-value combination method is used to determine if promoter hypermethylation is over-represented in the cohort. To eliminate the effect of cohort size on p-value combination results, MethSig randomly samples equal number of patients iteratively and uses lower quartile of combined p-values to infer hypermethylation.

pval <- pvalueCombine(data = invisible(pvalByGenePt))
head(pval)
head(invisible(pvalueC))


HengPan2007/MethSig documentation built on Aug. 1, 2020, 4:52 a.m.