options(width=120)
knitr::opts_chunk$set(
  collapse = FALSE,
  eval = TRUE,
  comment = " ",
  tidy.opts =list(width.cutoff=80),
  tidy = TRUE,
  size="small"
)

\newpage

library(data.table)
library(jDMR)

\section{Input files}

For generation of region-level calls, jDMR requires the following inputs.

Methimpute files:

Base-level methylome outputs (generated using the R package "Methimpute")

A metadata file containing description about samples

For population data-sets without replicates, listfiles.fn should have the structure below.

\setlength{\leftskip}{1cm}

file: full PATH of file.

sample: a sample name

\setlength{\leftskip}{0pt}

samplefile1 <- system.file("extdata", "listFiles1.fn", package="jDMR")
fread(samplefile1, header = TRUE)

For pairwise control-treatment data-sets with replicates,additional columns "replicate" and "group" should be provided. See structure below.

\setlength{\leftskip}{1cm}

file: full PATH of file

sample: a sample name

replicate: label for replicates

group: label for control and treatment groups

\setlength{\leftskip}{0pt}

samplefile2 <- system.file("extdata", "listFiles2.fn", package="jDMR")
fread(samplefile2, header = TRUE)
#out.dir <- paste0("/myfolder/results/", Sys.getenv("LOGNAME"),"/")
out.dir <- "/Users/rashmi/basedir/jDMR-test/"

\section{Generate cytosine region calls from genome}

jDMR detects DMRs using two approaches a) finding cytosine clusters in the genome (section 2.1) b) using a binning approach (section 2.2). You can use either of the methods to obtain the region calls. The remaining steps, makeDMRmatrix, filterDMRmatrix, annotateDMRs are the same for both methods.

Run jDMR on cytosine clusters extracted from genome

This function starts by identifying cytosine clusters in the genome and uses them in each sample to identify DMRs.

\setlength{\leftskip}{1cm}

fasta.file: PATH to FASTA file

samplefiles: PATH to file containing description about the samples

genome: a string containing name of the genome

out.dir: PATH to output directory

contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".

\setlength{\leftskip}{0pt}

library(jDMR)
out.dir <- "/myfolder/DMR-results"
myfasta <- system.file("extdata","TAIR10_chr_all.fa", package="jDMR")
samplefile <- system.file("extdata","listFiles2.fn", package="jDMR")

runjDMRregions(fasta.file=myfasta,
               samplefiles=samplefile,
               genome="Arabidopsis",
               out.dir=out.dir)

Output files of jDMR Regions approach

Rdata files containing coordinates of Cytosine clusters will be generated for each chromosome and cytosine context.

Output file "Arabidopsis_regions_chr1_CG.Rdata" contains coordinates of cytosine clusters

regionfile <- dget(system.file("extdata","min.C_5/fp0.01/Arabidopsis_regions_chr1_CG.Rdata", package="jDMR"))
head(regionfile$reg.obs)

Region files containing state calls and methylation levels will be generated for each sample and each cytosine context.

jDMR.out <- fread("/Users/rashmi/basedir/jDMR-test/regions/A_CG.txt", header = TRUE) 
head(jDMR.out)

\setlength{\leftskip}{1cm}

seqnames, start and end: Chromosome coordinates

context: Sequence context of cytosine i.e CG,CHG,CHH

posteriorMax: Posterior value of the methylation state call

status : Methylation status

rc.meth.lvl: Recalibrated methylation level calculated from the posteriors and fitted parameters

\setlength{\leftskip}{0pt}

Run jDMR on a binned genome

This function uses a grid approach to bin the genome into equal sized bins ranging from 100 to 1000 bps. The optimum bin and step size will be determined automatically for each cytosine context using the min.C parameter.

\setlength{\leftskip}{1cm} fasta.file: PATH to FASTA file

samplefiles: PATH to file containing description about the samples

genome: a string containing name of the genome

out.dir: PATH to output directory

contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".

min.C: minimum number of cytosines in a bin. bins lower than specified theshold will be dropped.

\setlength{\leftskip}{0pt}

library(jDMR)

out.dir <- "/myfolder/DMR-results"
myfasta <- system.file("extdata","TAIR10_chr_all.fa", package="jDMR")
samplefile <- system.file("extdata","listFiles2.fn", package="jDMR")

runjDMRgrid(out.dir=out.dir,
            fasta.file=myfasta,
            samplefiles=samplefile,
            min.C=10,
            genome="Arabidopsis")

Output files of jDMR Grid approach

Region files containing state calls and methylation levels will be generated for each sample and for each context.

jDMR.out <- fread("/Users/rashmi/basedir/jDMR-test/grid/A_CG.txt", header = TRUE) 
head(jDMR.out)

\setlength{\leftskip}{1cm}

seqnames, start and end: Chromosome coordinates

context: Sequence context of cytosine i.e CG,CHG,CHH

posteriorMax: Posterior value of the methylation state call

status : Methylation status

rc.meth.lvl: Recalibrated methylation level calculated from the posteriors and fitted parameters

\setlength{\leftskip}{0pt}

\section{Generate DMR matrix}

Run "makeDMRmatrix"

This function generates a DMR matrix of state calls, rc.meth.lvls and posterior probabilities for all samples in one dataframe.

\setlength{\leftskip}{1cm} samplefiles: PATH to file containing description about the samples

input.dir: PATH to directory containing region files.

out.dir: PATH to output directory.

contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".

postMax.out: By default this option is set as FALSE. You can set it to TRUE if you want to output the DMR matrix containing posterior probabilities for the status call of each region.

\setlength{\leftskip}{0pt}

input.dir <- "/myfolder/DMR-results"
out.dir <- "/myfolder/DMRmatrix-results"
samplefile <- system.file("extdata","listFiles2.fn", package="jDMR")

makeDMRmatrix(samplefiles=samplefile,
              input.dir=myinput,
              out.dir=out.dir)

Output files of DMRmatrix function

"CG_StateCalls.txt" has the following structure. "0" in the output matrix denotes "Unmethylated" and "1" stands for "Methylated".

out.dir <- "/Users/rashmi/basedir/jDMR-test/grid/"
statecalls <- fread(paste0(out.dir, "CG_StateCalls.txt" , sep=""), header=TRUE)
head(statecalls)

"CG_rcMethlvl.txt" has the following structure. The output matrix contains recalibrated methylation levels for each sample and for the specific region.

rcmethlvls <- fread(paste0(out.dir, "CG_rcMethlvl.txt" , sep=""), header=TRUE)
head(rcmethlvls)

"CG_postMax.txt" has the following structure. The output matrix contains posterior probabilities for each sample and for the specific region.

postMax <- fread(paste0(out.dir, "CG_postMax.txt" , sep=""), header=TRUE)
head(postMax)

split DMR matrix into pairwise groups (only applicable for datasets with control- treatments)

Ignore this step if you are running jDMR on population data without replicates

\setlength{\leftskip}{1cm} samplefiles: PATH to file containing description about the samples

input.dir: PATH to directory containing region files.

out.dir: PATH to output directory.

contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".

postMax.out: by default this option is set to FALSE. If you want to output the matrix containing posterior probabilities set it to TRUE.

\setlength{\leftskip}{0pt}

samplefile <- system.file("extdata","listFiles2.fn", package="jDMR")

split.groups(samplefiles=samplefile,
             input.dir="/myfolder/DMRmatrix-results",
             out.dir="/myfolder/DMRmatrix-results/split_gps")

\section{Filter DMR matrix}

Filter the DMR matrix

This function filters the DMR matrix for non-polymorphic patterns.

\setlength{\leftskip}{1cm}

gridDMR: set this option to TRUE if Grid approach was used otherwise set to FALSE. The output will contain merged regions.

data.dir: PATH to folder containing DMR matrix

epiMAF.cutoff: Applicable for calling calling population DMRs. This option can be used to filter for Minor Epi-Allele frequency as specified by user. By default, this option is set to NULL.

replicate.consensus : Applicable for control-treatment data-sets with replicates. Users can specify the percentage of concordance in methylation states in samples with multiple replicates. For datasets with just 2 replicates, \textit{replicate.consensus} should be set as 1 (means 100% concordance). By default, this option is set to NULL.

rc.methlvl.out: Output filtered matrix containing recalibrated methylation levels. By default, this option is set to FALSE.

\setlength{\leftskip}{0pt}

out.dir <- "/myfolder/DMRmatrix-results/split_gps"
filterDMRmatrix(gridDMR=TRUE, # setting to TRUE because we are using the outputs of grid approach
                data.dir=out.dir,
                replicate.consensus=1) #since we have 2 replicates for each sample

Filtered Output

"CG_StateCalls-filtered.txt" has the following structure.

my.dir <- "/Users/rashmi/basedir/jDMR-test/grid/split_files/"
statecallsFiltered <- fread(paste0(my.dir, "CG_WT_mutant1_StateCalls-filtered.txt" , sep=""), header=TRUE)
head(statecallsFiltered)

If "rc.methlvl.out" option is set to TRUE a filtered matrix with averaged methylation levels in generated.

rcmethlvlFiltered <- fread(paste0(my.dir, "CG_WT_mutant1_rcmethlvl-filtered.txt" , sep=""), header=TRUE)
head(rcmethlvlFiltered)

Output context specific DMRs

Output DMRs specific for contexts i.e CG-only, CHG-only, CHH-only, non-CG and multi-context DMRs using the *StateCalls-filtered.txt files.

samplefile <- system.file("extdata","listFiles2.fn", package="jDMR")
out.dir <- "/myfolder/DMRmatrix-results/split_gps"

context.specific.DMRs(samplefiles=samplefile,
                      data.dir=out.dir)

\section{Annotate DMRs}

This function annotates the lists of DMRs. Any file(.txt) containing 3 columns (chr, start, stop) can be annotated using the annotateDMRs function. Please move all files to be annotated to a separate folder and set the full PATH to the "input.dir" option.

\setlength{\leftskip}{1cm}

gff.files: Multiple gff3 annotation files can be supplied as a vector

annotation: specify annotation categories

input.dir: path to folder containing only files to be annotated. Any file containing 3 columns (chr, start, stop) can be annotated using the annotateDMRs function.

gff3.out: whether to output annotated files in gff3 format

out.dir: path to output folder

\setlength{\leftskip}{0pt} In the following example, I will annotate the files generated in section 4.3

# annotation files
gff.AT <- "/Annotations/Arabidopsis_thaliana.TAIR10.47.gff3"
gff.TE <- "/Annotations/TAIR10_TE.gff3"
gff.pr <- "/Annotations/TAIR10_promoters.gff3"

mydir <- "/myfolder/annotate_DMRs/" 

annotateDMRs(gff.files=c(gff.AT, gff.TE, gff.pr),
             annotation=c("gene","promoters","TE"),
             input.dir=mydir,
             gff3.out=TRUE,
             out.dir=mydir)

Output files after annotation

Mapped files are output in .txt and/or .gff3 format. Addiitonally, a DMR count table is generated.

out.dir <- "/Users/rashmi/basedir/jDMR-test/regions/"
DMRcounts <- fread(paste0(out.dir, "annotate_DMRs/DMR-counts.txt" , sep=""), header=TRUE)
DMRcounts

\section{R session info }

sessionInfo()


jlab-code/jDMR documentation built on April 23, 2022, 11:02 a.m.