options(width=120) knitr::opts_chunk$set( collapse = FALSE, eval = TRUE, comment = " ", tidy.opts =list(width.cutoff=80), tidy = TRUE, size="small" )
\newpage
library(data.table) library(jDMR)
\section{Input files}
For generation of region-level calls, jDMR requires the following inputs.
Base-level methylome outputs (generated using the R package "Methimpute")
For population data-sets without replicates, listfiles.fn should have the structure below.
\setlength{\leftskip}{1cm}
file: full PATH of file.
sample: a sample name
\setlength{\leftskip}{0pt}
samplefile1 <- system.file("extdata", "listFiles1.fn", package="jDMR") fread(samplefile1, header = TRUE)
For pairwise control-treatment data-sets with replicates,additional columns "replicate" and "group" should be provided. See structure below.
\setlength{\leftskip}{1cm}
file: full PATH of file
sample: a sample name
replicate: label for replicates
group: label for control and treatment groups
\setlength{\leftskip}{0pt}
samplefile2 <- system.file("extdata", "listFiles2.fn", package="jDMR") fread(samplefile2, header = TRUE)
#out.dir <- paste0("/myfolder/results/", Sys.getenv("LOGNAME"),"/") out.dir <- "/Users/rashmi/basedir/jDMR-test/"
\section{Generate cytosine region calls from genome}
jDMR detects DMRs using two approaches a) finding cytosine clusters in the genome (section 2.1) b) using a binning approach (section 2.2). You can use either of the methods to obtain the region calls. The remaining steps, makeDMRmatrix, filterDMRmatrix, annotateDMRs are the same for both methods.
This function starts by identifying cytosine clusters in the genome and uses them in each sample to identify DMRs.
\setlength{\leftskip}{1cm}
fasta.file: PATH to FASTA file
samplefiles: PATH to file containing description about the samples
genome: a string containing name of the genome
out.dir: PATH to output directory
contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".
\setlength{\leftskip}{0pt}
library(jDMR)
out.dir <- "/myfolder/DMR-results" myfasta <- system.file("extdata","TAIR10_chr_all.fa", package="jDMR") samplefile <- system.file("extdata","listFiles2.fn", package="jDMR") runjDMRregions(fasta.file=myfasta, samplefiles=samplefile, genome="Arabidopsis", out.dir=out.dir)
Rdata files containing coordinates of Cytosine clusters will be generated for each chromosome and cytosine context.
Output file "Arabidopsis_regions_chr1_CG.Rdata" contains coordinates of cytosine clusters
regionfile <- dget(system.file("extdata","min.C_5/fp0.01/Arabidopsis_regions_chr1_CG.Rdata", package="jDMR"))
head(regionfile$reg.obs)
Region files containing state calls and methylation levels will be generated for each sample and each cytosine context.
jDMR.out <- fread("/Users/rashmi/basedir/jDMR-test/regions/A_CG.txt", header = TRUE)
head(jDMR.out)
\setlength{\leftskip}{1cm}
seqnames, start and end: Chromosome coordinates
context: Sequence context of cytosine i.e CG,CHG,CHH
posteriorMax: Posterior value of the methylation state call
status : Methylation status
rc.meth.lvl: Recalibrated methylation level calculated from the posteriors and fitted parameters
\setlength{\leftskip}{0pt}
This function uses a grid approach to bin the genome into equal sized bins ranging from 100 to 1000 bps. The optimum bin and step size will be determined automatically for each cytosine context using the min.C parameter.
\setlength{\leftskip}{1cm} fasta.file: PATH to FASTA file
samplefiles: PATH to file containing description about the samples
genome: a string containing name of the genome
out.dir: PATH to output directory
contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".
min.C: minimum number of cytosines in a bin. bins lower than specified theshold will be dropped.
\setlength{\leftskip}{0pt}
library(jDMR) out.dir <- "/myfolder/DMR-results" myfasta <- system.file("extdata","TAIR10_chr_all.fa", package="jDMR") samplefile <- system.file("extdata","listFiles2.fn", package="jDMR") runjDMRgrid(out.dir=out.dir, fasta.file=myfasta, samplefiles=samplefile, min.C=10, genome="Arabidopsis")
Region files containing state calls and methylation levels will be generated for each sample and for each context.
jDMR.out <- fread("/Users/rashmi/basedir/jDMR-test/grid/A_CG.txt", header = TRUE)
head(jDMR.out)
\setlength{\leftskip}{1cm}
seqnames, start and end: Chromosome coordinates
context: Sequence context of cytosine i.e CG,CHG,CHH
posteriorMax: Posterior value of the methylation state call
status : Methylation status
rc.meth.lvl: Recalibrated methylation level calculated from the posteriors and fitted parameters
\setlength{\leftskip}{0pt}
\section{Generate DMR matrix}
This function generates a DMR matrix of state calls, rc.meth.lvls and posterior probabilities for all samples in one dataframe.
\setlength{\leftskip}{1cm} samplefiles: PATH to file containing description about the samples
input.dir: PATH to directory containing region files.
out.dir: PATH to output directory.
contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".
postMax.out: By default this option is set as FALSE. You can set it to TRUE if you want to output the DMR matrix containing posterior probabilities for the status call of each region.
\setlength{\leftskip}{0pt}
input.dir <- "/myfolder/DMR-results" out.dir <- "/myfolder/DMRmatrix-results" samplefile <- system.file("extdata","listFiles2.fn", package="jDMR") makeDMRmatrix(samplefiles=samplefile, input.dir=myinput, out.dir=out.dir)
"CG_StateCalls.txt" has the following structure. "0" in the output matrix denotes "Unmethylated" and "1" stands for "Methylated".
out.dir <- "/Users/rashmi/basedir/jDMR-test/grid/"
statecalls <- fread(paste0(out.dir, "CG_StateCalls.txt" , sep=""), header=TRUE) head(statecalls)
"CG_rcMethlvl.txt" has the following structure. The output matrix contains recalibrated methylation levels for each sample and for the specific region.
rcmethlvls <- fread(paste0(out.dir, "CG_rcMethlvl.txt" , sep=""), header=TRUE) head(rcmethlvls)
"CG_postMax.txt" has the following structure. The output matrix contains posterior probabilities for each sample and for the specific region.
postMax <- fread(paste0(out.dir, "CG_postMax.txt" , sep=""), header=TRUE) head(postMax)
Ignore this step if you are running jDMR on population data without replicates
\setlength{\leftskip}{1cm} samplefiles: PATH to file containing description about the samples
input.dir: PATH to directory containing region files.
out.dir: PATH to output directory.
contexts: sequence contexts of the cytosine. By default this option is set to c("CG", "CHG", "CHH"). If you want to run for a single context such as CG, set it as "CG".
postMax.out: by default this option is set to FALSE. If you want to output the matrix containing posterior probabilities set it to TRUE.
\setlength{\leftskip}{0pt}
samplefile <- system.file("extdata","listFiles2.fn", package="jDMR") split.groups(samplefiles=samplefile, input.dir="/myfolder/DMRmatrix-results", out.dir="/myfolder/DMRmatrix-results/split_gps")
\section{Filter DMR matrix}
This function filters the DMR matrix for non-polymorphic patterns.
\setlength{\leftskip}{1cm}
gridDMR: set this option to TRUE if Grid approach was used otherwise set to FALSE. The output will contain merged regions.
data.dir: PATH to folder containing DMR matrix
epiMAF.cutoff: Applicable for calling calling population DMRs. This option can be used to filter for Minor Epi-Allele frequency as specified by user. By default, this option is set to NULL.
replicate.consensus : Applicable for control-treatment data-sets with replicates. Users can specify the percentage of concordance in methylation states in samples with multiple replicates. For datasets with just 2 replicates, \textit{replicate.consensus} should be set as 1 (means 100% concordance). By default, this option is set to NULL.
rc.methlvl.out: Output filtered matrix containing recalibrated methylation levels. By default, this option is set to FALSE.
\setlength{\leftskip}{0pt}
out.dir <- "/myfolder/DMRmatrix-results/split_gps" filterDMRmatrix(gridDMR=TRUE, # setting to TRUE because we are using the outputs of grid approach data.dir=out.dir, replicate.consensus=1) #since we have 2 replicates for each sample
"CG_StateCalls-filtered.txt" has the following structure.
my.dir <- "/Users/rashmi/basedir/jDMR-test/grid/split_files/"
statecallsFiltered <- fread(paste0(my.dir, "CG_WT_mutant1_StateCalls-filtered.txt" , sep=""), header=TRUE) head(statecallsFiltered)
If "rc.methlvl.out" option is set to TRUE a filtered matrix with averaged methylation levels in generated.
rcmethlvlFiltered <- fread(paste0(my.dir, "CG_WT_mutant1_rcmethlvl-filtered.txt" , sep=""), header=TRUE) head(rcmethlvlFiltered)
Output DMRs specific for contexts i.e CG-only, CHG-only, CHH-only, non-CG and multi-context DMRs using the *StateCalls-filtered.txt files.
samplefile <- system.file("extdata","listFiles2.fn", package="jDMR") out.dir <- "/myfolder/DMRmatrix-results/split_gps" context.specific.DMRs(samplefiles=samplefile, data.dir=out.dir)
\section{Annotate DMRs}
This function annotates the lists of DMRs. Any file(.txt) containing 3 columns (chr, start, stop) can be annotated using the annotateDMRs function. Please move all files to be annotated to a separate folder and set the full PATH to the "input.dir" option.
\setlength{\leftskip}{1cm}
gff.files: Multiple gff3 annotation files can be supplied as a vector
annotation: specify annotation categories
input.dir: path to folder containing only files to be annotated. Any file containing 3 columns (chr, start, stop) can be annotated using the annotateDMRs function.
gff3.out: whether to output annotated files in gff3 format
out.dir: path to output folder
\setlength{\leftskip}{0pt} In the following example, I will annotate the files generated in section 4.3
# annotation files gff.AT <- "/Annotations/Arabidopsis_thaliana.TAIR10.47.gff3" gff.TE <- "/Annotations/TAIR10_TE.gff3" gff.pr <- "/Annotations/TAIR10_promoters.gff3" mydir <- "/myfolder/annotate_DMRs/" annotateDMRs(gff.files=c(gff.AT, gff.TE, gff.pr), annotation=c("gene","promoters","TE"), input.dir=mydir, gff3.out=TRUE, out.dir=mydir)
Mapped files are output in .txt and/or .gff3 format. Addiitonally, a DMR count table is generated.
out.dir <- "/Users/rashmi/basedir/jDMR-test/regions/"
DMRcounts <- fread(paste0(out.dir, "annotate_DMRs/DMR-counts.txt" , sep=""), header=TRUE) DMRcounts
\section{R session info }
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.