dmrcate: DMR identification
In rcavalcante/DMRcate: Methylation array and sequencing spatial analysis methods


The main function of this package. Computes a kernel estimate against a null comparison to identify significantly differentially (or variably) methylated regions.

dmrcate(object,
        lambda = 1000,
        C = NULL,
        p.adjust.method = "BH",
        pcutoff = "fdr",
        consec = FALSE,
        conseclambda = 10,
        betacutoff = NULL,
        min.cpgs = 2,
        mc.cores = 1
        )

`object`	An object of class "annot", created by `cpg.annotate`.
`lambda`	Gaussian kernel bandwidth for smoothed-function estimation. Also informs DMR bookend definition; gaps >= `lambda` between significant CpG sites will be in separate DMRs. Support is truncated at 5*`lambda`. Default is 1000 nucleotides. See details for further info.
`C`	Scaling factor for bandwidth. The Gaussian kernel is calculated with sigma = `lambda`/`C`. Empirical testing shows that, for 450k data with `lambda=1000`, near-optimal prediction of sequencing-derived DMRs is obtained when `C` is approximately 2, i.e. 1 standard deviation of the Gaussian kernel = 500 base pairs. Should be much larger for sequencing data; `C=50` is suggested. Cannot be < 0.2.
`p.adjust.method`	Method for p-value adjustment from the significance test. Default is `"BH"` (Benjamini-Hochberg).
`pcutoff`	p-value cutoff to determine DMRs. The default, `"fdr"`, is determined automatically from the number of significant CpGs returned by either `limma` or `DSS` for that contrast, but a numeric value can be set manually. The default is highly recommended; thresholding can be adjusted using the `fdr` argument in `cpg.annotate()`.
`consec`	Use `DMRcate` in consecutive mode. Treats CpG sites as equally spaced.
`conseclambda`	Bandwidth in CpGs (rather than nucleotides) to use when `consec=TRUE`. When specified, `lambda` simply becomes the minimum distance separating DMRs.
`betacutoff`	Optional filter; removes any region from the results where the absolute mean beta shift is less than the given value.
`min.cpgs`	Minimum number of consecutive CpGs constituting a DMR.
`mc.cores`	When > 1, the processor will attempt to run the kernel smoothing in parallel, 1 chromosome per core. Use with discretion. Default recommended for laptop use. Please use `detectCores()` and htop in your terminal to check your resource ceiling before increasing the default.
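
As a rough illustration of the resource check mentioned for `mc.cores`, the sketch below queries the core count from R. The "leave one core free" policy and the name `n_cores` are assumptions made for this example, not DMRcate recommendations.

```r
library(parallel)  # ships with base R

# Number of logical cores visible to R; can be NA on some platforms.
available <- detectCores()

# Hypothetical policy (not from DMRcate): leave one core free and
# fall back to the default of 1 if detection fails.
n_cores <- if (is.na(available)) 1L else max(1L, available - 1L)
n_cores
```

The value could then be passed as `mc.cores = n_cores`, keeping in mind that smoothing runs one chromosome per core, so memory use grows with the number of cores.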

The values of lambda and C should be chosen with care. For array data, we currently recommend that half a kilobase represent 1 standard deviation of support (lambda=1000 and C=2), and 20bp (C=50) for WGBS data. If lambda is too small or C too large then the kernel estimator will not have enough support to significantly differentiate the weighted estimate from the null distribution. If lambda is too large then dmrcate will report very long DMRs spanning multiple gene loci, and the large amount of support will likely give Type I errors. If you are concerned about Type I errors we recommend using the default value of pcutoff, although this will return no DMRs if no DM CpGs are returned by limma/DSS either.
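
To make the relationship between `lambda`, `C` and the kernel width concrete, here is a minimal base-R sketch. The helper `kernel_weight` is hypothetical (it is not a DMRcate function), and it assumes that "truncated at 5*lambda" means weights are zero for CpGs more than 5*`lambda` nucleotides away.

```r
lambda <- 1000  # bandwidth in nucleotides (the default)
C      <- 2     # scaling factor recommended for 450k arrays

sigma <- lambda / C  # one standard deviation of the kernel, in bp
sigma                # 500

# Unnormalised Gaussian weight of a CpG at distance d (bp) from the
# site being smoothed. Hypothetical helper for illustration only.
kernel_weight <- function(d, lambda, C) {
  sigma <- lambda / C
  w <- exp(-d^2 / (2 * sigma^2))
  w[abs(d) > 5 * lambda] <- 0  # assumed truncation of support
  w
}

# Weights fall off quickly: full weight at 0 bp, none at 6000 bp
kernel_weight(c(0, 500, 1000, 6000), lambda = 1000, C = 2)

# With C = 50 (suggested for WGBS), sigma shrinks to 20 bp
1000 / 50
```

A narrow kernel (large `C`) only borrows strength from very close neighbours, which suits densely covered sequencing data; a wide kernel pools more distant array probes.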

A list containing 2 data frames (input and results) and a numeric value (cutoff). input contains the contents of the annot object, plus calculated p-values:

ID: As per annotation object input
stat: As per annotation object input
CHR: As per annotation object input
pos: As per annotation object input
betafc: As per annotation object input
raw: Raw p-values from the significance test
fdr: Adjusted p-values from the significance test
step.dmr: Vector marking each CpG as the start of a new DMR (TRUE), a constituent of a DMR other than its start (FALSE), or not part of any DMR (NA).
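
The coding of `step.dmr` can be illustrated with a toy sketch that applies the gap rule described for `lambda` (significant CpGs separated by >= `lambda` fall in separate DMRs). The positions and significance calls are made up, and the sketch ignores the `min.cpgs` filter and chromosome boundaries; it is not the package's internal code.

```r
# Toy data for a single chromosome (hypothetical values)
pos    <- c(100, 400, 900, 2500, 2700, 6000)       # sorted CpG positions
sig    <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)   # passes the cutoff?
lambda <- 1000

step_dmr <- rep(NA, length(pos))  # NA = not part of a DMR
sig_pos  <- pos[sig]

# A significant CpG starts a new region if it is the first one, or if
# it lies >= lambda nucleotides from the previous significant CpG.
step_dmr[sig] <- c(TRUE, diff(sig_pos) >= lambda)
step_dmr
# TRUE FALSE FALSE TRUE FALSE NA
```

Here the 1600 bp gap between positions 900 and 2500 opens a second region, while the non-significant CpG at 6000 stays `NA`.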

results contains an annotated data.frame of significant regions, ranked by Stouffer:

coord: Coordinates of the significant region in hg19. IGV- and UCSC-friendly.
no.cpgs: Number of CpG sites constituting the significant region. Tie-breaker when sorting by Stouffer.
minfdr: Minimum adjusted p-value from the CpGs constituting the significant region.
Stouffer: Stouffer transformation of the group of limma- or DSS-derived fdrs for individual CpG sites as DMR constituents.
maxbetafc: Maximum absolute beta fold change within the region
meanbetafc: Mean beta fold change within the region.
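
For readers unfamiliar with the Stouffer column, the sketch below shows the textbook unweighted Stouffer combination applied to a group of per-CpG adjusted p-values. The helper name `stouffer` and the input values are assumptions for illustration; this page does not show how DMRcate computes the statistic internally.

```r
# Hypothetical helper: unweighted Stouffer combination.
# Each value is converted to a z-score, the scores are summed and
# rescaled by sqrt(n), and the result is mapped back to a probability.
stouffer <- function(p) {
  z <- qnorm(p)
  pnorm(sum(z) / sqrt(length(z)))
}

fdrs <- c(0.01, 0.03, 0.002)  # toy per-CpG adjusted p-values
stouffer(fdrs)
```

Several moderately small values combine into a smaller one, which is why regions with many consistently significant CpGs rank highly.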

cutoff is the significance p-value cutoff provided in the call to dmrcate.
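
A short sketch of how the three components of the returned list might be inspected. The object below is a hand-built mock with the documented shape and made-up values, not real `dmrcate` output, and the 0.1 threshold is arbitrary.

```r
# Mock object mimicking the documented structure (values are invented)
dmrcoutput <- list(
  input   = data.frame(ID = c("cg01", "cg02"), fdr = c(0.001, 0.2)),
  results = data.frame(coord      = c("chr1:1000-1500", "chr2:300-900"),
                       no.cpgs    = c(5, 3),
                       Stouffer   = c(1e-8, 1e-4),
                       meanbetafc = c(0.12, -0.04)),
  cutoff  = 0.05
)

head(dmrcoutput$results)  # regions, ranked by Stouffer
dmrcoutput$cutoff         # p-value cutoff that was applied

# Post hoc filter on mean beta shift, similar in spirit to `betacutoff`
strong <- subset(dmrcoutput$results, abs(meanbetafc) >= 0.1)
strong
```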

Tim J. Peters <t.peters@garvan.org.au>, Mike J. Buckley <Mike.Buckley@csiro.au>, Tim Triche Jr. <tim.triche@usc.edu>

Peters T.J., Buckley M.J., Statham A., Pidsley R., Samaras K., Lord R.V., Clark S.J. and Molloy P.L. (2015) De novo identification of differentially methylated regions in the human genome. Epigenetics & Chromatin, 8:6, doi:10.1186/1756-8935-8-6

Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall.

Duong T. (2013) Local significant differences from nonparametric two-sample tests. Journal of Nonparametric Statistics, 25(3), 635-645.

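## Not run:
library(DMRcate)
data(dmrcatedata)
# Convert beta values to M-values
myMs <- logit2(myBetas)
# Filter out SNP-affected and cross-hybridising probes
myMs.noSNPs <- rmSNPandCH(myMs, dist=2, mafcut=0.05)
# Column names have the form <patient>-<type>
patient <- factor(sub("-.*", "", colnames(myMs)))
type <- factor(sub(".*-", "", colnames(myMs)))
# Paired design: patient as blocking factor, type as effect of interest
design <- model.matrix(~patient + type)
# coef selects the column of the design matrix to test
myannotation <- cpg.annotate("array", myMs.noSNPs, what="M", arraytype = "450K",
                             analysis.type="differential", design=design, coef=39)
# Call DMRs with the default 1000 nt bandwidth
dmrcoutput <- dmrcate(myannotation, lambda=1000)

## End(Not run)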