epimutations: Epimutations analysis based on outlier detection methods

View source: R/epimutations.R

epimutationsR Documentation

Epimutations analysis based on outlier detection methods

Description

The function identifies differentially methylated regions in a case sample by comparing it against a control panel.

Usage

epimutations(
  case_samples,
  control_panel,
  method = "manova",
  chr = NULL,
  start = NULL,
  end = NULL,
  epi_params = epi_parameters(),
  maxGap = 1000,
  bump_cutoff = 0.1,
  min_cpg = 3,
  verbose = TRUE
)

Arguments

case_samples

a GenomicRatioSet object containing the case samples. See the constructor function GenomicRatioSet, makeGenomicRatioSetFromMatrix.

control_panel

a GenomicRatioSet object containing the control panel (control panel).

method

a character string naming the outlier detection method to be used. This can be set as: "manova", "mlm", "iForest", "mahdist", "quantile" and "beta". The default is "manova". For more information see Details.

chr

a character string containing the sequence names to be analysed. The default value is NULL.

start

an integer specifying the start position. The default value is NULL.

end

an integer specifying the end position. The default value is NULL.

epi_params

the parameters for each method. See the function epi_parameters.

maxGap

the maximum location gap used in bumphunter method.

bump_cutoff

a numeric value of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions.

min_cpg

an integer specifying the minimum CpGs number in a DMR.

verbose

logical. If TRUE additional details about the procedure will provide to the user. The default is TRUE.

Details

The function compares a case sample against a control panel to identify epimutations in the given sample. First, the DMRs are identified using the bumphunter approach. After that, CpGs in those DMRs are tested in order to detect regions with CpGs being outliers. For that, different outlier detection methods can be selected:

  • Multivariate Analysis of Variance ("manova"). manova

  • Multivariate Linear Model ("mlm")

  • Isolation Forest ("iForest") isolation.forest

  • Robust Mahalanobis Distance ("mahdist") covMcd

  • Quantile distribution ("quantile")

  • Beta ("beta")

We defined candidate epimutation regions (found in candRegsGR) based on the 450K array design. As CpGs are not equally distributed along the genome, only CpGs closer to other CpGs can form an epimutation. More information can be found in candRegsGR documentation.

Value

The function returns an object of class tibble containing the outliers regions. The results are composed by the following columns:

  • epi_id: systematic name for each epimutation identified. It provides the name of the used anomaly detection method.

  • sample: the name of the sample containing the epimutation.

  • chromosome, start and end: indicate the location of the epimutation.

  • sz: the window's size of the event.

  • cpg_n: the number of CpGs in the epimutation.

  • cpg_ids: the names of CpGs in the epimutation.

  • outlier_score:

    • For method manova it provides the approximation to F-test and the Pillai score, separated by /.

    • For method mlm it provides the approximation to F-test and the R2 of the model, separated by /.

    • For method iForest it provides the magnitude of the outlier score.

    • For method beta it provides the mean outlier p-value.

    • For methods quantile and mahdist it is filled with NA.

  • outlier_direction: indicates the direction of the outlier with "hypomethylation" and "hypermethylation"

    • For manova, mlm, iForest, and mahdist it is computed from the values obtained from bumphunter.

    • For quantile it is computed from the location of the sample in the reference distribution (left vs. right outlier).

    • For method beta it return a NA.

  • pvalue:

    • For methods manova, mlm, and iForest it provides the p-value obtained from the model.

    • For method quantile, mahdist and beta is filled with NA.

  • adj_pvalue: for methods with p-value (manova and mlm adjusted p-value with Benjamini-Hochberg based on the total number of regions detected by Bumphunter.

  • epi_region_id: Name of the epimutation region as defined in candRegsGR.

  • CRE: cREs (cis-Regulatory Elements) as defined by ENCODE overlapping the epimutation region. Different cREs are separated by ;.

  • CRE_type: Type of cREs (cis-Regulatory Elements) as defined by ENCODE. Different type are separeted by, and different cREs are separated by ;.

Examples

data(GRset)

#Find epimutations in GSM2562701 sample of GRset dataset

case_samples <- GRset[,11]
control_panel <- GRset[,1:10]
epimutations(case_samples, control_panel, method = "manova")

isglobal-brge/EpiMutations documentation built on April 20, 2024, 9:05 a.m.