EpiCompare: Compare epigenomic datasets

View source: R/EpiCompare.R

EpiCompareR Documentation

Compare epigenomic datasets

Description

This function compares and analyses multiple epigenomic datasets and outputs an HTML report containing all results of the analysis. The report is mainly divided into three sections: (1) General Metrics on Peakfiles, (2) Peak Overlaps and (3) Functional Annotation of Peaks.

Usage

EpiCompare(
  peakfiles,
  genome_build,
  genome_build_output = "hg19",
  blacklist = NULL,
  picard_files = NULL,
  reference = NULL,
  upset_plot = FALSE,
  stat_plot = FALSE,
  chromHMM_plot = FALSE,
  chromHMM_annotation = "K562",
  chipseeker_plot = FALSE,
  enrichment_plot = FALSE,
  tss_plot = FALSE,
  tss_distance = c(-3000, 3000),
  precision_recall_plot = FALSE,
  n_threshold = 20,
  corr_plot = FALSE,
  bin_size = 5000,
  interact = TRUE,
  add_download_button = FALSE,
  save_output = FALSE,
  output_filename = "EpiCompare",
  output_timestamp = FALSE,
  output_dir,
  display = NULL,
  run_all = FALSE,
  workers = 1,
  quiet = FALSE,
  error = FALSE,
  debug = FALSE
)

Arguments

peakfiles

A list of peak files as GRanges object and/or as paths to BED files. If paths are provided, EpiCompare imports the file as GRanges object. EpiCompare also accepts a list containing a mix of GRanges objects and paths.Files must be listed and named using list(). E.g. list("name1"=file1, "name2"=file2). If no names are specified, default file names will be assigned.

genome_build

A named list indicating the genome build used to generate each of the following inputs:

  • "peakfiles" : Genome build for the peakfiles input. Assumes genome build is the same for each element in the peakfiles list.

  • "reference" : Genome build for the reference input.

  • "blacklist" : Genome build for the blacklist input.

Example input list:
genome_build = list(peakfiles="hg38", reference="hg19", blacklist="hg19")

Alternatively, you can supply a single character string instead of a list. This should only be done in situations where all three inputs (peakfiles, reference, blacklist) are of the same genome build. For example:
genome_build = "hg19"

Supported genome builds are: "hg19", "hg38", "mm9" and "mm10".

genome_build_output

Genome build to standardise all inputs to. Liftovers will be performed automatically as needed. Default: "hg19".

Note: Cross-species liftovers are supported.

blacklist

A GRanges object containing blacklisted genomic regions. Blacklists included in EpiCompare are:

  • NULL (default): Automatically selects the appropriate blacklist based on the genome_build_output argument.

  • "hg19_blacklist": Regions of hg19 genome that have anomalous and/or unstructured signals. hg19_blacklist

  • "hg38_blacklist": Regions of hg38 genome that have anomalous and/or unstructured signals. hg38_blacklist

  • "mm10_blacklist": Regions of mm10 genome that have anomalous and/or unstructured signals. mm10_blacklist

  • "mm9_blacklist": Blacklisted regions of mm10 genome that have been lifted over from mm10_blacklist. mm9_blacklist

  • <user_input>: A custom user-provided blacklist in GRanges format.

picard_files

A list of summary metrics output from Picard. Files must be in data.frame format and listed using list() and named using names(). To import Picard duplication metrics (.txt file) into R as data frame, use:
picard <- read.table("/path/to/picard/output", header = TRUE, fill = TRUE).

reference

A named list containing reference peak file(s) as GRanges object. Please ensure that the reference file is listed and named i.e. list("reference_name" = reference_peak). If more than one reference is specified, individual reports for each reference will be generated. However, please note that specifying more than one reference can take awhile. If a reference is specified, it enables two analyses: (1) plot showing statistical significance of overlapping/non-overlapping peaks; and (2) ChromHMM of overlapping/non-overlapping peaks.

upset_plot

Default FALSE. If TRUE, the report includes upset plot of overlapping peaks.

stat_plot

Default FALSE. If TRUE, the function creates a plot showing the statistical significance of overlapping/non-overlapping peaks. Reference peak file must be provided.

chromHMM_plot

Default FALSE. If TRUE, the function outputs ChromHMM heatmap of individual peak files. If a reference peak file is provided, ChromHMM annotation of overlapping and non-overlapping peaks is also provided.

chromHMM_annotation

ChromHMM annotation for ChromHMM plots. Default K562 cell-line. Cell-line options are:

  • "K562" = K-562 cells

  • "Gm12878" = Cellosaurus cell-line GM12878

  • "H1hesc" = H1 Human Embryonic Stem Cell

  • "Hepg2" = Hep G2 cell

  • "Hmec" = Human Mammary Epithelial Cell

  • "Hsmm" = Human Skeletal Muscle Myoblasts

  • "Huvec" = Human Umbilical Vein Endothelial Cells

  • "Nhek" = Normal Human Epidermal Keratinocytes

  • "Nhlf" = Normal Human Lung Fibroblasts

chipseeker_plot

Default FALSE. If TRUE, the report includes a barplot of ChIPseeker annotation of peak files.

enrichment_plot

Default FALSE. If TRUE, the report includes dotplots of KEGG and GO enrichment analysis of peak files.

tss_plot

Default FALSE. If TRUE, the report includes peak count frequency around transcriptional start site. Note that this can take awhile.

tss_distance

A vector specifying the distance upstream and downstream around transcription start sites (TSS). The default value is c(-3000,3000); meaning peak frequency 3000bp upstream and downstream of TSS will be displayed.

precision_recall_plot

Default is FALSE. If TRUE, creates a precision-recall curve plot and an F1 plot using plot_precision_recall.

n_threshold

Number of thresholds to test.

corr_plot

Default is FALSE. If TRUE, creates a correlation plot across all peak files using plot_corr.

bin_size

Default of 100. Base-pair size of the bins created to measure correlation. Use smaller value for higher resolution but longer run time and larger memory usage.

interact

Default TRUE. By default, plots are interactive. If set FALSE, all plots in the report will be static.

add_download_button

Add download buttons for each plot or dataset.

save_output

Default FALSE. If TRUE, all outputs (tables and plots) of the analysis will be saved in a folder (EpiCompare_file).

output_filename

Default EpiCompare.html. If otherwise, the html report will be saved in the specified name.

output_timestamp

Default FALSE. If TRUE, date will be included in the file name.

output_dir

Path to where output HTML file should be saved.

display

After completion, automatically display the HTML report file in one of the following ways:

  • "browser" : Display the report in your default web browser.

  • "rsstudio" : Display the report in Rstudio.

  • NULL (default) : Do not display the report.

run_all

Convenience argument that enables all plots/features (without specifying each argument manually) by overriding the default values. Default: FALSE.

workers

Number of threads to parallelize across.

quiet

An option to suppress printing during rendering from knitr, pandoc command line and others. To only suppress printing of the last "Output created: " message, you can set rmarkdown.render.message to FALSE

error

If TRUE, the Rmarkdown report will continue to render even when some chunks encounter errors (default: FALSE). Passed to opts_chunk.

debug

Run in debug mode, where are messages and warnings are printed within the HTML report (default: FALSE).

Value

Path to one or more HTML report files.

Examples

### Load Data ###
data("encode_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac") # example dataset as GRanges object
data("CnR_H3K27ac") # example dataset as GRanges object
data("CnT_H3K27ac_picard") # example Picard summary output
data("CnR_H3K27ac_picard") # example Picard summary output

#### Prepare Input ####
# create named list of peakfiles
peakfiles <- list(CnR=CnR_H3K27ac, CnT=CnT_H3K27ac)
# create named list of picard outputs
picard_files <- list(CnR=CnR_H3K27ac_picard, CnT=CnT_H3K27ac_picard)
# reference peak file
reference <- list("ENCODE" = encode_H3K27ac)

### Run EpiCompare ###
output_html <- EpiCompare(peakfiles = peakfiles,
           genome_build = list(peakfiles="hg19",
                               reference="hg19"),
           picard_files = picard_files,
           reference = reference,
           output_filename = "EpiCompare_test",
           output_dir = tempdir())
# utils::browseURL(output_html) 

neurogenomics/EpiCompare documentation built on Jan. 19, 2025, 8:11 a.m.