callSummary: callSummary

View source: R/wrappers.R

callSummary    R Documentation

callSummary

Description

One of the two main functions in the chromswitch package. It detects a switch in chromatin state in one or more regions, given ChIP-seq peak calls for one mark, and runs the entire algorithm, from preprocessing to evaluating the clustering results, using the summary strategy.

Usage

callSummary(query, metadata, peaks, mark, filter = FALSE,
  filter_columns = summarize_columns, filter_thresholds = NULL,
  summarize_columns = NULL, normalize_columns = summarize_columns,
  tail = 0.005, normalize = ifelse(is.null(normalize_columns) &&
  is.null(summarize_columns), FALSE, TRUE), fraction = TRUE, n = FALSE,
  heatmap = FALSE, titles = NULL, outdir = NULL,
  optimal_clusters = TRUE, estimate_state = FALSE, signal_col = NULL,
  test_condition = NULL, BPPARAM = bpparam())

Arguments

query

GRanges list containing one or more genomic regions of interest in which to call a switch. The output dataframe will contain one row per region in query.

metadata

A dataframe with at least two columns: "Sample", which stores the sample IDs, and "Condition", which stores the biological condition labels of the samples

peaks

List of GRanges objects storing peak calls for each sample, where element names correspond to sample IDs

mark

Character specifying the histone mark or ChIP-target, for example, "H3K4me3"

filter

(Optional) logical value, filter peaks based on thresholds on peak statistics? Default: FALSE. The filter step is described in filterPeaks.

filter_columns

If filter is TRUE, a character vector corresponding to names of columns in the peak metadata by which to filter peaks. If filter is FALSE, not used.

filter_thresholds

If filter is TRUE, a numeric vector corresponding to lower cutoffs applied to metadata columns in order to filter peaks. Provide one per column specified in filter_columns, in the same order. If filter is FALSE, not used.

summarize_columns

Character vector of column names on which to compute summary statistics during feature matrix construction. These statistics become the features of the matrix.

normalize_columns

If normalize is TRUE, a character vector corresponding to names of columns in the peak metadata to normalize genome-wide for each sample. If normalize is FALSE, not used.

tail

(Optional) if normalize is TRUE, specifies the fraction of extreme values in each tail to bound during normalization. More details at normalizePeaks.

normalize

(Optional) logical value, normalize peak statistics genome-wide for each sample? Default: TRUE if summarize_columns or normalize_columns is specified, FALSE otherwise.

fraction

(Optional) Logical value, during feature matrix construction, compute the fraction of the region overlapped by peaks? Default: TRUE

n

(Optional) Logical value, during feature matrix construction, compute the number of peaks in the region? Default: FALSE

heatmap

(Optional) Logical value, plot the heatmap corresponding to the hierarchical clustering result? Default: FALSE

titles

(Optional) if heatmap is TRUE, a character vector of the same length as query, specifying the title to use when plotting each heatmap (e.g. a gene name), also reused as the prefix of the name of the file where the heatmap is saved. By default, the title is the genomic coordinates of the region in the form "chrN:start-end"

outdir

(Optional) if heatmap is TRUE, the name of the directory where heatmaps should be saved

optimal_clusters

(Optional) Logical value indicating whether to find the optimal clustering solution by choosing the set of clusters which maximizes the average silhouette width (TRUE), or to simply cluster samples into two groups (FALSE). Default: TRUE

estimate_state

(Optional) Logical value indicating whether to include a column "state" in the output specifying the estimated chromatin state of a test condition. The state will be one of "ON", "OFF", or NA, where the latter results if a binary switch between the conditions is unclear. Default: FALSE.

signal_col

(Optional) If estimate_state is TRUE, string specifying the name of the column in the original peak files which corresponds to the level of enrichment in the region, e.g. fold change

test_condition

(Optional) If estimate_state is TRUE, string specifying one of the two biological conditions in metadata$Condition for which to estimate chromatin state.

BPPARAM

(Optional) instance of BiocParallel::BiocParallelParam that determines the back-end used for parallel computations when performing the analysis on more than one region.
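
The preprocessing arguments above (filter_columns, filter_thresholds, tail) can be illustrated with a small base R sketch on a toy peak table. This is not the package's implementation: the rule that a peak must meet every cutoff, the use of >= for the comparison, the rescaling to [0, 1], and the helper bound_and_rescale are all assumptions made for illustration. See filterPeaks and normalizePeaks for the actual behaviour.

```r
# Toy peak table for one sample; column names mimic common peak-caller output.
peaks <- data.frame(
    start       = c(100, 400, 900, 1500),
    end         = c(250, 650, 1000, 1900),
    signalValue = c(2.1, 8.4, 1.2, 40.0),
    qValue      = c(3.0, 12.5, 0.8, 25.0)
)

filter_columns    <- c("signalValue", "qValue")
filter_thresholds <- c(2, 1)  # one lower cutoff per column, in the same order

# Assumption: a peak is kept only if it meets every lower cutoff.
keep <- rep(TRUE, nrow(peaks))
for (i in seq_along(filter_columns)) {
    keep <- keep & peaks[[filter_columns[i]]] >= filter_thresholds[i]
}
filtered <- peaks[keep, ]

# Hypothetical helper: clamp values lying outside the [tail, 1 - tail]
# quantiles, then rescale the clamped values to the range [0, 1].
bound_and_rescale <- function(x, tail = 0.005) {
    bounds <- quantile(x, probs = c(tail, 1 - tail), names = FALSE)
    x <- pmin(pmax(x, bounds[1]), bounds[2])
    if (bounds[2] == bounds[1]) {
        return(rep(0, length(x)))  # constant column: nothing to rescale
    }
    (x - bounds[1]) / (bounds[2] - bounds[1])
}

normalized <- filtered
normalized$signalValue <- bound_and_rescale(filtered$signalValue, tail = 0.005)
```

In this sketch the third peak is dropped because it fails both cutoffs, and the extreme signalValue of 40 is pulled in to the upper quantile bound before rescaling, so a single outlier does not compress the remaining values.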

Details

This strategy constructs a sample-by-feature matrix to use as input for hierarchical clustering by computing, for each sample, a vector of summary statistics based on that sample's peaks in the query region. The summary statistics are generally based on the enrichment statistics associated with each peak as returned by the peak calling tool, which might include, for example, a p value and fold change.
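
A minimal base R sketch of this idea, on toy data, might look as follows. The choice of mean and max as summary statistics, the assumption that a sample's peaks do not overlap each other, the default (complete) linkage in hclust, and the helper summarize_sample are illustrative assumptions rather than the package's actual internals.

```r
# One hypothetical 5 kb query region and toy peaks from four samples.
query_start <- 1000
query_end   <- 6000

peaks <- data.frame(
    sample      = c("A1", "A1", "A2", "B1", "B2"),
    start       = c(1200, 3000, 1500, 5500, 5800),
    end         = c(2800, 5200, 4900, 5700, 5900),
    signalValue = c(12.0, 9.5, 10.2, 1.4, 1.1),
    stringsAsFactors = FALSE
)

# Hypothetical helper: summarize one sample's peaks in the query region.
summarize_sample <- function(df) {
    # Clip each peak to the region before measuring the overlapped width
    # (assumes peaks within a sample do not overlap each other).
    s <- pmax(df$start, query_start)
    e <- pmin(df$end, query_end)
    c(signalValue_mean = mean(df$signalValue),
      signalValue_max  = max(df$signalValue),
      fraction         = sum(e - s) / (query_end - query_start))
}

# Rows are samples, columns are features.
features <- t(sapply(split(peaks, peaks$sample), summarize_sample))

# Hierarchical clustering on the feature matrix, cut into two groups.
tree     <- hclust(dist(features))
clusters <- cutree(tree, k = 2)
```

Here the two "A" samples, with strong and broad peaks, fall into one cluster and the two "B" samples into the other, which is the pattern a chromatin state switch between conditions would produce.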

Value

Data frame with one row per region in query. Contains the coordinates of the region, the number of inferred clusters, the computed cluster validity statistics, and the cluster assignment for each sample.
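
The number of inferred clusters reported in the output depends on optimal_clusters. As a rough illustration of selecting a partition by average silhouette width, the sketch below uses hclust together with silhouette from the cluster package on a toy feature matrix. The candidate range of k, the linkage method, and the toy data are assumptions; the package's own selection procedure may differ.

```r
library(cluster)  # recommended package, provides silhouette()

# Toy sample-by-feature matrix: two tight, well-separated groups of samples.
features <- rbind(
    c(0.90, 0.80), c(0.85, 0.82), c(0.88, 0.79),
    c(0.10, 0.05), c(0.12, 0.08), c(0.08, 0.07)
)
rownames(features) <- c("E068", "E071", "E074", "E101", "E102", "E110")

d    <- dist(features)
tree <- hclust(d)

# Evaluate every partition from 2 clusters up to n - 1 clusters.
candidate_k <- 2:(nrow(features) - 1)
avg_width <- sapply(candidate_k, function(k) {
    sil <- silhouette(cutree(tree, k = k), d)
    mean(sil[, "sil_width"])
})

# Keep the number of clusters with the highest average silhouette width.
best_k      <- candidate_k[which.max(avg_width)]
assignments <- cutree(tree, k = best_k)
```

With two clearly separated groups, the two-cluster partition scores highest, so best_k is 2 and the assignments split the first three samples from the last three.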

Examples


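library(chromswitch)
library(GenomicRanges)  # provides GRanges(); also attaches IRanges

# Sample IDs and the BED files of peak calls shipped with the package
samples <- c("E068", "E071", "E074", "E101", "E102", "E110")
bedfiles <- system.file("extdata", paste0(samples, ".H3K4me3.bed"),
    package = "chromswitch")
Conditions <- c(rep("Brain", 3), rep("Other", 3))

# Metadata: one row per sample, with its condition label
metadata <- data.frame(Sample = samples,
    H3K4me3 = bedfiles,
    Condition = Conditions,
    stringsAsFactors = FALSE)

# Two query regions on chr19 in which to call a switch
regions <- GRanges(seqnames = c("chr19", "chr19"),
    ranges = IRanges(start = c(54924104, 54874318),
                     end = c(54929104, 54877536)))

# H3K4me3 is not defined above; it is assumed to be the example list of
# peak calls (one GRanges per sample) provided as a dataset by chromswitch.
callSummary(query = regions,
    metadata = metadata,
    peaks = H3K4me3,
    normalize_columns = c("qValue", "pValue", "signalValue"),
    mark = "H3K4me3",
    summarize_columns = c("pValue", "qValue", "signalValue"),
    heatmap = FALSE,
    BPPARAM = BiocParallel::SerialParam())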


sjessa/chromswitch documentation built on Feb. 4, 2024, 2:04 a.m.