View source: R/subsets.R

subsetSummary    R Documentation

Compute summaries for cumulative subsets of a short-read data set.

Description

THIS FUNCTION IS DEFUNCT!

Divides a short-read dataset into several subsets, and computes various summaries cumulatively. The goal is to study the characteristics of the data as a function of sample size.
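
As a rough illustration of the cumulative subsetting described above, the sketch below builds nested subsets of read indices in base R. It is not the chipseq implementation: the helper name `cumulative_subsets`, the use of `round()` to size each subset, and the choice to work on plain indices rather than a "GRanges" object are all assumptions made for the example.

```r
# Hypothetical helper, not part of chipseq: return one index vector per
# cumulative subset of n reads.
cumulative_subsets <- function(n, props = seq(0.1, 1, 0.1), resample = TRUE) {
    stopifnot(n >= 1, all(props > 0), all(props <= 1), !is.unsorted(props))
    # Optionally shuffle the read order to remove potential order effects.
    ord <- if (resample) sample.int(n) else seq_len(n)
    # Assumption: subset i holds the first round(props[i] * n) reads.
    sizes <- round(props * n)
    lapply(sizes, function(k) ord[seq_len(k)])
}

set.seed(1)
subsets <- cumulative_subsets(n = 50, props = c(0.2, 0.5, 1))
sizes <- lengths(subsets)
```

Because every subset is a prefix of the same (possibly shuffled) ordering, each one contains all reads of the previous one, so any summary computed per subset can be read as a function of sample size.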

Usage

subsetSummary(x, chr, nstep, props = seq(0.1, 1, 0.1),
              chromlens = seqlengths(x), fg.cutoff = 6, seqLen = 200,
              fdr.cutoff = 0.001, use.fdr = FALSE, resample = TRUE,
              islands = TRUE, verbose = getOption("verbose"))

Arguments

x

A "GRanges" object representing alignment locations at the sample level.

chr

The chromosome for which the summaries are to be obtained. Must name a chromosome present in x.

nstep

The number of mapped reads in each increment for the full dataset (not per-chromosome). This will be translated to a per-chromosome number proportionally.

props

Alternatively, an increasing sequence of proportions determining the size of each subset. Overrides nstep.

chromlens

A named vector of per-chromosome lengths, typically the result of seqlengths.

fg.cutoff

The coverage depth above which a region would be considered foreground.

seqLen

The number of bases to which to extend each read before computing coverage.

resample

Logical; whether to randomly reorder the reads before dividing them up into subsets. Useful to remove potential order effects (for example, if data from two lanes were combined to produce x).

fdr.cutoff

The maximum false discovery rate for a region that is considered to be foreground.

use.fdr

Logical; whether to use the FDR-detected peaks when calling foreground and background.

islands

Logical. If TRUE, the whole island would be considered foreground if the maximum depth equals or exceeds fg.cutoff. If FALSE, only the region above the cutoff would be considered foreground.

verbose

Logical; controls whether progress information is shown during the computation, which can be long-running.
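
To make the interplay of seqLen, fg.cutoff and islands more concrete, here is a minimal base R sketch. It is not the chipseq code. The helper names `extended_coverage` and `foreground_bases` are hypothetical, and two assumptions are made: an "island" is a maximal run of bases with non-zero coverage, and with islands = FALSE a base counts as foreground when its depth is at least fg.cutoff.

```r
# Hypothetical helper, not part of chipseq: per-base coverage on one
# chromosome after extending each read to seqLen bases from its 5' end.
extended_coverage <- function(pos5, strand, chromlen, seqLen = 200) {
    # Plus-strand reads extend to the right, minus-strand reads to the left.
    from <- ifelse(strand == "+", pos5, pos5 - seqLen + 1)
    to <- from + seqLen - 1
    # Clip the extended reads to the chromosome bounds.
    from <- pmax(from, 1)
    to <- pmin(to, chromlen)
    # Difference-array trick: +1 where a read starts, -1 just past its end.
    delta <- tabulate(from, nbins = chromlen + 1) -
        tabulate(to + 1, nbins = chromlen + 1)
    cumsum(delta)[seq_len(chromlen)]
}

# Hypothetical helper: number of bases called foreground.
foreground_bases <- function(cov, fg.cutoff = 6, islands = TRUE) {
    # Assumption: without islands, count bases at or above the cutoff.
    if (!islands) return(sum(cov >= fg.cutoff))
    # Assumption: an island is a maximal run of non-zero coverage.
    runs <- rle(cov > 0)
    run_id <- rep(seq_along(runs$lengths), runs$lengths)
    run_max <- as.vector(tapply(cov, run_id, max))
    # Keep whole islands whose maximum depth reaches the cutoff.
    keep <- runs$values & run_max >= fg.cutoff
    sum(runs$lengths[keep])
}

cov <- extended_coverage(pos5 = c(3, 3, 4, 24),
                         strand = c("+", "+", "+", "-"),
                         chromlen = 30, seqLen = 5)
fg_region <- foreground_bases(cov, fg.cutoff = 3, islands = FALSE)
fg_island <- foreground_bases(cov, fg.cutoff = 3, islands = TRUE)
```

In this toy input the three overlapping plus-strand reads form one island covering bases 3 to 8, with depth 3 on bases 4 to 7. With islands = FALSE only those four bases count as foreground. With islands = TRUE the whole six-base island counts. The isolated minus-strand read never reaches the cutoff and stays background in both cases.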

Value

A data frame with various per-subset summaries.

Note

This function should be considered preliminary, in that it might change significantly or simply be removed in a subsequent version. If you like it the way it is, please notify the maintainer.

Author(s)

Deepayan Sarkar, Michael Lawrence


Bioconductor/chipseq documentation built on Nov. 2, 2024, 7:23 a.m.