segmentation: Multi-resolution segmentation

View source: R/segmentation.R

segmentation    R Documentation

Multi-resolution segmentation

Description

Compute a multi-resolution segmentation based on locally optimal enrichment scores and internal consistency of the segmentation tree.

Usage

segmentation(Yi, name, wmax = 0, wmin = 3, gamma = 1, Tw = 0)

Arguments

Yi

vector of statistical scores produced by the enrichmentScore function.

name

prefix used to generate output file names.

wmax

maximum window size to be scanned, in number of Yi values. The default (wmax = 0) is interpreted as length(Yi).

wmin

minimum size of segmented regions in number of Yi values. The default is 3 consecutive Yi values.

gamma

numeric value in the interval ]0;1], acting as a stringency factor that influences both the minimum size and the enrichment score of segmented regions. Lower gamma values make the segmentation more stringent. The default value is 1.0 (no effect).

Tw

numeric value in the interval ]-Inf;0], indicating the minimum enrichment score accepted for segmented regions. Lower Tw values make this threshold more stringent. The default value is 0 (no threshold).
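
The bracket notation used for gamma and Tw denotes half-open intervals: ]0;1] means 0 < gamma <= 1, and ]-Inf;0] means Tw is finite and Tw <= 0. As a minimal sketch, the helper below checks these ranges before a call to segmentation. The function check_segmentation_args is hypothetical and not part of MRA.TA, and treating wmax = 0 as length(Yi) is an assumption based on the usage line and the wmax description.

```r
# Hypothetical helper (not part of MRA.TA): checks the documented argument ranges.
check_segmentation_args <- function(Yi, wmax = 0, wmin = 3, gamma = 1, Tw = 0) {
  # Assumption: wmax = 0 stands for the documented default, length(Yi)
  if (wmax == 0) wmax <- length(Yi)
  stopifnot(
    is.numeric(Yi),
    gamma > 0, gamma <= 1,      # gamma in ]0;1]
    is.finite(Tw), Tw <= 0      # Tw in ]-Inf;0]
  )
  list(wmax = wmax, wmin = wmin, gamma = gamma, Tw = Tw)
}

# A more stringent setting than the defaults: lower gamma, lower Tw
args <- check_segmentation_args(rnorm(100), wmin = 20, gamma = 0.5, Tw = -2)
args$wmax
```

Out-of-range values such as gamma = 0 or Tw = 1 make stopifnot raise an error in this sketch.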

Value

segmentation returns a list with the following elements:

time.optimization

computation time for the optimization procedure

time.MRT.Analysis

computation time for the domain fusion procedure

n.segments

total number of locally optimal segments after optimization

n.domains

total number of domains after domain fusion

n.maxresolution

number of maximum resolution domains

n.maxscale

number of maximum scale domains

file.segments

result file for the optimization procedure

file.domains

result file for the domain fusion procedure

file.maxresolution

result file for maximum resolution domains

file.maxscale

result file for maximum scale domains

Pw.min

best statistical score (pseudo p-values from combined rank-based scores) for each scanned window size

Output files

  • Common structure

    All output files are tab delimited text files including two header lines.

     line 1          = analysis parameters
     line 2          = column names
     following lines = data table
  • file.segments

    This file contains the optimal segment data resulting from the optimization procedure, defined by columns (i, w, Piw) as follows:

     i   = index of the last Yi value within the optimal segment
     w   = size in number of consecutive Yi values
     Piw = statistical score
  • file.domains, file.maxresolution, file.maxscale

    These files contain the segmented domain data, resulting from the domain fusion procedure, and defined by columns (id, container, start, end, wmin, wmax, P, i, w) as follows:

       id        = unique identifier for each domain
       container = identifier of parent domain, 0 meaning no parent
       start     = index of the first Yi value within domain
       end       = index of the last Yi value within domain
       wmin      = minimum size of included optimal segments, in number of Yi values
       wmax      = maximum size of included optimal segments, in number of Yi values
       P         = statistical score of the locally optimal segment
       i         = index of the last Yi value within the locally optimal segment
       w         = size of the locally optimal segment in number of consecutive Yi values
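
As a minimal sketch of the file layout described above, the R code below writes a small mock file with the same two-header-line structure and reads it back. The parameter string and all values are made up for illustration and are not real segmentation output. Since i is the index of the last Yi value and w is the segment size, the first index of a segment is i - w + 1. The second part walks the container column of a mock domain table to compute how deeply each domain is nested.

```r
# Mock file mimicking the documented layout (contents are illustrative only)
f <- tempfile(fileext = ".txt")
writeLines(c(
  "wmin=3 gamma=1 Tw=0",               # line 1: analysis parameters (made-up format)
  paste("i", "w", "Piw", sep = "\t"),  # line 2: column names
  paste(10, 4, -5.2, sep = "\t"),      # following lines: data table
  paste(25, 6, -3.1, sep = "\t")
), f)

# Line 1 holds the analysis parameters
params <- readLines(f, n = 1)

# Skip line 1 so that line 2 is read as the column header
opts <- read.delim(f, skip = 1, stringsAsFactors = FALSE)

# i = index of the last Yi value, w = segment size, hence the first index:
opts$start <- opts$i - opts$w + 1
opts

# Mock domain table: container = id of the parent domain, 0 meaning no parent
doms <- data.frame(id = 1:3, container = c(0, 1, 2))

# Nesting depth of each domain, found by following container links up to a root
depth <- sapply(doms$id, function(k) {
  d <- 0
  p <- doms$container[doms$id == k]
  while (p != 0) {
    d <- d + 1
    p <- doms$container[doms$id == p]
  }
  d
})
depth
```

The same skip = 1 pattern applies to every output file, since they share the common structure.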
    

Author(s)

Benjamin Leblanc

References

Leblanc B., Comet I., Bantignies F., and Cavalli G., Chromosome Conformation Capture on Chip (4C): data processing. Book chapter in Polycomb Group Proteins: Methods and Protocols. Lanzuolo C., Bodega B. editors, Methods in Molecular Biology (2016). http://dx.doi.org/10.1007/978-1-4939-6380-5_21

See Also

calc.Qi, enrichmentScore, domainogram, plotOptimalSegments, plotDomains

Examples

# Simulate enrichment signal
n <- 2000
Mi <- rep(0, n)
Mi <- Mi + dnorm(1:n, 2.5*n/20,  n/40) + dnorm(1:n, 4*n/20,  50)
Mi <- Mi + 4 * dnorm(1:n, 5*n/10, n/10)
Mi <- Mi + dnorm(1:n, 16*n/20,  n/40) + dnorm(1:n, 17.5*n/20,  50)
Mi <- (Mi/max(Mi))^4 + rnorm(n)/4

# Compute enrichment scores
Yi <- enrichmentScore(Mi)

# Multi-resolution segmentation
seg.c <- segmentation(Yi, name="MRA_demo", wmin=20)

# Load segmentation results (skip line 1, which holds the analysis parameters)
opts <- read.delim(seg.c$file.segments, stringsAsFactors=FALSE, skip=1)
doms <- read.delim(seg.c$file.domains, stringsAsFactors=FALSE, skip=1)
doms.mr <- read.delim(seg.c$file.maxresolution, stringsAsFactors=FALSE, skip=1)
doms.ms <- read.delim(seg.c$file.maxscale, stringsAsFactors=FALSE, skip=1)

# Visualization coordinates
x.s <- 1:n - 0.5
x.e <- 1:n + 0.5
w2y <- wSize2yAxis(n, logscale=TRUE)

layout(matrix(1:2, 2, 1), heights=c(3,1)/4)
par(mar=c(3, 4, 1, 2)) # default bottom, left, top, right = c(5, 4, 4, 2)

# Plot domainogram
domainogram(Yi, x.s, x.e, w2y)
plot(Mi, type='l', xaxs = 'i')

# Visualize segmentation results
plotOptimalSegments(opts, x.s, x.e, w2y, col="black")
plot(Mi, type='l', xaxs = 'i')

# Visualize multi-resolution domains
plotDomains(doms, x.s, x.e, w2y, col=rgb(0,0,0,0.5), border=rgb(0,0,0,0))
# Visualize max. resolution and max. scale domains
plotDomains(doms.mr, x.s, x.e, w2y, col=rgb(0,1,0,0.5), border=rgb(0,0,0,0), lwd=2, lty=1, add=TRUE)
plotDomains(doms.ms, x.s, x.e, w2y, col=rgb(1,0,0,0.5), border=rgb(0,0,0,0), lwd=2, lty=1, add=TRUE)
legend("topright", c("Max. resolution", "Max. scale", "Both"), fill=c("green", "red", "chocolate"), bty='n')
plot(Mi, type='l', xaxs = 'i')

benja0x40/MRA.TA documentation built on March 13, 2023, 5:15 a.m.