PeakSegJointSeveral: PeakSegJointSeveral

PeakSegJointSeveralR Documentation

PeakSegJointSeveral

Description

Run the PeakSegJoint heuristic segmentation algorithm with several different bin.factor values, keeping only the models with lowest Poisson loss for each peak size. This algorithm gives an approximate solution to the following multi-sample constrained maximum likelihood segmentation problem. If there are S samples total, we look for the most likely common peak in s\in{0, ..., S} samples. We solve the equivalent minimization problem using the Poisson loss seg.mean - count.data * log(seg.mean), from the first base to the last base of profiles. The optimization variables are the segment means, of which there can be either 1 value (no peak) or 3 values (peak) in each sample. If there are 3 segments then two constraints are applied: (1) the changes in mean must occur at the same position in each sample, and (2) the changes must be up and then down (mean1 < mean2 > mean3).

Usage

PeakSegJointSeveral(profiles, 
    bin.factors = 2:7)

Arguments

profiles

data.frame or list of them from ProfileList.

bin.factors

integer vector of optimization parameters >= 2. Larger values are slower. Using more values is slower since it tells the algorithm to search more of the model space, yielding solution which is closer to the global optimum.

Value

List of model fit results, which can be passed to ConvertModelList for easier interpretation.

Author(s)

Toby Dylan Hocking

Examples


library(PeakSegJoint)
data(H3K4me3.TDH.other.chunk8, envir=environment())
bf.vec <- c(2, 3, 5)
fit.list <-
  list(several=PeakSegJointSeveral(H3K4me3.TDH.other.chunk8, bf.vec))
for(bf in bf.vec){
  fit.list[[paste(bf)]] <-
    PeakSegJointHeuristicStep2(H3K4me3.TDH.other.chunk8, bf)
}
loss.list <- list()
segs.by.peaks.fit <- list()
for(fit.name in names(fit.list)){
  fit <- fit.list[[fit.name]]
  loss.list[[fit.name]] <- sapply(fit$models, "[[", "loss")
  converted <- ConvertModelList(fit)
  segs.by.peaks <- with(converted, split(segments, segments$peaks))
  for(peaks in names(segs.by.peaks)){
    model.segs <- segs.by.peaks[[peaks]]
    if(is.data.frame(model.segs)){
      segs.by.peaks.fit[[peaks]][[fit.name]] <-
        data.frame(fit.name, model.segs)
    }
  }
}
do.call(rbind, loss.list)

segs1 <- do.call(rbind, segs.by.peaks.fit[["10"]])
breaks1 <- subset(segs1, min(chromStart) < chromStart)
if(interactive() && require(ggplot2)){
  ggplot()+
    ggtitle(paste("PeakSegJointSeveral runs PeakSegJointHeuristic",
                  "and keeps only the most likely model"))+
    geom_step(aes(chromStart/1e3, count),
              color="grey50",
              data=H3K4me3.TDH.other.chunk8)+
    geom_vline(aes(xintercept=chromStart/1e3),
               data=breaks1,
               color="green",
               linetype="dashed")+
    geom_segment(aes(chromStart/1e3, mean,
                     xend=chromEnd/1e3, yend=mean),
                 size=1,
                 color="green",
                 data=segs1)+
    theme_bw()+
    theme(panel.margin=grid::unit(0, "cm"))+
    facet_grid(sample.id ~ fit.name, scales="free")
}

segs.by.peaks <- list()
for(peaks in 8:10){
  segs.by.peaks[[paste(peaks)]] <-
    data.frame(peaks, segs.by.peaks.fit[[paste(peaks)]][["several"]])
}
segs <- do.call(rbind, segs.by.peaks)
breaks <- subset(segs, min(chromStart) < chromStart)
if(interactive() && require(ggplot2)){
  ggplot()+
    ggtitle("PeakSegJoint models with 8-10 peaks")+
    geom_step(aes(chromStart/1e3, count),
              color="grey50",
              data=H3K4me3.TDH.other.chunk8)+
    geom_vline(aes(xintercept=chromStart/1e3),
               data=breaks,
               color="green",
               linetype="dashed")+
    geom_segment(aes(chromStart/1e3, mean,
                     xend=chromEnd/1e3, yend=mean),
                 size=1,
                 color="green",
                 data=segs)+
    theme_bw()+
    theme(panel.margin=grid::unit(0, "cm"))+
    facet_grid(sample.id ~ peaks, scales="free")
}


tdhock/PeakSegJoint documentation built on Jan. 27, 2024, 9 p.m.