knitr::opts_chunk$set(
  collapse = TRUE,
  warning=FALSE, message=FALSE, 
  comment = "#>"
)
library(signals)

Background

This vignette demonstrates how to use a beta-binomial model instead of a binomial model for the HMM emission model.

Inference

data("haplotypes")
data("CNbins")
haplotypes <- format_haplotypes_dlp(haplotypes, CNbins)
hscn <- callHaplotypeSpecificCN(CNbins, haplotypes, likelihood = "binomial")

print(hscn)

hscn_bb <- callHaplotypeSpecificCN(CNbins, haplotypes, likelihood = "betabinomial")

print(hscn_bb)

We see that the model inferred a modest degree of overdispersion in the data, but that this was siginificantly different from 0 dispersion (Tarones Z-score is high).

Below I'll plot the heatmaps for the two output, which in this case are very similar.

Binomial model

plotHeatmap(hscn, plotcol = "state_BAF", plottree = FALSE, spacer_cols = 15)

Beta-Binomial model

plotHeatmap(hscn_bb, plotcol = "state_BAF", plottree = FALSE, spacer_cols = 15)

QC plot

To interrogate the difference between the binomial and beta-binomial models we can plot the BAF and overlay the two models.

plotBBfit(hscn_bb)

We can check how similar the results are by comparing the two dataframes. We find for this data that the results are very similar.

print(dim(hscn$data))
print(dim(hscn_bb$data))
all.equal(orderdf(hscn$data), orderdf(hscn_bb$data))

Another thing we can check is the total number of segments identified by the HMM. Using the beta-binomial model should reduce the influence of any noisy regions and thus we might expect fewer segments. We do observe fewer segments in the beta-binomial model.

segs <- create_segments(hscn$data, field = "state_AS_phased")

segs_bb <- create_segments(hscn_bb$data, field = "state_AS_phased")

print(dim(segs))
print(dim(segs_bb))

Summary

With this dataset, the difference between the nbinomial model and beta-binomial model is minimal but produces a statistically superior fit, in lower quality datasets with for example lower depth of coverage the beta-binomial model may may provide more noticeable differences.



shahcompbio/signals documentation built on Jan. 11, 2025, 2:20 a.m.