scanBinsizes: Find the best bin size for a given dataset

View source: R/scanBinsizes.R

Description

Use simulations to find the best bin size among a set of binned input files. The result is only "best" in the sense of producing the fewest miscalls on simulated data, so there is no guarantee that it is also the best bin size for your real data. However, it can give you a hint about which bin size to choose.

Usage

scanBinsizes(
  files.binned,
  outputfolder,
  chromosomes = "chr10",
  eps = 0.01,
  max.iter = 100,
  max.time = 300,
  repetitions = 3,
  plot.progress = FALSE
)
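
A call might look like the following sketch. The file paths and the output folder name are hypothetical, and the example assumes that the installed chromstaR version provides scanBinsizes with the signature shown above; the call is skipped if the package or the files are not available.

```r
# Hypothetical paths to binned.data files of one sample at different bin sizes.
files.binned <- c(
  "binned/sample_binsize_500.RData",
  "binned/sample_binsize_1000.RData",
  "binned/sample_binsize_2000.RData"
)

# Only run if chromstaR is installed and the (hypothetical) files exist.
can.run <- requireNamespace("chromstaR", quietly = TRUE) &&
  all(file.exists(files.binned))

if (can.run) {
  # Assumption: scanBinsizes is accessible from the chromstaR namespace.
  miscall.plot <- chromstaR::scanBinsizes(
    files.binned,
    outputfolder = "scanBinsizes_output",  # hypothetical folder name
    chromosomes = "chr10",
    repetitions = 3
  )
  # The returned ggplot object is drawn when printed.
  print(miscall.plot)
}
```

Restricting the run to a single chromosome and a small number of repetitions keeps the simulation time manageable.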

Arguments

files.binned

A vector of file names, each containing binned.data at a different bin size.

outputfolder

Name of the folder where all files will be written to.

chromosomes

A vector of chromosomes to use for the simulation.

eps

Convergence threshold for the Baum-Welch algorithm.

max.iter

The maximum number of iterations for the Baum-Welch algorithm. A value of -1 means no limit. The default shown in Usage is 100.

max.time

The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch algorithm will terminate after the current iteration finishes. A value of -1 means no limit. The default shown in Usage is 300.

repetitions

Number of repetitions for each simulation.

plot.progress

If TRUE, the plot will be updated each time a simulation has finished. If FALSE, the plot will be returned only at the end.

Details

The function first runs callPeaksUnivariate on each of the given binned.data files. From the estimated parameters it generates simulated data and then calls peaks on this simulated data. Because the true state of every simulated bin is known, the fraction of miscalls can be calculated exactly.
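
The idea of counting miscalls on simulated data can be sketched in base R. This is not the chromstaR implementation: the two-state simulation, the negative binomial parameter values, and the fixed count threshold (standing in for the Hidden Markov Model fitted by callPeaksUnivariate) are all assumptions made purely for illustration.

```r
# Illustrative sketch only; not the chromstaR implementation.
set.seed(1)

# Simulate true states for 1000 bins: 0 = unmodified, 1 = modified.
n.bins <- 1000
true.state <- rbinom(n.bins, size = 1, prob = 0.2)

# Simulate read counts from a state-dependent negative binomial.
# The parameter values are hypothetical.
counts <- ifelse(
  true.state == 1,
  rnbinom(n.bins, size = 5, mu = 20),
  rnbinom(n.bins, size = 5, mu = 3)
)

# Stand-in peak caller: a fixed count threshold instead of an HMM.
called.state <- as.integer(counts > 8)

# Because the true states are known, miscalls can be counted exactly.
miscall.fraction <- mean(called.state != true.state)
print(miscall.fraction)
```

Repeating such a simulation for each bin size and comparing the resulting miscall counts is the kind of comparison the function summarizes in its plot.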

Value

A ggplot object with a bar plot of the number of miscalls as a function of bin size.

Author(s)

Aaron Taudt


ataudt/chromstaR documentation built on Dec. 26, 2021, 12:07 a.m.