overlap_score_summary_grid: Calculate and Summarize TopDom Overlap Scores Across...
In HenrikBengtsson/TopDomStudy: TopDom Study by Segal et al.

View source: R/overlap_score_summary_grid.R

Calculate and Summarize TopDom Overlap Scores Across Chromosomes, Bin Sizes, and Fractions

overlap_score_summary_grid(
  dataset,
  chromosomes,
  bin_sizes,
  rhos,
  reference_rhos = rep(1/2, times = length(rhos)),
  window_size = 5L,
  nsamples = 50L,
  weights = c("by_length", "uniform"),
  domain_length = NULL,
  verbose = FALSE
)

`dataset`	(character string) The name of the data set.
`chromosomes`	(character vector) Chromosomes to process.
`bin_sizes`	(numeric vector) The set of bin sizes (in bps) to process.
`rhos, reference_rhos`	(numeric vector) The set of fractions (in (0,0.5]) to process.
`window_size`	(integer) The TopDom window size. Argument passed as `window.size` to `TopDom::TopDom()`.
`nsamples`	(integer) Number of random samples to produce.
`weights`	(character string) A character string specifying how overlap scores across domains should be weighted. Argument passed as is to `overlap_score_summary()`.
`domain_length`	(optional; character string or numeric vector of length two) If specified, controls how to filter out too short or too long TopDom domains. Argument passed as is to `overlap_score_summary()`.
`verbose`	(logical) If `TRUE`, verbose output is produced.
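
As a rough illustration of how the arguments above fit together, the following sketch sets up a small grid. The dataset name "my_dataset" and all values are hypothetical placeholders, not taken from the package, and the actual call is not run here because it needs TopDomStudy and its data to be installed.

```r
## Hypothetical argument values (placeholders, not from the package)
chromosomes <- c("12", "16", "22")
bin_sizes   <- c(6, 8, 10) * 1e4      # bin sizes in bps
rhos        <- c(0.1, 0.25, 0.5)      # fractions in (0, 0.5]

## Same expression as the documented default: one reference fraction per rho
reference_rhos <- rep(1/2, times = length(rhos))

## Sanity checks mirroring the documented constraints
stopifnot(
  all(rhos > 0 & rhos <= 0.5),
  length(reference_rhos) == length(rhos)
)

## Number of (chromosome, bin size, fraction) combinations in the grid
n_cells <- length(chromosomes) * length(bin_sizes) * length(rhos)

## The call itself, disabled because it requires TopDomStudy and its data
if (FALSE) {
  res <- TopDomStudy::overlap_score_summary_grid(
    dataset        = "my_dataset",    # hypothetical data set name
    chromosomes    = chromosomes,
    bin_sizes      = bin_sizes,
    rhos           = rhos,
    reference_rhos = reference_rhos,
    window_size    = 5L,
    nsamples       = 50L,
    weights        = "by_length",
    verbose        = TRUE
  )
}
```

With these placeholder values the grid has 3 x 3 x 3 = 27 combinations, each of which is processed with nsamples random samples.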

A three-dimensional character array of pathnames, where the first dimension specifies chromosomes, the second bin sizes, and the third rhos (fractions).
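
To make the shape of such a return value concrete, here is a minimal sketch that builds a mock three-dimensional character array and indexes it. The file names are invented placeholders, and the presence of dimnames on the real return value is an assumption; only the dimension order (chromosomes, bin sizes, rhos) comes from the description above.

```r
chromosomes <- c("12", "16")
bin_sizes   <- c(100000L, 200000L)
rhos        <- c(0.25, 0.5)

## All combinations; expand.grid() varies the first factor fastest,
## which matches the column-major fill order of array()
grid <- expand.grid(chr = chromosomes, bin_size = bin_sizes, rho = rhos,
                    stringsAsFactors = FALSE)

## Placeholder pathnames (invented naming scheme, not the package's)
pathnames <- sprintf("chr=%s,bin_size=%d,rho=%.2f.rds",
                     grid$chr, grid$bin_size, grid$rho)

## Mock stand-in for the returned array; dimnames are an assumption
res <- array(
  pathnames,
  dim = c(length(chromosomes), length(bin_sizes), length(rhos)),
  dimnames = list(
    chromosome = chromosomes,
    bin_size   = as.character(bin_sizes),
    rho        = as.character(rhos)
  )
)

## Look up one cell by position or, if dimnames exist, by name
by_index <- res[1, 2, 2]
by_name  <- res["12", "200000", "0.5"]

## All pathnames for one chromosome: a bin_size-by-rho matrix
chr16 <- res["16", , ]
```

Indexing by position works regardless of whether the real array carries dimnames.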

The future framework is used to parallelize in three layers:

1. across (chromosome, bin size, fraction)
2. in overlap_scores_partitions(): across a single chromosome (already subsetted above)
3. in overlap_scores_partitions(): across nsamples random samples

An example of a future::plan() setup for parallelization on the local machine is:

 plan(list(
   chr_bin_rho = sequential,   ## across (chr, bin_size, rho)
   mono_chr    = sequential,   ## always a single chromosome
   samples     = multisession  ## across 1:nsamples
 ))

Another alternative is:

 plan(list(
   chr_bin_rho = multisession, ## across (chr, bin_size, rho)
   mono_chr    = sequential,   ## always a single chromosome
   samples     = sequential    ## across 1:nsamples
 ))

For parallelization on an HPC cluster via a scheduler:

 hpc_scheduler <- tweak(future.batchtools::batchtools_torque,
                        resources = list(nodes="1:ppn=8", vmem="32gb"))
 plan(list(
   chr_bin_rho = hpc_scheduler,
   mono_chr    = sequential,
   samples     = multisession
 ))

HenrikBengtsson/TopDomStudy documentation built on May 14, 2021, 1:49 p.m.