overlap_score_summary_grid: Calculate and Summarize TopDom Overlap Scores Across...


View source: R/overlap_score_summary_grid.R

Description

Calculate and Summarize TopDom Overlap Scores Across Chromosomes, Bin Sizes, and Fractions

Usage

overlap_score_summary_grid(
  dataset,
  chromosomes,
  bin_sizes,
  rhos,
  reference_rhos = rep(1/2, times = length(rhos)),
  window_size = 5L,
  nsamples = 50L,
  weights = c("by_length", "uniform"),
  domain_length = NULL,
  verbose = FALSE
)

Arguments

dataset

(character string) The name of the data set.

chromosomes

(character vector) Chromosomes to process.

bin_sizes

(numeric vector) The set of bin sizes (in bps) to process.

rhos, reference_rhos

(numeric vector) The set of fractions (in (0,0.5]) to process.

window_size

(integer) The TopDom window size. Argument passed as window.size to TopDom::TopDom().

nsamples

(integer) Number of random samples to produce.

weights

(character string) A character string specifying how overlap scores across domains should be weighted. Argument passed as is to overlap_score_summary().

domain_length

(optional; character string or numeric vector of length two) If specified, controls how to filter out too short or too long TopDom domains. Argument passed as is to overlap_score_summary().

verbose

(logical) If TRUE, verbose output is produced.
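
The defaults shown in the usage can be explored in plain R. The sketch below evaluates the default expression for reference_rhos, checks that the fractions fall in the documented range (0, 0.5], and shows how a choice vector such as weights is commonly resolved with match.arg(). The rhos values and the helpers in_range() and resolve_weights() are hypothetical, and whether the function itself relies on match.arg() is an assumption rather than something this page confirms.

```r
## Illustrative fractions; these values are made up for the example
rhos <- c(0.10, 0.25, 0.50)

## Default from the usage: one reference fraction of 1/2 per element of 'rhos'
reference_rhos <- rep(1/2, times = length(rhos))

## The documented range for the fractions is (0, 0.5]
in_range <- function(x) all(x > 0 & x <= 0.5)   ## hypothetical helper
stopifnot(in_range(rhos), in_range(reference_rhos))

## ASSUMPTION: a choice vector like 'weights' is typically resolved with
## match.arg(), which returns the first choice when the default is used
resolve_weights <- function(weights = c("by_length", "uniform")) {
  match.arg(weights)
}
default_weights <- resolve_weights()
uniform_weights <- resolve_weights("uniform")
```

Under that assumption, leaving weights unspecified selects "by_length", the first element of the default vector.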

Value

A three-dimensional character array of pathnames where the first dimension specifies chromosomes, the second bin_sizes, and the third rhos (fractions).
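
To make the shape of the return value concrete, here is a minimal sketch that builds an array of the same structure by hand and looks up one entry. The input values, the dimnames scheme, and the file-name pattern are all hypothetical; the actual names and pathnames produced by the function may differ.

```r
## Hypothetical inputs; none of these values come from the package
chromosomes <- c("12", "16", "22")
bin_sizes   <- c(6000, 10000)
rhos        <- c(0.10, 0.50)

## A 3-D character array: chromosomes x bin sizes x fractions
## ASSUMPTION: dimnames are formatted like this only for illustration
pathnames <- array(
  NA_character_,
  dim = c(length(chromosomes), length(bin_sizes), length(rhos)),
  dimnames = list(chromosomes,
                  sprintf("%.0f", bin_sizes),
                  sprintf("%.2f", rhos))
)

## Fill each cell with a made-up pathname (hypothetical naming pattern)
for (i in seq_along(chromosomes)) {
  for (j in seq_along(bin_sizes)) {
    for (k in seq_along(rhos)) {
      pathnames[i, j, k] <- sprintf("chr=%s,bin_size=%.0f,rho=%.2f.rds",
                                    chromosomes[i], bin_sizes[j], rhos[k])
    }
  }
}

## Look up the result for one (chromosome, bin size, fraction) combination
one_path <- pathnames["16", "10000", "0.50"]
```

Indexing by position, for example pathnames[2, 2, 2], selects the same cell and does not depend on how the dimnames are formatted.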

Parallel processing

The future framework is used to parallelize in three layers:

  1. across (chromosome, bin size, fraction)

  2. overlap_scores_partitions():

    1. across a single chromosome (already subsetted above)

    2. across nsamples random samples
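
The layering above can be sketched in base R to see how many units of work each layer creates. In this sketch lapply() stands in for the future-based iteration, which is an assumption made so that the example runs without extra packages; the input values are hypothetical.

```r
## Hypothetical inputs, chosen only to keep the example small
chromosomes <- c("12", "16")
bin_sizes   <- c(6000, 10000)
rhos        <- c(0.25, 0.50)
nsamples    <- 3L

## Layer 1: one task per (chromosome, bin size, fraction) combination
grid <- expand.grid(chromosome = chromosomes, bin_size = bin_sizes,
                    rho = rhos, stringsAsFactors = FALSE)

## ASSUMPTION: lapply() is a sequential stand-in for the parallel iteration
tasks <- lapply(seq_len(nrow(grid)), function(ii) {
  ## Layer 2 sees a single chromosome here, so it adds no extra tasks.
  ## Layer 3: one task per random sample
  samples <- lapply(seq_len(nsamples), function(s) {
    data.frame(grid[ii, ], sample = s, row.names = NULL)
  })
  do.call(rbind, samples)
})
tasks <- do.call(rbind, tasks)

n_outer <- nrow(grid)    ## number of first-layer tasks
n_total <- nrow(tasks)   ## first-layer tasks times nsamples
```

With these inputs there are 8 first-layer tasks and 24 sample-level tasks in total. Comparing the two counts can help when deciding whether to parallelize the outer grid or the inner samples.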

An example of a future::plan() setup for parallelization on the local machine is:

 library(future)               ## provides plan(), sequential, multisession
 plan(list(
   chr_bin_rho = sequential,   ## across (chr, bin_size, rho)
   mono_chr    = sequential,   ## always a single chromosome
   samples     = multisession  ## across 1:nsamples
 ))

Another, which instead parallelizes across the (chr, bin_size, rho) combinations, is:

 plan(list(
   chr_bin_rho = multisession, ## across (chr, bin_size, rho)
   mono_chr    = sequential,   ## always a single chromosome
   samples     = sequential    ## across 1:nsamples
 ))

For parallelization on an HPC cluster via a scheduler:

 hpc_scheduler <- tweak(future.batchtools::batchtools_torque,
                        resources = list(nodes="1:ppn=8", vmem="32gb"))
 plan(list(
   chr_bin_rho = hpc_scheduler,
   mono_chr    = sequential,
   samples     = multisession
 ))

HenrikBengtsson/TopDomStudy documentation built on May 14, 2021, 1:49 p.m.