chunk.bin: Split the bins into chunks for parallel normalization

Description Usage Arguments Value Author(s)

View source: R/chunk.bin.R

Description

Split the bins into big and small chunks. A big chunk contains all the bins used to normalize the bins it holds; a small chunk contains the bins analyzed by one job on the cluster. The size of the small chunks is not critical and can be adjusted to fit the cluster, whereas the big chunk size affects the efficiency of the normalization (the bigger the better).

Usage

chunk.bin(bins.df, bg.chunk.size = 1e+05, sm.chunk.size = 1000,
  large.chr.chunks = FALSE)

Arguments

bins.df

a data.frame defining the bins (one row per bin), e.g. created by 'fragment.genome.hg19'.

bg.chunk.size

the number of bins in a big chunk.

sm.chunk.size

the number of bins in a small chunk.

large.chr.chunks

should the big chunks be made of a few large genomic sub-regions? Default is FALSE. Normalization is faster (but slightly less efficient) than when using random bins. Recommended when dealing with a large number of bins.

Value

an updated data.frame with new columns 'sm.chunk', 'bg.chunk' and 'bin', holding the small chunk ID, the big chunk ID and the bin definition, respectively.
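
For illustration, a minimal sketch of a call and of the returned columns. The toy 'bins.df' below (with 'chr', 'start' and 'end' columns) and the small chunk sizes are assumptions chosen only to keep the example small; in practice the defaults and a genome-wide bin definition would be used.

## toy bins: 2000 consecutive 1 kb bins on chromosome 1 (illustrative only)
bins.df = data.frame(chr = "1",
                     start = seq(1, by = 1000, length.out = 2000),
                     end = seq(1000, by = 1000, length.out = 2000))

## chunk the bins; chunk sizes reduced here only for the example
bins.chunked = chunk.bin(bins.df, bg.chunk.size = 1000, sm.chunk.size = 100)

head(bins.chunked)            ## new columns: 'sm.chunk', 'bg.chunk', 'bin'
table(bins.chunked$bg.chunk)  ## how many bins each big chunk contains

## with many bins, large contiguous regions can be used for the big chunks
bins.chunked = chunk.bin(bins.df, bg.chunk.size = 1000, sm.chunk.size = 100,
  large.chr.chunks = TRUE)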

Author(s)

Jean Monlong

