Brick_local_score_differentiator: Do TAD Calls with Local Score Differentiator on a Hi-C matrix

View source: R/LSD_functions.R

Brick_local_score_differentiator    R Documentation

Do TAD Calls with Local Score Differentiator on a Hi-C matrix

Description

Brick_local_score_differentiator calls topologically associating domains (TADs) on Hi-C matrices. At its most fundamental level, the local score differentiator is a change point detector: it detects change points in the directionality index (DI) using thresholds defined on local DI distributions. The DI is calculated as defined by Dixon et al., 2012, Nature. Next, the difference in DI between neighbouring bins is calculated to obtain the change in DI at each bin. When the DI goes from a highly negative value to a highly positive one, as is expected at domain boundaries, the resulting DI difference distribution is very flat, interrupted by very large peaks marking the regions where such a change may take place. Two difference vectors are used: one holds the difference between a bin and its adjacent downstream bin, the other the difference between a bin and its adjacent upstream bin. Using these vectors and the original directionality index, domain borders are defined as outliers.
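The DI and the two difference vectors described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy matrix, the helper name compute_di, the choice to return 0 when upstream and downstream contacts are equal, and the sign convention of the two difference vectors are all assumptions made for this example.

```r
# Toy symmetric contact matrix with two domains (bins 1-5 and 6-10).
# All values are made up for illustration.
n <- 10
mat <- matrix(1, n, n)
mat[1:5, 1:5] <- 10
mat[6:10, 6:10] <- 10

# Directionality index following Dixon et al. (2012):
# A = contacts with upstream bins, B = contacts with downstream bins,
# E = (A + B) / 2, DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E).
# compute_di is a hypothetical helper, not part of HiCBricks.
compute_di <- function(mat, window) {
    n <- nrow(mat)
    vapply(seq_len(n), function(i) {
        up <- if (i > 1) max(1, i - window):(i - 1) else integer(0)
        down <- if (i < n) (i + 1):min(n, i + window) else integer(0)
        A <- sum(mat[i, up])
        B <- sum(mat[i, down])
        # Assumption: treat the undefined A == B case as DI = 0
        if (A == B) return(0)
        E <- (A + B) / 2
        sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E)
    }, numeric(1))
}

di <- compute_di(mat, window = 3)

# Assumed sign convention for the two difference vectors:
# upstream_diff[i]   = DI[i] - DI[i - 1]
# downstream_diff[i] = DI[i] - DI[i + 1]
upstream_diff <- c(NA, diff(di))
downstream_diff <- c(-diff(di), NA)

which.max(upstream_diff)    # bin 6, the first bin of the second domain
which.min(downstream_diff)  # bin 5, the last bin of the first domain
```

In this toy matrix the DI flips from strongly negative at bin 5 to strongly positive at bin 6, so the two difference vectors peak exactly at the domain boundary while staying comparatively small elsewhere.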

Usage

Brick_local_score_differentiator(
    Brick,
    chrs = NULL,
    resolution = NA,
    all_resolutions = FALSE,
    min_sum = -1,
    di_window = 200L,
    lookup_window = 200L,
    tukeys_constant = 1.5,
    strict = TRUE,
    fill_gaps = TRUE,
    ignore_sparse = TRUE,
    sparsity_threshold = 0.8,
    remove_empty = NULL,
    chunk_size = 500,
    force_retrieve = TRUE
)

Arguments

Brick

Required. A string specifying the path to the Brick store created with Create_many_Bricks.

chrs

Optional. Default NULL. If provided, TADs are called only for the chromosomes listed in chrs.

resolution

Optional. Default NA. When an object of class BrickContainer is provided, resolution defines the resolution on which the function is executed.

all_resolutions

Optional. Default FALSE. If resolution is not defined and all_resolutions is TRUE, the resolution parameter is ignored and the function is executed on all files listed in the Brick container.

min_sum

Optional. Default -1. Only bins in the matrix with row sums greater than min_sum are processed.

di_window

Optional. Default 200. The window used to compute the directionality index.

lookup_window

Optional. Default 200. The local window used to call borders. At smaller di_window values we recommend setting this to 2*di_window.

tukeys_constant

Optional. Default 1.5. tukeys_constant*IQR (inter-quartile range) defines the lower and upper fence values.

strict

Optional. Default TRUE. If TRUE, an additional filter is applied to the directionality index, requiring it to be greater than 0 on the right tail and less than 0 on the left tail.

fill_gaps

Optional. Default TRUE. If TRUE, this affects the TAD stitching process. All border starts are stitched to the next downstream border end, so at times border ends remain unassociated with a border start. When fill_gaps is TRUE, each such border end is stitched to the bin immediately downstream of its nearest upstream border end.

TADs inferred in this way are annotated with two metadata columns in the GRanges object: gap.fill holds a value of 1 and level holds a value of 1. TADs which were not filled in hold a gap.fill value of 0 and a level value of 2.

ignore_sparse

Optional. Default TRUE. If TRUE, a matrix which was defined as sparse during the matrix loading process is treated as a dense matrix, and the sparsity_threshold filter is not applied. Please note that if a matrix is defined as sparse and fill_gaps is TRUE, fill_gaps will be turned off.

sparsity_threshold

Optional. Default 0.8. The sparsity threshold relates to the sparsity index, which is computed as the number of non-zero bins at a certain distance from the diagonal. If a matrix is sparse and ignore_sparse is FALSE, bins with a sparsity index below this threshold are discarded from the DI computation.

remove_empty

Not implemented. After implementation, this will ensure that the presence of centromeric regions is accounted for.

chunk_size

Optional. Default 500. The size of the matrix chunk to process. This value should be larger than 2 x di_window.

force_retrieve

Optional. Default TRUE. If TRUE, this forces the retrieval of a matrix chunk even when the retrieval includes interaction points which were not loaded into a Brick store (larger chunks). Please note that this does not mean that DI can be computed at distances larger than the maximum distance; rather, it is meant to speed up computation.

Details

To define an outlier, fences are first defined using tukeys_constant x the inter-quartile range (IQR) of the directionality index. The upper fence, used for detecting domain starts, is the 75th percentile + (IQR x tukeys_constant), while the lower fence, used for detecting domain ends, is the 25th percentile - (IQR x tukeys_constant). For domain starts, the DI difference must be greater than or equal to the upper fence, it must be greater than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be greater than 0. Similarly, for domain ends, the DI difference must be lower than or equal to the lower fence, it must be lower than the DI, and the DI must be a finite real value. If strict is TRUE, the DI is also required to be lower than 0.
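The outlier rules above can be illustrated with a short base R sketch. It is not the package's code: the DI values are invented, the fences are computed once over the whole toy vector rather than within a local lookup_window, and the sign convention of the two difference vectors (bin minus upstream neighbour for starts, bin minus downstream neighbour for ends) is an assumption.

```r
# Made-up DI values with a single boundary between bins 7 and 8.
di <- c(1, -1, 2, -2, 1, -1, -30, 40, 1, -1, 2, -2, 1, -1)
tukeys_constant <- 1.5
strict <- TRUE

# Assumed sign convention for the difference vectors
upstream_diff <- c(NA, diff(di))     # DI[i] - DI[i - 1]
downstream_diff <- c(-diff(di), NA)  # DI[i] - DI[i + 1]

# Tukey fences from the quartiles of the DI
# (simplification: global rather than local quartiles)
q <- quantile(di, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
upper_fence <- q[2] + tukeys_constant * iqr
lower_fence <- q[1] - tukeys_constant * iqr

# Domain starts: difference at or above the upper fence, larger than
# the DI itself, and a finite DI
is_start <- upstream_diff >= upper_fence & upstream_diff > di &
    is.finite(di)
# Domain ends: difference at or below the lower fence, smaller than
# the DI itself, and a finite DI
is_end <- downstream_diff <= lower_fence & downstream_diff < di &
    is.finite(di)

# strict adds a sign requirement on the DI
if (strict) {
    is_start <- is_start & di > 0
    is_end <- is_end & di < 0
}

domain_starts <- which(is_start)  # bin 8
domain_ends <- which(is_end)      # bin 7
```

With these values the fences are 4 and -4, so the small alternating fluctuations never qualify and only the large jump between bins 7 and 8 is flagged.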

After defining outliers, each domain start is associated with its nearest downstream domain end. If fill_gaps is TRUE and some domain ends remain unassociated with a domain start, these domain ends are associated with the bin adjacent to their nearest upstream domain end. These associations are marked by the metadata columns gap.fill = 1 and level = 1.
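The stitching step can be sketched as follows. The helper stitch_domains and its data frame output are hypothetical and only mirror the description above; the package itself returns a ranges object, and this sketch does not handle several starts sharing the same downstream end.

```r
# stitch_domains is a hypothetical helper, not part of HiCBricks.
# starts and ends are bin indices of the called domain starts and ends.
stitch_domains <- function(starts, ends, fill_gaps = TRUE) {
    starts <- sort(starts)
    ends <- sort(ends)
    # For each start, find the index of its nearest downstream end
    end_idx <- vapply(starts, function(s) {
        hit <- which(ends > s)
        if (length(hit) == 0) NA_integer_ else hit[1]
    }, integer(1))
    keep <- !is.na(end_idx)
    domains <- data.frame(start = starts[keep],
        end = ends[end_idx[keep]], gap.fill = 0, level = 2)
    if (fill_gaps) {
        # Ends that no start was stitched to
        orphan_idx <- setdiff(seq_along(ends), end_idx[keep])
        # An upstream end is needed to anchor the filled domain
        orphan_idx <- orphan_idx[orphan_idx > 1]
        if (length(orphan_idx) > 0) {
            filled <- data.frame(start = ends[orphan_idx - 1] + 1,
                end = ends[orphan_idx], gap.fill = 1, level = 1)
            domains <- rbind(domains, filled)
        }
    }
    domains[order(domains$start), ]
}

# Starts at bins 3 and 20; ends at bins 10, 15 and 30.
# The end at bin 15 has no start of its own, so with fill_gaps = TRUE
# it is paired with bin 11, the bin after the upstream end at bin 10.
tads <- stitch_domains(starts = c(3, 20), ends = c(10, 15, 30))
tads
```

Here the domains 3-10 and 20-30 carry gap.fill = 0 and level = 2, while the filled domain 11-15 carries gap.fill = 1 and level = 1.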

This function makes it possible to call accurate TAD definitions quickly.

Value

A GRanges object containing domain definitions. The starts and ends of the ranges coincide with the starts and ends of their contained bins from the bintable.

Examples

# Locate the example bin table shipped with HiCBricks
Bintable.path <- system.file(file.path("extdata", "Bintable_100kb.bins"),
    package = "HiCBricks")

# Create a temporary output directory for the Brick store
out_dir <- file.path(tempdir(), "lsd_test")
dir.create(out_dir)

# Create the Brick store at 100 kb resolution
My_BrickContainer <- Create_many_Bricks(BinTable = Bintable.path,
    bin_delim = " ", output_directory = out_dir, file_prefix = "Test",
    experiment_name = "Vignette Test", resolution = 100000,
    remove_existing = TRUE)

# Locate the example chr3R Hi-C matrix shipped with HiCBricks
Matrix_file <- system.file(file.path("extdata",
    "Sexton2012_yaffetanay_CisTrans_100000_corrected_chr3R.txt.gz"),
    package = "HiCBricks")

# Load the chr3R matrix into the Brick store
Brick_load_matrix(Brick = My_BrickContainer, chr1 = "chr3R",
    chr2 = "chr3R", matrix_file = Matrix_file, delim = " ",
    remove_prior = TRUE, resolution = 100000)

# Call TADs on chr3R with the local score differentiator
TAD_ranges <- Brick_local_score_differentiator(Brick = My_BrickContainer,
    chrs = "chr3R", resolution = 100000, di_window = 10, lookup_window = 30,
    strict = TRUE, fill_gaps = TRUE, chunk_size = 500)

koustav-pal/HiCBricks documentation built on Oct. 25, 2022, 12:06 a.m.