create.hic.table: Create hic.table object from a sparse upper triangular Hi-C...

View source: R/create_hic_table.R

create.hic.tableR Documentation

Create hic.table object from a sparse upper triangular Hi-C matrix

Description

Create hic.table object from a sparse upper triangular Hi-C matrix

Usage

create.hic.table(
  sparse.mat1,
  sparse.mat2,
  chr = NA,
  scale = TRUE,
  include.zeros = FALSE,
  subset.dist = NA,
  subset.index = NA,
  exclude.regions = NA,
  exclude.overlap = 0.2
)

Arguments

sparse.mat1

Required, sparse upper triangular Hi-C matrix, 7 column BEDPE format of the upper triangle of the matrix, OR InteractionSet object with the genomic ranges of the interacting regions for the upper triangle of the Hi-C matrix and a single metadata column containing the interaction frequencies for each interacting pair for the first dataset you wish to jointly normalize.

sparse.mat2

Required, sparse upper triangular Hi-C matrix, 7 column BEDPE format of the upper triangle of the matrix, OR InteractionSet object with the genomic ranges of the interacting regions for the upper triangle of the Hi-C matrix and a single metadata column containing the interaction frequencies for each interacting pair for the second dataset you wish to jointly normalize.

chr

The chromosome name for the matrices being entered i.e 'chr1' or 'chrX'. Only needed if using sparse upper triangular matrix format. If using BEDPE format leave set to NA.

scale

Logical, should scaling be applied to the matrices to adjust for total read counts. If TRUE the IFs of the second sparse matrix will be adjusted as follows: IF2_scaled = IF2 / (sum(IF2)/sum(IF1)).

include.zeros

Logical, If set to TRUE the function will include pairwise interactions where one of the interaction frequencies is 0.

subset.dist

Should the matrix be subset to only include interactions up to a user specified matrix unit distance? i.e. to only include the cells of the matrix which are at a unit distance less than or equal to 100 set subset.dist = 100. Subsetting the matrix by distance will cut out any interactions occuring at a unit distance greater than the specified value. This could be used to speed up computation time or if there is only interest in the interactions occuring below a specific distance in the matrix. Warning: If you subset by distance do NOT to transform the subsetted hic.table into a full matrix using 'sparse2full'. If you plan on trasforming the matrix to a full contact matrix use subset.index instead.

subset.index

Should the matrix be subset by a user specified distance? Input as a vector of 4 numbers (i.start, i.end, j.start, j.end). i.e. to only include a subset of the matrix with row numbers 20 <= i <= 40 and column numbers 30 <= j <= 50 set as subset.index = c(20, 40, 30, 50). This can be used to speed up computation time if only a subset of the matrix is of interest. The indices used here correspond to the indices of the full Hi-C contact matrix. The 'sparse2full' function can be used to view the full contact matrix and make a decision about subsetting based on index.

exclude.regions

A data.frame or genomic ranges object in the form of chr start end. Regions contained in the object will be removed from the hic.table object. Could be useful for excluding regions with a known CNV, blacklist regions, or some other a priori known difference.

exclude.overlap

The proportion of overlap required to exclude a region. Defaults to 0.2, indicating 20% or more overlap will be enough for exclusion. To exclude any amount of overlap set to 0. If set to 1, only a 100% overlap with an excluded regions will result in exclusion.

Details

This function is used to transform two sparse upper triangular Hi-C matrices into an object usable in the hic_loess function. Sparse upper triangular Hi-C matrix format is typical of the Hi-C data available from the Aiden Lab https://www.aidenlab.org/. If you have a full Hi-C contact matrix, first transform it to sparse upper triangular format using the full2sparse function. Sparse matrices should have 3 columns in the following order: Start location of region 1, Start location of region 2, Interaction Frequency. Matrices in 7 column BEDPE format should have 7 columns in the following order: Chromosome name of the first region, Start location of first region, End location of first region, Chromosome name of the second region, Start location of the second region, End location of the second region, Interaction Frequency. Please enter either two sparse matrices or two matrices in 7 column BEDPE format or two InteractionSet objects; do not mix and match.

Value

A hic.table object.

Examples

# Create hic.table object using included Hi-C data in sparse upper
# triangular matrix format
data('HMEC.chr22')
data('NHEK.chr22')
hic.table <- create.hic.table(HMEC.chr22, NHEK.chr22, chr = 'chr22')
# View result
hic.table

dozmorovlab/HiCcompare documentation built on June 30, 2023, 3:09 a.m.