buildCIM: Builds Chromosomal Interactions Maps

Description

This function takes mapped positions of fragment pairs (Hi-C data) in a given format (supported formats are "nodup", "maq" or "sam") and the genome coordinates of all relevant regions (the segmentation), and writes pairwise contact maps for all chromosome pairs. A cell M[i,j] in the pairwise matrix generated for a pair of chromosomes, chromosome A and chromosome B, holds the number of interactions between region i in chromosome A and region j in chromosome B (A and B may be the same chromosome). To improve processing times, this function calls a Python executable. Users should therefore verify that Python (> 2.6) is installed and added to their PATH.
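
The counting rule for M[i,j] can be illustrated with a minimal R sketch. This is not the package's implementation (which delegates to a Python executable); the toy data, the column names, the helper regionIndex and the closed-interval, 1-based coordinate convention are all assumptions made for illustration.

```r
# Toy segmentation: two regions on each of two chromosomes (made-up data).
seg <- data.frame(
  chromosome = c("chrA", "chrA", "chrB", "chrB"),
  start      = c(1, 1000001, 1, 1000001),
  end        = c(1000000, 2000000, 1000000, 2000000),
  stringsAsFactors = FALSE
)

# Toy mapped fragment pairs: one row per interaction (made-up data).
pairs <- data.frame(
  chr1 = c("chrA", "chrA", "chrA"),
  pos1 = c(500, 1500000, 700),
  chr2 = c("chrB", "chrB", "chrB"),
  pos2 = c(1200000, 300, 1900000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the region on `chrom` that contains `pos`,
# treating regions as closed intervals [start, end] (an assumption).
regionIndex <- function(chrom, pos, seg) {
  regions <- seg[seg$chromosome == chrom, ]
  which(pos >= regions$start & pos <= regions$end)
}

# Count interactions between regions of chrA (rows) and chrB (columns).
M <- matrix(0, nrow = 2, ncol = 2)
for (k in seq_len(nrow(pairs))) {
  i <- regionIndex(pairs$chr1[k], pairs$pos1[k], seg)
  j <- regionIndex(pairs$chr2[k], pairs$pos2[k], seg)
  M[i, j] <- M[i, j] + 1
}
```

Here two of the three toy pairs link region 1 of chrA with region 2 of chrB, so M[1,2] is 2, and the remaining pair gives M[2,1] a count of 1.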

Usage

buildCIM(HiCFile, segFile, format, outputPrefix, resolution, header = FALSE, 
inclusive = FALSE, verbose = TRUE, combineToSingle = TRUE)
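
A call might look like the sketch below. The file names and the output prefix are hypothetical placeholders, and the chosen format and resolution are assumptions about the input. The call is guarded so that it only runs when the chromoR package and both input files are actually present.

```r
# Hypothetical input paths; replace with real files.
hicFile <- "hic_mapped_pairs.sam"
segFile <- "segmentation_1Mb.txt"

# Only attempt the build if the package and both inputs are available.
if (requireNamespace("chromoR", quietly = TRUE) &&
    file.exists(hicFile) && file.exists(segFile)) {
  chromoR::buildCIM(HiCFile = hicFile, segFile = segFile, format = "sam",
                    outputPrefix = "cim_", resolution = 1000000,
                    header = FALSE, inclusive = FALSE, verbose = TRUE,
                    combineToSingle = FALSE)
}
```

Setting combineToSingle = FALSE in this sketch skips the additional combined matrix; the default is TRUE.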

Arguments

HiCFile

The name of the Hi-C file.

See for example: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM455134

segFile

The name of the segmentation file. The file provides the genomic coordinates of each region. It should be a tab-delimited file with the following columns: chromosome, start, end, giving the chromosome name and the start and end positions of each region.
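
As an illustration, a fixed-size segmentation file in this three-column layout could be generated as sketched below. The chromosome names and lengths are made up, and the 1-based, non-overlapping coordinate convention is an assumption; the source does not state which convention the function expects.

```r
# Made-up chromosome lengths and region size (integers, so that write.table
# does not emit scientific notation such as 1e+06).
chromLengths <- c(chr1 = 3000000L, chr2 = 2000000L)
binSize <- 1000000L

# One row per region: chromosome name, start, end.
seg <- do.call(rbind, lapply(names(chromLengths), function(chrom) {
  starts <- seq.int(1L, chromLengths[[chrom]], by = binSize)
  data.frame(chromosome = chrom,
             start = starts,
             end = starts + binSize - 1L,
             stringsAsFactors = FALSE)
}))

# Write a tab-delimited file without a header (matching header = FALSE).
segPath <- file.path(tempdir(), "segmentation.txt")
write.table(seg, file = segPath, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```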

format

The format of the Hi-C file, taking one of the following values: "nodup", "maq" or "sam".

outputPrefix

The prefix of the output files generated by this function. The names of the two chromosomes corresponding to the output contact map are appended to each file name.

resolution

An integer value specifying the resolution of the given segmentation, if applicable. Specifically, if the segmentation file defines regions of the same size (for example: 1000000) this variable should be set accordingly. Otherwise it should be set to -1. Note that specifying the resolution greatly improves processing times.
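
One way to pick this value is to check whether all regions in the segmentation have the same size. The helper segResolution below is hypothetical and not part of chromoR. It assumes a data frame with start and end columns and measures region size as end - start + 1, which presumes closed, 1-based intervals.

```r
# Hypothetical helper: return the common region size, or -1 if sizes differ.
segResolution <- function(seg) {
  sizes <- unique(seg$end - seg$start + 1)  # size convention is an assumption
  if (length(sizes) == 1) sizes else -1
}

# Toy segmentations (made-up data).
uniformSeg <- data.frame(start = c(1, 1000001), end = c(1000000, 2000000))
mixedSeg   <- data.frame(start = c(1, 1000001), end = c(1000000, 1500000))

resUniform <- segResolution(uniformSeg)  # all regions share one size
resMixed   <- segResolution(mixedSeg)    # sizes differ, so -1
```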

header

optional: a boolean specifying whether the segmentation file includes a header or not. Set to FALSE by default.

inclusive

optional: a boolean specifying whether the segmentation is inclusive (i.e. whether the end position of one region overlaps with the start position of the next region). Set to FALSE by default.
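
A segmentation can be inspected for this property before calling the function. The helper isInclusive below is hypothetical and not part of chromoR. It assumes rows are sorted by chromosome and start position, and it interprets "overlaps" as the end of a region being equal to the start of the next region on the same chromosome.

```r
# Hypothetical helper: TRUE if every pair of consecutive regions on the same
# chromosome shares a boundary position (end of one == start of the next).
isInclusive <- function(seg) {
  n <- nrow(seg)
  if (n < 2) return(FALSE)
  sameChrom <- seg$chromosome[-1] == seg$chromosome[-n]
  touching  <- seg$start[-1] == seg$end[-n]
  any(sameChrom) && all(touching[sameChrom])
}

# Toy segmentations (made-up data).
sharedBoundary <- data.frame(chromosome = c("chr1", "chr1"),
                             start = c(0, 1000000),
                             end = c(1000000, 2000000),
                             stringsAsFactors = FALSE)
disjoint <- data.frame(chromosome = c("chr1", "chr1"),
                       start = c(1, 1000001),
                       end = c(1000000, 2000000),
                       stringsAsFactors = FALSE)

inclShared   <- isInclusive(sharedBoundary)  # boundaries coincide
inclDisjoint <- isInclusive(disjoint)        # boundaries do not coincide
```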

verbose

optional: a boolean specifying whether to report on the progress of the CIM build. Set to TRUE by default.

combineToSingle

optional: a boolean specifying whether to also combine all the pairwise matrices into a single matrix and write it to a file. If set to TRUE, an additional file is written, provided enough memory is available. Set to TRUE by default.

Value

This function generates a file for every pairwise chromosomal interaction map from the given input. No value is returned.

Note

Users should note that for large Hi-C files (> 10Gb), the pre-processing time is typically long (30-60 minutes). To generate Hi-C mapped positions from raw fragment pairs, users should refer to related pipelines such as the HiCUP pipeline (http://www.bioinformatics.babraham.ac.uk/projects/hicup/). Additionally, different Hi-C data sets (raw fragment pairs and mapped positions) are publicly available from the Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/

Author(s)

Yoli Shavit

References

http://www.cl.cam.ac.uk/~ys388/chromoR/

See Also

correctCIM, correctPairCIM


chromoR documentation built on May 2, 2019, 2:05 p.m.