
The HiCAGE (Hi-C Annotation and Graphics Ensemble) package lets users annotate and visualize 3C-based genomic data at whole-genome scale. This document describes the functionality of the package and gives detailed descriptions of its use.

Package Overview

HiCAGE has a variety of features designed for efficient annotation, analysis, and visualization of interacting chromatin regions.

HiCAGE Environment Setup

HiCAGE relies on the following dependencies:


Input Data

HiCAGE takes tab-delimited files as input: 3C-based genomic data, segmentation data, and, optionally, RNA-seq data. Files can be in txt, tsv, bed, or other formats. To minimize data manipulation before loading, HiCAGE lets the user specify which columns of each file contain the necessary data. The default column selection is set up to handle common data layouts.


Example format for initial data input:

hic_chr20 <- system.file("extdata", "hic_chr20.txt", package = "HiCAGE")
segment_chr20 <- system.file("extdata", "segment_chr20.bed", package = "HiCAGE")
rna_chr20 <- system.file("extdata", "rna_chr20.tsv", package = "HiCAGE")

example <- overlap(hicfile = hic_chr20, 
                   segmentfile = segment_chr20, 
                   rnafile = rna_chr20,
                   bio_mart = "ensembl",
                   martset = "hsapiens_gene_ensembl",
                   webhost = "")

Column Selection (Optional based on data format)

Hi-C or 3C-based Data Column Selection

The overlap function defaults to the column layout found in the example HiCCUPS looplist Hi-C file. If the columns of a Hi-C file are in a different order, or if the file contains additional unneeded columns, the hic.columns argument in the overlap function can be used to select the proper columns.

Hi-C data files need to contain the following columns in order (specified using hic.columns argument in the overlap function):

overlap(hicfile = hic_chr20, 
        segmentfile = segment_chr20, 
        rnafile = rna_chr20,
        hic.columns = c(1:6, 8))

Chrom1 | Chrom1Start | Chrom1End | Chrom2 | Chrom2Start | Chrom2End | Score
------ | ----------- | --------- | ------ | ----------- | --------- | -----
"V1" | "V2" | "V3" | "V4" | "V5" | "V6" | "V8"
20 | 13300000 | 13310000 | 20 | 13520000 | 13530000 | 35
20 | 17520000 | 17530000 | 20 | 17590000 | 17600000 | 71
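To make the index-based selection concrete, here is a minimal base R sketch of what picking columns with c(1:6, 8) amounts to. It is not HiCAGE's internal code: the toy data frame, the contents of the skipped column V7, and the names hic_raw, hic_columns, and hic_selected are assumptions made for illustration.

```r
# Toy Hi-C table with eight unnamed columns, named V1..V8 the way
# read.table() names columns when a file has no header.
# V7 is a hypothetical extra column that is not needed.
hic_raw <- data.frame(
  V1 = c(20, 20),
  V2 = c(13300000, 17520000),
  V3 = c(13310000, 17530000),
  V4 = c(20, 20),
  V5 = c(13520000, 17590000),
  V6 = c(13530000, 17600000),
  V7 = c("x", "y"),
  V8 = c(35, 71)
)

# Same index vector as the hic.columns value shown above
hic_columns <- c(1:6, 8)

# Keep the seven required columns, in order, and label them
hic_selected <- hic_raw[, hic_columns]
names(hic_selected) <- c("Chrom1", "Chrom1Start", "Chrom1End",
                         "Chrom2", "Chrom2Start", "Chrom2End", "Score")
print(hic_selected)
```

The order of the index vector matters: the first six entries must point at the two sets of coordinates and the seventh at the score.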

Segmentation File Column Selection

Segmentation data files need to contain the columns in the following example:

Chrom | ChromStart | ChromEnd | Mark | Score
----- | ---------- | -------- | ---- | -----
"V1" | "V2" | "V3" | "V4" | "V5"
chr20 | 62218 | 62675 | EWR | 0.0000
chr20 | 117995 | 118433 | HET | 781.1476

StateHub/StatePaintR segmentation files all use the above format. However, users can still select the columns containing the necessary information using the segment.columns argument in the overlap function:

overlap(hicfile = hic_chr20, 
        segmentfile = segment_chr20, 
        rnafile = rna_chr20,
        segment.columns = c(1:5))

Manual state prioritization

HiCAGE uses the segmentation score found in the segmentation file to prioritize state calls in the genomic regions defined in the Hi-C file. Alternatively, if scores do not exist or are not desired, users can set the state prioritization manually with the manual.priority argument in the overlap function. In this case, use the segment.columns argument to select the columns in the segmentation file containing Chrom, ChromStart, ChromEnd, and State, then define the priority by listing the state calls in the manual.priority argument from highest to lowest priority.

overlap(hicfile = hic_chr20, 
        segmentfile = segment_chr20, 
        rnafile = rna_chr20,
        segment.columns = c(1:4),
        manual.priority = c("PAR", "PPR", "EAR", "EPR", "HET"))
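The idea of ranking competing state calls by a user-supplied order can be sketched in base R as follows. This is only an illustration of the concept, not HiCAGE's implementation; the overlaps table, its region labels, and all variable names are hypothetical.

```r
# Hypothetical overlaps: each row is one state call that overlaps a
# Hi-C region (region labels are made up for this sketch)
overlaps <- data.frame(
  region = c("r1", "r1", "r2", "r2", "r3"),
  state  = c("HET", "PAR", "EPR", "EAR", "HET"),
  stringsAsFactors = FALSE
)

# Priority order, highest first (same vector as in the call above)
manual_priority <- c("PAR", "PPR", "EAR", "EPR", "HET")

# Position of each state in the priority vector (1 = highest priority)
overlaps$rank <- match(overlaps$state, manual_priority)

# Sort by region and rank, then keep the first (best-ranked) row per region
sorted_calls <- overlaps[order(overlaps$region, overlaps$rank), ]
top_state <- sorted_calls[!duplicated(sorted_calls$region), c("region", "state")]
print(top_state)
```

In this sketch a state missing from the priority vector gets rank NA and sorts last, so it is only chosen when no listed state overlaps the region.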

Pruning state calls

HiCAGE uses the segmentation score found in the segmentation file to prioritize state calls in the genomic regions defined in the Hi-C file. Occasionally, however, two states in one region have identical segmentation scores. In that case the genomic region is annotated with both calls in the final data table, creating duplicate rows of interacting regions, one for each top state call. If this is not desired, prune.priority can be set in overlap to break ties that remain after prioritizing by segmentation score. The argument takes a character vector of the form c("PAR", "EAR", "AR", "PPR", "EPR", "TRS", "HET"), ordered from highest to lowest priority. If manual.priority is set, this argument has no effect.

overlap(hicfile = hic_chr20, 
        segmentfile = segment_chr20, 
        rnafile = rna_chr20,
        segment.columns = c(1:4),
        prune.priority = c("PAR", "PPR", "EAR", "EPR", "HET"))
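The two-step logic described in this section (score first, priority order only for ties) can be sketched in base R as below. It illustrates the idea under stated assumptions and is not HiCAGE's internal code; the calls table, its scores, and the variable names are hypothetical.

```r
# Hypothetical state calls per region, with segmentation scores.
# In region r1, PAR and EAR tie for the top score.
calls <- data.frame(
  region = c("r1", "r1", "r1", "r2", "r2"),
  state  = c("PAR", "EAR", "HET", "EPR", "HET"),
  score  = c(900, 900, 100, 300, 500),
  stringsAsFactors = FALSE
)
prune_priority <- c("PAR", "PPR", "EAR", "EPR", "HET")

# Step 1: keep only the top-scoring call(s) per region; ties survive
best_score <- ave(calls$score, calls$region, FUN = max)
top <- calls[calls$score == best_score, ]

# Step 2: break the remaining ties with the priority order
top$rank <- match(top$state, prune_priority)
top <- top[order(top$region, top$rank), ]
pruned <- top[!duplicated(top$region), c("region", "state", "score")]
print(pruned)
```

Without step 2, region r1 would appear twice in the result, which corresponds to the duplicate rows described above.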

RNA-seq Data File Column Selection

RNA-seq data files need to contain only the Ensembl gene ID and gene expression data. Users can choose FPKM or TPM at their discretion:

Ensembl ID | FPKM
---------- | ----
"V1" | "V7"
ENSG00000101138.7 | 24.69
ENSG00000101162.3 | 2.22

Users can select the columns in the RNA-seq data file using the rna.columns argument in the overlap function:

overlap(hicfile = hic_chr20, 
        segmentfile = segment_chr20, 
        rnafile = rna_chr20,
        rna.columns = c(1, 7))
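As a rough illustration of where the "V1" and "V7" labels come from, the base R sketch below parses two lines of headerless tab-delimited text and keeps columns 1 and 7. The filler values in columns 2 to 6 and all variable names are assumptions for this example, and the code is not taken from HiCAGE.

```r
# Two tab-delimited records; columns 2-6 hold hypothetical filler values
rna_text <- paste(
  "ENSG00000101138.7\ta\tb\tc\td\te\t24.69",
  "ENSG00000101162.3\ta\tb\tc\td\te\t2.22",
  sep = "\n"
)

# With header = FALSE, read.table() names the columns V1, V2, ...
rna_raw <- read.table(text = rna_text, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE)

# Same index vector as the rna.columns value shown above
rna_columns <- c(1, 7)

rna_selected <- rna_raw[, rna_columns]
names(rna_selected) <- c("EnsemblID", "FPKM")
print(rna_selected)
```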

biomaRt Selection

HiCAGE enables users to select various biomaRts using the biomaRt package, allowing for flexibility in species and genome build for annotating genomic regions. Default genome selection is "hsapiens_gene_ensembl". Genome selection must match the genome used to compile the 3C-based data file.

The biomaRt can be specified in the overlap function and is passed on to the biomaRt package:

overlap(hicfile = hic_chr20,
        segmentfile = segment_chr20,
        bio_mart = "ensembl",
        martset = "hsapiens_gene_ensembl",
        webhost = "")

Example of available datasets in the "ensembl" Mart:

dataset | description | version
--------- | --------- | --------
hsapiens_gene_ensembl | Human genes | GRCh38.p7
mmusculus_gene_ensembl | Mouse genes | GRCm38.p5
rnorvegicus_gene_ensembl | Rat genes | Rnor_6.0
dmelanogaster_gene_ensembl | Fruitfly genes | BDGP6
scerevisiae_gene_ensembl | Saccharomyces cerevisiae genes | R64-1-1
celegans_gene_ensembl | Caenorhabditis elegans genes | WBcel235
ocuniculus_gene_ensembl | Rabbit genes | OryCun2.0
xtropicalis_gene_ensembl | Xenopus genes | JGI 4.2
drerio_gene_ensembl | Zebrafish genes | GRCz10

A full list of currently available datasets can be found using:

ensembl <- biomaRt::useMart(biomart = "ensembl")
biomaRt::listDatasets(ensembl)  # lists the datasets available in this mart

Example of available datasets in the "ENSEMBL_MART_MOUSE" Mart:

dataset | description | version
--------- | --------- | ------
mwsbeij_gene_ensembl | Mouse WSBEiJ genes | WSB_EiJ_v1
mc3hhej_gene_ensembl | Mouse C3HHeJ genes | C3H_HeJ_v1
mc57bl6nj_gene_ensembl | Mouse C57BL6NJ genes | C57BL_6NJ_v1
mnzohlltj_gene_ensembl | Mouse NZOHlLtJ genes | NZO_HlLtJ_v1
mpwkphj_gene_ensembl | Mouse PWKPhJ genes | PWK_PhJ_v1
mfvbnj_gene_ensembl | Mouse FVBNJ genes | FVB_NJ_v1
mcbaj_gene_ensembl | Mouse CBAJ genes | CBA_J_v1
mcasteij_gene_ensembl | Mouse CASTEiJ genes | CAST_EiJ_v1
mlpj_gene_ensembl | Mouse LPJ genes | LP_J_v1
makrj_gene_ensembl | Mouse AKRJ genes | AKR_J_v1
mbalbcj_gene_ensembl | Mouse BALBcJ genes | BALB_cJ_v1
mnodshiltj_gene_ensembl | Mouse NODShiLtJ genes | NOD_ShiLtJ_v1
m129s1svimj_gene_ensembl | Mouse 129S1SvImJ genes | 129S1_SvImJ_v1
mspreteij_gene_ensembl | Mouse SPRETEiJ genes | SPRET_EiJ_v1
mdba2j_gene_ensembl | Mouse DBA2J genes | DBA_2J_v1
maj_gene_ensembl | Mouse AJ genes | A_J_v1

A full list of currently available datasets in "ENSEMBL_MART_MOUSE" can be found using:

ensembl <- biomaRt::useMart("ENSEMBL_MART_MOUSE")
biomaRt::listDatasets(ensembl)  # lists the datasets available in this mart

Available archived versions of Ensembl


Data Output

overlap Data Output

Data from the overlap function is output as a data table.


gogenelist Data Output

The gogenelist function lets the user select a chromatin mark of interest (proximalmark) that interacts with another chromatin mark (distalmark) and generates an ordered list of all genes and expression data associated with the proximalmark. HGNC symbols can be included by setting the gene.symbol argument to TRUE. A gene expression cutoff can be set with the expression_cutoff argument to filter the gene list.

gogenelist(datafile = example,
           proximalmark = "PAR",
           distalmark = "EAR",
           gene.symbol = TRUE,
           species = "human",
           bio_mart = "ensembl",
           martset = "hsapiens_gene_ensembl",
           webhost = "",
           geneOnto = FALSE,
           expression_cutoff = 1)
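The kind of filtering that gogenelist performs can be pictured with the base R sketch below. It is a conceptual illustration only, not the function's implementation: the table layout, the column names (gene, mark1, mark2, expression), the gene labels, the use of >= for the cutoff, and the ordering by decreasing expression are all assumptions.

```r
# Hypothetical annotated interactions: one row per gene near a proximal mark
interactions <- data.frame(
  gene       = c("G1", "G2", "G3", "G4"),       # placeholder gene labels
  mark1      = c("PAR", "PAR", "PAR", "PAR"),   # proximal mark
  mark2      = c("EAR", "EAR", "EAR", "HET"),   # distal mark
  expression = c(2.22, 24.69, 0.4, 50),
  stringsAsFactors = FALSE
)

proximalmark      <- "PAR"
distalmark        <- "EAR"
expression_cutoff <- 1

# Keep genes at the proximal mark that interact with the distal mark
# and whose expression meets the cutoff (>= is an assumption here)
keep <- interactions$mark1 == proximalmark &
  interactions$mark2 == distalmark &
  interactions$expression >= expression_cutoff
gene_list <- interactions[keep, ]

# Order by decreasing expression (the ordering criterion is assumed)
gene_list <- gene_list[order(-gene_list$expression), c("gene", "expression")]
print(gene_list)
```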

If the geneOnto argument is set to TRUE, gene ontology (GO) analysis is run on the genes found near the proximal mark that interacts with the distal mark. GO analysis is run using the topGO package. The background gene set consists of all genes found in the overlap data output.

go.analysis <- gogenelist(datafile = example,
                          proximalmark = "PAR",
                          distalmark = "EAR",
                          gene.symbol = FALSE,
                          species = "human",
                          bio_mart = "ensembl",
                          martset = "hsapiens_gene_ensembl",
                          webhost = "",
                          geneOnto = TRUE,
                          expression_cutoff = 0.1)


Circos plot of interactions

circleplot(datatable = example, display.legend = TRUE)

UpSetR-style plot of interactions

plotup(datafile = example)

Graphical User Interface (GUI)

A graphical user interface of HiCAGE can be launched locally with:


Alternatively, a HiCAGE GUI can be accessed online without the need to install any software.

mworkman13/HiCAGE documentation built on May 23, 2019, 11:58 a.m.