knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

The advent of single-cell sequencing technologies in profiling 3D genome organization led to development of single-cell high-throughput chromatin conformation (scHi-C) assays. Data from these assays enhance our ability to study dynamic chromatin architecture and the impact of spatial genome interactions on cell regulation at an unprecedented resolution. At the individual cell resolution, heterogeneity driven by the stochastic nature of chromatin fiber, various nuclear processes, and unwanted variation due to sequencing depths and batch effects poses major analytical challenges for inferring single cell-level 3D genome organizations.

To explicitly capture chromatin conformation features and distinguish cells based on their 3D genome organizations, we develop a simple and fast band normalization approach, BandNorm. BandNorm first removes genomic distance bias within a cell, and sequencing depth normalizes between cells. Consequently, BandNorm adds back a common band-dependent contact decay profile for the contact matrices across cells. The former step is achieved by dividing the interaction frequencies of each band within a cell with the cell’s band mean. The latter step is implemented by multiplying each scaled band by the average band mean across cells.

Installation

To use BandNorm, the following packages are necessary:

install.packages(c('ggplot2', 'dplyr', 'data.table', 'Rtsne', 'umap'))

if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")

devtools::install_github("immunogenomics/harmony")

BandNorm can be installed from Github:

devtools::install_github('sshen82/BandNorm', build_vignettes = FALSE)

Also, if the user doesn't have default R path writing permission, it is possible to install the package to the personal library:

withr::with_libpaths("personalLibraryPath", devtools::install_github("sshen82/BandNorm"))

Format of input

There are three possible ways to input the data:

  1. A path containing all cells in the form of [chr1, binA, chr2, binB, count], the cell looks like below:
chr1    0       chr1    1000000 9
chr1    1000000 chr1    1000000 200
chr1    0       chr1    2000000 2
chr1    1000000 chr1    2000000 4
chr1    2000000 chr1    2000000 220
chr1    1000000 chr1    3000000 1
chr1    2000000 chr1    3000000 11
chr1    3000000 chr1    3000000 197
chr1    1000000 chr1    4000000 1
chr1    2000000 chr1    4000000 2
  1. An R data.frame in the form of [chr1, binA, binB, count, diag, cell_name]. The column names should be c("chrom", "binA", "binB", "count", "diag", "cell"). Below we print the example data.
library(BandNorm)
data(hic_df)
print(hic_df[1:10, ])
  1. We also support using Juicer .hic format, and you can use bandnorm_juicer function to run it. We will have a thorough example for this in the future.

For the example dataset, we also have batch information and cell-type information. Note that the cell_name in those external information should be consistent with the hic_df file.

data(batch)
data(cell_type)
print(batch[1:10, ])
print(cell_type[1:10, ])

Download existing single-cell Hi-C data

You can also use download_schic function to download real data. The available data includes Kim2020, Li2019, Ramani2017, and Lee2019. You can specify one of the cell-types in those cell lines. There are also summary files available for them, and the summary includes batch, cell-type, depth and sparsity information. You can go to this website for more information. We won't run it here since it will mess up your computer.

# download_schic("Li2019", cell_type = "2i", cell_path = getwd(), summary_path = getwd())

Use BandNorm

We provide a demo scHi-C data named hic_df, sampled 400 cells of Astro, ODC, MG, Sst, four cell types and chromosome 1 from Ecker2019 for test run. After obtaining the hic_df file, you can use the bandnorm function to normalize the data. The result consists of 6 columns: chromosome, binA, binB, diag (which is binB - binA), cell (which is the name of the cell), and BandNorm. You can use the save option to save the normalized file, and if so, you need to give a path for output.

bandnorm_result = bandnorm(hic_df = hic_df, save = FALSE)
bandnorm_result[1:10, ]
# bandnorm_result = bandnorm(hic_df = hic_df, save = TRUE, save_path = getwd())

Then, you can use create_embedding function to obtain a PCA embedding for the data. You can choose to use Harmony to remove the batch effect.

embedding = create_embedding(hic_df = bandnorm_result, do_harmony = TRUE, batch = batch, chrs = "chr1")
embedding[1:10, ]

Finally, using the embedding, you can use UMAP or tSNE to get a projection of all the cells on the 2D plane. If you have the cell-type information or batch information, it is possible to color them on the plot.

plot_embedding(embedding, "UMAP", cell_info = cell_type, label = "Cell Type")
plot_embedding(embedding, "UMAP", cell_info = batch, label = "Batch")


sshen82/BandNorm documentation built on June 1, 2025, 6:25 p.m.