
GAC: Genetic Analysis of Cells

Lifecycle: maturing

GAC is currently in alpha release.

The goal of GAC is to deliver a formal end-to-end analysis of single-cell DNA copy number by integrating proven methods of quantitative genetics, statistics, and evolutionary biology. GAC implements a simple, lightweight, and open-source R framework (Figure 1). Inspired by, but unlike, Seurat and Scanpy, GAC adapts the logic of ExpressionSet/AnnData into relational matrices in native R. This keeps the toolkit easy to learn and hard to master, and it facilitates the integration of algorithms for the downstream analysis of single-cell DNA data, which are so badly needed. For now, GAC supports downstream analyses of segmented data with common segments by concurrently managing the copy number (X) and phenotype (Y) matrices across all cells or samples, e.g. the output of Varbin/Ginkgo, FACETS, MUMdex, HMMcopy, or SCOPE. Unsegmented bin read counts are not a valid input. GAC uses ComplexHeatmap, an ultra-powerful tool for heatmaps, to help visualize the data.

To implement GAC we require five easy-to-generate inputs:
- a copy number / genotype matrix (X) (bins[i] x cells[j])
- a phenotype matrix (Y) (cells[j] x phenotype[y])
- a QC matrix of technical wet-lab notes (qc) (cells[j] x qc[c])
- a gene-to-bin index (gene.index)
- the genomic coordinates of the bins or genotypes (chromInfo)
- and an optional expression matrix (Ye) for DNA-RNA or same-cell G+T (genome and transcriptome) sequencing (Macaulay et al., 2015)
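The inputs above can be pictured as one named list whose slots share identifiers: cell IDs link the columns of X to the rows of Y and qc, and bin IDs link the rows of X to chromInfo. The base-R sketch below illustrates that relational idea only; it is not gac's internal representation. The column names `bin.chrom`, `bin.start`, `hgnc.symbol`, and `bin.id` and the `drop_cells()` helper are hypothetical (the README's own example uses gac's `excludeCells()` for the real cnr object).

```r
## Toy sketch (hypothetical; not gac's internal representation):
## the five inputs as one named list of relational matrices.
bins  <- paste0("bin", 1:6)
cells <- paste0("cell", 1:4)

# X: copy number, bins[i] x cells[j]
X <- matrix(c(2, 2, 2, 3, 3, 3,
              2, 2, 4, 4, 2, 2,
              1, 1, 2, 2, 2, 2,
              2, 3, 3, 3, 2, 2),
            nrow = 6, dimnames = list(bins, cells))

# Y: phenotype, cells[j] x phenotype[y]
Y <- data.frame(sample = c("A", "A", "B", "B"), row.names = cells,
                stringsAsFactors = FALSE)

# qc: technical wet-lab notes, cells[j] x qc[c]
qc <- data.frame(qc.status = c("PASS", "PASS", "FAIL", "PASS"),
                 row.names = cells, stringsAsFactors = FALSE)

# chromInfo: bin coordinates (hypothetical column names)
chromInfo <- data.frame(bin.chrom = c(1, 1, 1, 2, 2, 2),
                        bin.start = c(0, 5e5, 1e6, 0, 5e5, 1e6),
                        row.names = bins)

# gene.index: gene-to-bin lookup (hypothetical column names)
gene.index <- data.frame(hgnc.symbol = c("CDK4", "MDM2"),
                         bin.id = c("bin2", "bin5"),
                         stringsAsFactors = FALSE)

toy <- list(X = X, Y = Y, qc = qc,
            gene.index = gene.index, chromInfo = chromInfo)

# hypothetical helper: drop cells from every cell-indexed slot at once,
# so X, Y and qc cannot fall out of sync
drop_cells <- function(obj, excl) {
  keep <- setdiff(colnames(obj$X), excl)
  obj$X  <- obj$X[, keep, drop = FALSE]
  obj$Y  <- obj$Y[keep, , drop = FALSE]
  obj$qc <- obj$qc[keep, , drop = FALSE]
  obj
}

failed <- rownames(toy$qc)[toy$qc$qc.status == "FAIL"]  # "cell3"
toy <- drop_cells(toy, failed)

# the relational invariants still hold after subsetting
stopifnot(identical(colnames(toy$X), rownames(toy$Y)),
          identical(colnames(toy$X), rownames(toy$qc)),
          identical(rownames(toy$X), rownames(toy$chromInfo)))
dim(toy$X)  # 6 bins x 3 cells
```

Keeping every slot keyed by the same identifiers is what lets a single subsetting call update X, Y, and qc together instead of re-synchronizing them by hand.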

Figure 1.

Installation

Dependencies: SCclust (from GitHub), ComplexHeatmap and ConsensusClusterPlus (from Bioconductor).

Install the dependencies with:

install.packages("devtools")
devtools::install_github("KrasnitzLab/SCclust")

install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "ConsensusClusterPlus"))

You can install the development version from GitHub with:

devtools::install_github("SingerLab/gac")

Examples

This is a basic example for drawing a copy number heatmap. For a comprehensive overview of the package, see getting_started.Rmd in the vignettes/ directory.

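library(gac)
library(ComplexHeatmap)  # provides draw()

## basic example code: load the example cnr object, the segment
## colour map and its matching heatmap legend
data(cnr)
data(segCol)
data(legSeg)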

## identify the cells that failed QC and drop them from cnr
( excl.cells <- rownames(cnr$qc)[cnr$qc$qc.status == "FAIL"] )
#> [1] "cell5"  "cell11"

cnr <- excludeCells(cnr, excl = excl.cells)

## genome-wide copy number heatmap of the X matrix
aH <- HeatmapCNR(cnr, what = 'X', col = segCol, show_heatmap_legend = FALSE)

draw(aH, annotation_legend_list = list(legSeg))

## copy number heatmap restricted to selected genes
bH <- HeatmapCNR(cnr, what = "genes",
                 which.genes = c("CDK4", "MDM2"),
                 col = segCol, show_heatmap_legend = FALSE)
#> Warning: The input is a data frame, convert it to the matrix.

draw(bH, annotation_legend_list = list(legSeg))
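The `what = "genes"` heatmap presumably relies on the gene-to-bin index to translate gene symbols into rows of X. The core lookup can be sketched in base R as below. This is an illustration under stated assumptions, not gac's implementation: `gene_cn()` is a hypothetical helper, and the `gene.index` columns (`hgnc.symbol`, `bin.id`) are assumed, with exactly one bin per gene, whereas a real index may map a gene to several bins.

```r
## Toy sketch (hypothetical; not gac's implementation): gene-level copy
## number pulled from a bin-level X matrix through a gene-to-bin index.
X <- matrix(c(2, 2, 5, 2,
              2, 3, 3, 2,
              2, 2, 6, 6),
            nrow = 4,
            dimnames = list(paste0("bin", 1:4), paste0("cell", 1:3)))

# hypothetical gene.index layout: one row per gene, naming the bin it falls in
gene.index <- data.frame(hgnc.symbol = c("CDK4", "MDM2", "TP53"),
                         bin.id      = c("bin3", "bin4", "bin1"),
                         stringsAsFactors = FALSE)

gene_cn <- function(X, gene.index, genes) {
  pos <- match(genes, gene.index$hgnc.symbol)
  if (anyNA(pos)) {
    stop("not in gene.index: ", paste(genes[is.na(pos)], collapse = ", "))
  }
  out <- X[gene.index$bin.id[pos], , drop = FALSE]  # genes x cells
  rownames(out) <- genes
  out
}

gene_cn(X, gene.index, c("CDK4", "MDM2"))
#>      cell1 cell2 cell3
#> CDK4     5     3     6
#> MDM2     2     2     6
```

Failing loudly on unknown symbols, rather than returning NA rows, keeps a typo in `which.genes`-style input from silently producing an empty heatmap row.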

Motivation and design

The Singer Lab single-cell wet-lab and dry-lab endeavours are carried forward by a skeleton crew. We greatly needed something simple that could reduce the 85% of our time spent synchronizing bins with genes, phenotypes, and QC matrices, and that could handle a large data set of >24,000 cells. Knowing the data are growing by the week, I integrated functions to deal with the n+1 problem of adding newly generated cells to an existing data set. Lastly, my background in animal genomics allowed me to borrow, in an abstract way, the successful frameworks used in Genomic Selection, in the hope that we can provide appropriate models for future same-cell technologies.

We hope you enjoy it!

What’s in the works

Licence

The GAC framework and code are distributed under a BSD-3 License.



SingerLab/gac documentation built on July 22, 2021, 3:27 a.m.