R/GAC delivers an end-to-end analysis of single-cell DNA copy number by integrating quantitative genetics, statistics, and evolutionary biology. GAC implements a simple, lightweight, and open-source R framework (Figure 1). Inspired by Seurat and Scanpy, but unlike them, we integrated the logic of ExpressionSet/AnnData into relational matrices in native R. This keeps the toolkit light, yet facilitates the integration of algorithms for the downstream analysis of single-cell DNA data, which is so desperately needed. For now, GAC facilitates the downstream analysis of segmented data with universal breakpoints by concurrently managing integer copy number (X), phenotype (Y), and metadata (qc) matrices across all cells or samples. Copy number matrices can be generated from the output of Varbin/Ginkgo, FACETS, MUMdex, HMMcopy, or SCOPE. Unsegmented bin read counts are not valid input. GAC incorporates ape for phylogenetic analysis and ComplexHeatmap, an ultra-powerful tool for heatmaps, to help visualize the data.
To implement GAC we require five easy-to-generate inputs:

- a copy number / genotype matrix (X) (bins[i] x cells[j])
- a phenotype matrix (Y) (cells[j] x phenotype[y])
- a qc matrix (technical wet-lab notes) (qc) (cells[j] x qc[c])
- a gene to bin index (gene.index)
- the genomic coordinates of the bins or genotypes (chromInfo)
- and an optional expression matrix (currently in development) (Ye; for DNA-RNA or same-cell G+T (Macaulay et al. 2015))
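As a rough illustration of the shapes listed above, the sketch below builds toy versions of the five required inputs in base R and checks that they line up. The values, the dimensions, and most column names (`tissue`, `bin.chrom`, `bin.start`, `bin.end`, `hgnc.symbol`, `bin.id`) are assumptions made for this example; it is not a specification of what GAC expects.

```r
set.seed(1)
n_bins  <- 6
n_cells <- 4
cells   <- paste0("cell", seq_len(n_cells))

# X: integer copy number, bins (rows) x cells (columns)
X <- matrix(sample(1:4, n_bins * n_cells, replace = TRUE),
            nrow = n_bins, ncol = n_cells,
            dimnames = list(NULL, cells))

# Y: phenotypes, one row per cell
Y <- data.frame(cellID = cells,
                tissue = c("tumor", "tumor", "normal", "tumor"),
                stringsAsFactors = FALSE)

# qc: technical wet-lab notes, one row per cell
qc <- data.frame(cellID = cells,
                 qc.status = c("PASS", "PASS", "FAIL", "PASS"),
                 stringsAsFactors = FALSE)

# chromInfo: genomic coordinates, one row per bin (same order as rows of X)
chromInfo <- data.frame(bin.chrom = rep(c("1", "2"), each = 3),
                        bin.start = rep(c(1, 1001, 2001), times = 2),
                        bin.end   = rep(c(1000, 2000, 3000), times = 2),
                        stringsAsFactors = FALSE)

# gene.index: the bin each gene falls in (hypothetical gene names)
gene.index <- data.frame(hgnc.symbol = c("GENE_A", "GENE_B", "GENE_C"),
                         bin.id = c(2L, 2L, 5L),
                         stringsAsFactors = FALSE)

# Consistency checks: bins match chromInfo, cells match Y and qc,
# and every indexed gene points at an existing bin
inputs_ok <- nrow(X) == nrow(chromInfo) &&
  identical(colnames(X), Y$cellID) &&
  identical(colnames(X), qc$cellID) &&
  all(gene.index$bin.id %in% seq_len(nrow(X)))
inputs_ok
```

Running checks like these before building an object tends to catch the most common mismatch, which is cells appearing in one table but not in another.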
You can install the released version of GAC from [CRAN](https://CRAN.R-project.org) with:

    # install.packages("gac")

You can install the development version from GitHub with:

    # install.packages("devtools")
    devtools::install_github("SingerLab/gac", force = TRUE)

This is a basic example of drawing a copy number heatmap. For a comprehensive overview of the package, please follow getting_started.Rmd in vignettes/.

    library(gac)
    ## basic example code
    data(cnr)
    data(segCol)
    data(legSeg)
    ( excl.cells <- cnr$qc$cellID[cnr$qc$qc.status == "FAIL"] )
    cnr <- excludeCells(cnr, excl = excl.cells)
    aH <- HeatmapCNR(cnr, col = segCol, show_heatmap_legend = FALSE)
    draw(aH, annotation_legend_list = list(legSeg))
    bH <- HeatmapCNR(cnr, what = "genes",
                     which.genes = c("CDK4", "MDM2", "DSP", "SMOC1"),
                     col = segCol, show_heatmap_legend = FALSE)
    draw(bH, annotation_legend_list = list(legSeg))

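To give a sense of what keeping X, Y, and qc in sync involves when failed cells are dropped, here is a small base R sketch. `drop_cells` is a hypothetical helper written only for this illustration, and the plain-list layout is an assumption; it is not the implementation of `excludeCells`.

```r
# Hypothetical helper: remove cells from every cell-indexed table at once
drop_cells <- function(obj, excl) {
  obj$X  <- obj$X[, !(colnames(obj$X) %in% excl), drop = FALSE]
  obj$Y  <- obj$Y[!(obj$Y$cellID %in% excl), , drop = FALSE]
  obj$qc <- obj$qc[!(obj$qc$cellID %in% excl), , drop = FALSE]
  obj
}

cells <- c("c1", "c2", "c3")
toy <- list(
  # bins x cells
  X  = matrix(c(2L, 2L, 3L, 1L, 2L, 4L), nrow = 2,
              dimnames = list(NULL, cells)),
  # cells x phenotypes
  Y  = data.frame(cellID = cells,
                  tissue = c("tumor", "normal", "tumor"),
                  stringsAsFactors = FALSE),
  # cells x qc
  qc = data.frame(cellID = cells,
                  qc.status = c("PASS", "FAIL", "PASS"),
                  stringsAsFactors = FALSE)
)

# Same selection idiom as in the example above
excl <- toy$qc$cellID[toy$qc$qc.status == "FAIL"]
toy2 <- drop_cells(toy, excl)

colnames(toy2$X)
```

The point of doing all three subsets in one place is that the tables cannot drift apart: a cell is either present in X, Y, and qc, or in none of them.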
`bin` and the .X should be a matrix of common `bins` for all cells. However, to make biological sense of the data, gene-level resolution is required. Thus, building a synchronized matrix with genes is of utmost importance. At the 11th hour, having a gene to bin index (gene.index) allowed the flexibility to interpolate the bin data to gene-level resolution and integrate it with the complete set of phenotypes and QC data, and it is not restricted to the mouse or human genomes.

The Singer Lab single-cell wet-lab and dry-lab endeavours are carried forward by a skeleton crew. Simple tools were greatly needed to help reduce the 85% of the time spent synchronizing bins to genes, phenotypes, and QC matrices, and to handle a large data set of >24,000 cells. Knowing the data is growing by the week, I integrated functions to deal with the n+1 problem.
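The bin-to-gene interpolation mentioned above can be sketched in a few lines of base R. This is a minimal illustration that assumes each gene maps to exactly one bin; the gene names and the `hgnc.symbol` / `bin.id` column names are assumptions for the example, and it is not necessarily how GAC performs the step internally.

```r
# Toy bin-level copy number matrix: 4 bins x 3 cells
X <- matrix(c(2L, 2L, 4L, 1L,
              2L, 3L, 4L, 2L,
              2L, 2L, 6L, 2L),
            nrow = 4,
            dimnames = list(NULL, c("c1", "c2", "c3")))

# Hypothetical gene-to-bin index: two genes share bin 3
gene.index <- data.frame(hgnc.symbol = c("GENE_A", "GENE_B", "GENE_C"),
                         bin.id = c(1L, 3L, 3L),
                         stringsAsFactors = FALSE)

# Each gene inherits the copy number of the bin it falls in
genes <- X[gene.index$bin.id, , drop = FALSE]
rownames(genes) <- gene.index$hgnc.symbol

# Transpose to cells x genes so rows line up with the phenotype and QC tables
genes <- t(genes)
genes
```

Because the lookup is a plain row index, genes that fall in the same bin always receive identical copy number, and any genome with bin coordinates and a gene annotation can supply the index.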
We hope you enjoy!
- Integration with MLR for non-linear genetic models
- Integration with CORE and GISTIC2 for finding focal and recurrent events
- Support for .seg files
- Cleaner code with tidyverse
- CRAN testing
The GAC framework and code are distributed under a BSD-3 License.