GenotypeMixtures is a handy package that builds on the souporcell package (Heaton et al. 2020), to stitch together genotypes across multiple single cell genomics experiments with an overlapping mixture experimental design...

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Install GenotypeMixtures from github. Requires devtools.

#devtools::install_github("bjstewart1/GenotypeMixtures")

Load GenotypeMixtures

library(GenotypeMixtures)

Experimental designs can be read in using this function if you point at a .csv Alternatively you can read in the .csv however you like, or construct from another file The experiments (10X channels (mixtures) should be rows, and the donors/genotypes should be columns. Membership is denoted by 1 vs 0.

exp_design_path = system.file("extdata", "experimental_design.csv", package = "GenotypeMixtures")
experimental_design <- read_experimental_design(experimental_design_path = exp_design_path)
plot_experimental_design(experimental_design)

We can also read in the locations of the souporcell directories The first column should be the mixture name, the second column should be the path to the soup or cell directory There is some built in dummy vcf files in the package for this vignette

file_locations <- data.frame("channel" = rownames(experimental_design), 
                             "SOC_directory" =  file.path(system.file("extdata", package = "GenotypeMixtures"), rownames(experimental_design) ))
head(file_locations)

Now we plug this into the main function which constructs a genotype cluster graph

genotype_clustering_output <- construct_genotype_cluster_graph(experimental_design = experimental_design, file_locations = file_locations )

Now we can plot the graph which stitches together the genotypes

genotype_clustering_output$graph_plot

Now we can plot the membership matrix which tells us which of our genotypes belongs to which mixtures

genotype_clustering_output$membership_plot

Now we can plot genotype VAFs - this is a useful diagnostic plot; matching genotypes should have their variants along the diagnonal. This is synthetic data, but real data should look reasonably similar to this

plot_cross_vaf(experiment_1_path = file_locations[2, 2], 
               experiment_2_path = file_locations[3,2], 
               experiment_1_name = file_locations[2,1],
               experiment_2_name = file_locations[3,1])

We can map these computed genotypes back to the original genotypes in our experimental design

cluster_mapping <- membership_map(experimental_design = experimental_design,
                                  graph_output =  genotype_clustering_output)
tail(cluster_mapping)

Finally we can assign single cells across our experiments to genotype - feed the output of membership_map() to cells_to_genotypes The output of this can be easily added to the metadata of your single cell experiment/seurat/anndata object

cell_assignments <- cells_to_genotypes(SOC_locations = file_locations, 
                                       membership_mat =cluster_mapping)
tail(cell_assignments)

The package can also output an experimental design with varying levels of density

dense_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 1 )
medium_density_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 0.5 )
sparse_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 0 )

This is a dense design

plot_experimental_design(dense_design)

This is a medium design

plot_experimental_design(medium_density_design)

This is a sparse design

plot_experimental_design(sparse_design)


bjstewart1/GenotypeMixtures documentation built on Aug. 27, 2022, 2:01 p.m.