knitr::opts_chunk$set(
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.align = "center"
)
clustifyr provides several functions to plot tSNE or UMAP results. The plot_dims() function will plot tSNE or UMAP data using a meta.data table and can color the cells based on cluster identity.
library(clustifyr)
library(clustifyrdata)
library(dplyr)
library(tibble)

# Matrix of normalized single-cell RNA-seq counts
pbmc_matrix[1:3, 1:3]

# meta.data table containing cluster assignments for each cell
pbmc_meta[1:5, ]

# Create UMAP plot and color cells based on cluster ID
plot_dims(
  x = "UMAP_1",                # name of column in the meta.data containing the data to plot on x-axis
  y = "UMAP_2",                # name of column in the meta.data containing the data to plot on y-axis
  data = pbmc_meta,            # meta.data table containing cluster assignments for each cell
  feature = "seurat_clusters"  # name of column in meta.data to color cells by
)
Cells can also be colored based on the expression level of a gene or list of genes using the plot_gene() function.
# Create UMAP plot and color cells based on gene expression
plot_gene(
  x = "UMAP_1",                                       # name of column in the meta.data containing the data to plot on x-axis
  y = "UMAP_2",                                       # name of column in the meta.data containing the data to plot on y-axis
  expr_mat = pbmc_matrix,                             # matrix of normalized single-cell RNA-seq counts
  metadata = pbmc_meta %>% rownames_to_column("rn"),  # meta.data table containing cluster assignments for each cell
  genes = c("CD79B", "CD8A"),                         # vector of gene names to color cells
  cell_col = "rn"                                     # name of column in meta.data containing the cell IDs
)
clustifyr() results

The clustify() function outputs a matrix of correlation coefficients, while clustify_lists() and clustify_nudge() output positive scores. clustifyr provides built-in functions to help visualize these results.
Cell type assignments can be assessed by plotting the clustify() correlation matrix as a heatmap using the plot_cor_heatmap() function.
# Run clustify()
res <- clustify(
  input = pbmc_matrix,             # matrix of normalized single-cell RNA-seq counts
  metadata = pbmc_meta,            # meta.data table containing cluster assignments for each cell
  ref_mat = cbmc_ref,              # reference matrix containing bulk RNA-seq data for each cell type
  query_genes = pbmc_vargenes,     # list of highly variable genes identified with Seurat
  cluster_col = "seurat_clusters"  # name of column in meta.data containing cell clusters
)

# Create heatmap using the clustify() correlation matrix
plot_cor_heatmap(
  cor_mat = res  # matrix of correlation coefficients from clustify()
)
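To make the shape of such a correlation matrix concrete, the following base R sketch builds one from made-up numbers: it averages expression per cluster and correlates each cluster profile with each reference cell type. The objects expr, clusters, ref, and toy_cor are hypothetical, the use of Spearman correlation is an assumption, and the sketch skips gene selection and any other preprocessing, so it should not be read as clustify()'s actual implementation.

```r
# Toy data: every value below is made up for illustration
genes <- paste0("gene", 1:4)
cells <- paste0("cell", 1:6)

# Expression matrix (genes x cells); each row of numbers below is one cell
expr <- matrix(
  c(5, 0, 1, 2,
    6, 1, 0, 3,
    4, 0, 2, 2,
    0, 7, 5, 1,
    1, 6, 4, 0,
    0, 8, 6, 2),
  nrow = 4,
  dimnames = list(genes, cells)
)

# Cluster assignment for each cell
clusters <- rep(c("0", "1"), each = 3)

# Reference matrix (genes x cell types); each row of numbers is one cell type
ref <- matrix(
  c(9, 1, 2, 4,
    0, 8, 6, 3,
    3, 2, 9, 1),
  nrow = 4,
  dimnames = list(genes, c("B", "CD4 T", "NK"))
)

# Average expression per cluster (genes x clusters)
cluster_avg <- sapply(
  split(seq_len(ncol(expr)), clusters),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)

# Correlate each cluster profile with each reference cell type
# Result: clusters in rows, reference cell types in columns
toy_cor <- cor(cluster_avg, ref, method = "spearman")
toy_cor
```

The result has one row per cluster and one column per reference cell type, which is the layout the plotting calls in this section take as cor_mat.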
The plot_cor() function can also be used to create a tSNE or UMAP plot for each cell type of interest and color the cells based on the correlation coefficients.
# Create a UMAP plot for each cell type of interest and color cells based on correlation coefficients
plot_cor(
  x = "UMAP_1",                        # name of column in the meta.data containing the data to plot on x-axis
  y = "UMAP_2",                        # name of column in the meta.data containing the data to plot on y-axis
  cor_mat = res,                       # matrix of correlation coefficients from clustify()
  metadata = pbmc_meta,                # meta.data table containing cluster assignments for each cell
  data_to_plot = colnames(res)[1:2],   # name of cell type(s) to plot correlation coefficients
  cluster_col = "seurat_clusters"      # name of column in meta.data containing cell clusters
)
Cell clusters can also be labeled using the plot_best_call() function, which takes the correlation matrix and labels cell clusters with the cell type that has the highest correlation coefficient.
# Create plot and label clusters with the cell type that has the highest correlation coefficient
plot_best_call(
  cor_mat = res,                   # matrix of correlation coefficients from clustify()
  metadata = pbmc_meta,            # meta.data table containing UMAP or tSNE data
  do_label = TRUE,                 # should the feature label be shown on each cluster?
  do_legend = FALSE,               # should the legend be shown?
  cluster_col = "seurat_clusters"  # name of column in meta.data containing cell clusters
)
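The idea of labeling each cluster with its best-correlated cell type can be sketched in a few lines of base R. The matrix toy_cor, the cutoff value, and the "unassigned" label below are hypothetical and chosen only for illustration; this is a sketch of the selection rule, not the code that plot_best_call() runs internally.

```r
# Toy correlation matrix: clusters in rows, reference cell types in columns
# (all values are made up for illustration)
toy_cor <- matrix(
  c(0.91, 0.40, 0.35,
    0.42, 0.88, 0.51,
    0.30, 0.45, 0.41),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("0", "1", "2"), c("B", "CD4 T", "NK"))
)

# For each cluster, pick the reference cell type with the highest correlation
best_call <- colnames(toy_cor)[apply(toy_cor, 1, which.max)]
names(best_call) <- rownames(toy_cor)

# Hypothetical cutoff: clusters whose best correlation is too low stay unassigned
cutoff <- 0.5
best_call[apply(toy_cor, 1, max) < cutoff] <- "unassigned"
best_call
```

Here clusters 0 and 1 receive a label, while cluster 2 has no correlation above the cutoff and is left unassigned.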
clustifyr() accuracy

The clustify() results can also be evaluated by over-clustering the data and comparing the cell type assignments before and after over-clustering. This is accomplished using the overcluster_test() function. The cell type assignments should be similar with and without over-clustering.
# Overcluster cells and compare cell type assignments with and without over-clustering
overcluster_test(
  expr = pbmc_matrix,               # matrix of normalized single-cell RNA-seq counts
  metadata = pbmc_meta,             # meta.data table containing UMAP or tSNE data
  ref_mat = cbmc_ref,               # reference matrix containing bulk RNA-seq data for each cell type
  cluster_col = "seurat_clusters",  # name of column in meta.data containing cell clusters
  n = 5                             # expand cluster number n-fold for overclustering
)
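One way to put a number on "similar with and without over-clustering" is to compare per-cell labels from the two runs. The sketch below assumes you already have two character vectors of cell type calls, one per cell; both vectors are made up here, and this summary is an illustration rather than anything overcluster_test() itself returns.

```r
# Hypothetical per-cell calls from the original clusters and from the over-clustered data
calls_original      <- c("B", "B", "CD4 T", "CD4 T", "NK", "NK")
calls_overclustered <- c("B", "B", "CD4 T", "CD8 T", "NK", "NK")

# Cross-tabulate the two sets of calls to see where they disagree
agreement_table <- table(
  original = calls_original,
  overclustered = calls_overclustered
)
agreement_table

# Fraction of cells that keep the same label after over-clustering
agreement <- mean(calls_original == calls_overclustered)
agreement
```

A fraction close to 1 indicates stable assignments; off-diagonal entries in the table show which cell types are being split or relabeled.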
The cell types from the bulk RNA-seq reference matrix can also be mixed together using the make_comb_ref() function to assess the specificity of the cell type assignments. If a cluster shows a higher correlation when using the mixed reference matrix, this suggests that the cluster contains multiple cell types.
# Create reference containing different combinations of the bulk RNA-seq data
comb_ref <- make_comb_ref(
  ref_mat = cbmc_ref  # reference matrix containing bulk RNA-seq data for each cell type
)

# Peek at the new combined reference
comb_ref[1:5, 1:5]

# Run clustify() using the combined reference
comb_res <- clustify(
  input = pbmc_matrix,             # matrix of normalized single-cell RNA-seq counts
  metadata = pbmc_meta,            # meta.data table containing cluster assignments for each cell
  ref_mat = comb_ref,              # combined reference matrix created with make_comb_ref()
  query_genes = pbmc_vargenes,     # list of highly variable genes identified with Seurat
  cluster_col = "seurat_clusters"  # name of column in meta.data containing cell clusters
)

# Create plot and label clusters with the assigned cell types from the combined reference
plot_best_call(
  cor_mat = comb_res,              # matrix of correlation coefficients from clustify()
  metadata = pbmc_meta,            # meta.data table containing UMAP or tSNE data
  do_label = TRUE,                 # should the feature label be shown on each cluster?
  do_legend = FALSE,               # should the legend be shown?
  cluster_col = "seurat_clusters"  # name of column in meta.data containing cell clusters
)
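To illustrate what "mixing" reference cell types means, the base R sketch below averages every pair of columns in a tiny made-up reference and appends the mixed profiles to the original. The matrix ref, the plain arithmetic mean, and the "_and_" naming are assumptions made for this example; make_comb_ref() may handle log-transformed values and column naming differently.

```r
# Toy reference (genes x cell types); all values are made up for illustration
ref <- matrix(
  c(2, 0, 4,
    6, 2, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), c("B", "NK", "Mk"))
)

# Enumerate every pair of cell types
type_pairs <- combn(colnames(ref), 2, simplify = FALSE)

# Average the two profiles in each pair (assumption: simple arithmetic mean)
mixed <- sapply(type_pairs, function(p) rowMeans(ref[, p, drop = FALSE]))
colnames(mixed) <- sapply(type_pairs, paste, collapse = "_and_")

# Combined reference: original cell types plus all pairwise mixtures
comb <- cbind(ref, mixed)
comb
```

With three cell types this yields three mixed columns, so the combined reference has six columns. A cluster that correlates best with a mixed column is a candidate for containing more than one cell type.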
Other attributes shared between the reference and query metadata, such as nGene, nUMI, and mt_percentage, can be visualized with plot_cols as another way to confirm cell identity after running clustify(). Certain cell types show distinct patterns, for example a higher number of detected genes.