knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
One application of TAP-seq is to measure the expression of genes that enable assigning cells to different cell types. The TAPseq package offers a simple approach to identify a desired number of cell type markers based on differentially expressed genes between cell types.
This optional functionality requires additional R-packages such as Seurat. Please install the TAPseq package with all suggested dependencies.
The TAPseq package contains a small subset of the mouse bone marrow data from Baccin et al., 2019 (https://www.nature.com/articles/s41556-019-0439-6). The full dataset can be downloaded here. This dataset is stored in a Seurat object which contains both gene expression and cell type data for every cell.
library(TAPseq) library(Seurat) # example of mouse bone marrow 10x gene expression data data("bone_marrow_genex") # transcript counts GetAssayData(bone_marrow_genex)[1:10, 1:10] # number of cells per cell type table(Idents(bone_marrow_genex))
A linear model with lasso regularization is used to select target genes that best identify cell
types. By default 10-fold cross-validation is used to select the number of target genes based on the
largest value of lambda within 1 standard error of the minimum. See
?glmnet::cv.glmnet for more
# automatically select a number of target genes using 10-fold cross-validation target_genes_cv <- selectTargetGenes(bone_marrow_genex) head(target_genes_cv) length(target_genes_cv)
This example results in a reasonable target gene panel, but in cases with many different cell types the resulting panels might be very large.
To avoid this, we can specify a desired number of targets. This selects a value for lamda in the lasso model that results in approximately this number of non-zero coefficients, i.e. marker genes.
# identify approximately 100 target genes that can be used to identify cell populations target_genes_100 <- selectTargetGenes(bone_marrow_genex, targets = 100) length(target_genes_100)
To intuitively assess how well a chosen set of target genes distinguishes cell types, we can use UMAP plots based on the full gene expression data and on target genes only.
plotTargetGenes(bone_marrow_genex, target_genes = target_genes_100)
We can see that the expression of the
r length(target_genes_100) selected target genes groups
cells of different populations together.
A good follow up would be to cluster the cells based on only the target genes following the same workflow used to define the cell identities in the original object. This could then be used to verify that the selected target genes reliably identify the correct cell types.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.