Several comprehensive RNA-seq experiments in different brain cell types have now been published in humans and mice. Some of these experiments have profiled gene expression of cell populations isolated through immunopanning procedures. Immunopanning involves immunoprecipitation of particular cell types in cell culture plates, based on selection for an antibody adsorbed to the plate surface. Others studies have performed RNA profiling of single cells with microfluidics devices and used clustering methods to identify cell types from the resulting RNA expression profiles. The devices used for single cell RNA sequencing (scRNA-seq) often select cells based on size or via encapsulation in a droplet and involve the creation of a cDNA library from the transcriptome from a theoretical maximum of one cell.
Existing studies have been mainly based on individual datasets, and are therefore subject to systematic noise, including sampling bias due to sample collection or preparation technique, as well as stochasticity in gene expression. As an increasing number of RNA-seq cell type-specific transcriptomic experiments have become available for both human and mouse, we set out to conduct a comprehensive meta-analysis of brain cell type gene signatures, which is now published in McKenzie et al (2018),
The five data sets used in the creation of the cell type marker signatures can be found in the manuscript.
A major goal of BRETIGEA (BRain cEll Type specIfic Gene Expression Analysis) is to simplify the process of defining your own set of brain cell type marker genes by using a well-validated set of cell type-specific marker genes derived from both immunopanning and single cell microfluidic experiments, as described in McKenzie et al (2018),
First, we will load the package and read in example bulk RNA-sequencing data from four brain regions (frontal white matter, temporal cortex, parietal cortex, and hippocampus), which was generated by the Allen Brain Atlas [@memoryagingtbi2016] and filtered to contain primarily brain marker genes. We also will load a data frame with additional immunohistochemistry quantification measurements from each brain sample, to use as a validation of the method.
library(BRETIGEA, quietly = TRUE) library(knitr) #only used for vignette creation
Here is the format of the inputs:
str(aba_marker_expression, list.len = 5) str(aba_pheno_data, list.len = 5)
To run the brain cell type proportion estimation analysis and extract the matrix of surrogate proportion variables for each of the major six brain cell types (astrocytes, endothelial cells, microglia, neurons, oligodendrocytes, and OPCs), run this:
ct_res = brainCells(aba_marker_expression, nMarker = 50) kable(head(ct_res))
Note that the above analysis uses nMarker = 50 marker genes. A notable trade-off in the selection of the number of marker genes to include in the analysis is that the more marker genes you use, the more likely you are to average out any cell type-specific expression changes that may occur across groups in your sample. On the other hand, the fewer marker genes you use, the higher-quality these marker genes will tend to be in terms of strength of cell type specificity. We have chose nMarker = 50 because it has been a reasonable number in our experince, but the goals of your analysis may differ and you may want to choose a different number of marker genes for each cell type.
Note that only marker genes which have been measured in your data set will be used by the cell type proportion estimates, so if your data set has fewer gene measurements (e.g., in a proteomics data set), that may be a reason to use fewer marker genes.
Comparing these cell type proportion estimates to the independent immunohistochemistry quantifications of two marker genes (IBA1 and GFAP), you can see that the correlation is strong.
cor_mic = cor.test(ct_res[, "mic"], as.numeric(aba_pheno_data$ihc_iba1_ffpe), method = "spearman") print(cor_mic) cor_ast = cor.test(ct_res[, "ast"], as.numeric(aba_pheno_data$ihc_gfap_ffpe), method = "spearman") print(cor_ast)
The default cell type proportion estimation method is singular value decomposition, but if you want to use PCA, that is an option as well.
ct_res = brainCells(aba_marker_expression, nMarker = 50, species = "combined", method = "PCA") kable(head(ct_res))
The species argument controls which species the marker genes are derived from, and can be set to "human" and "mouse" for data specific to those species.
If you want to only estimate the proportion of particular cell types, you can do so by setting the celltypes argument. Here, we only estimate the proportions of astrocytes, neurons, and oligodendrocytes. Note that the estimates of each cell type is done independently, so choosing to estimate the proportions of one cell type or not will not affect the estimates of the other cell types.
ct_res = brainCells(aba_marker_expression, nMarker = 50, species = "combined", celltypes = c("ast", "neu", "oli")) kable(head(ct_res))
In addition to the default data set built from a meta-analysis across cell type-specific gene expression data, BRETIGEA also offers access to cell type markers based on leveraging variation across intact tissue samples. The cell types for which markers are available based on this data set are astrocytes, neurons, microglia, and oligodendrocytes.
To use this, change the data_set parameter to "kelley" (referring to Kelley et al., 2018, PMID: 30154505) when you call brainCells(). Note that the species argument will be ignored if data_set is set to "kelley".
ct_res = brainCells(aba_marker_expression, nMarker = 50, data_set = "kelley") kable(head(ct_res))
In the Allen Brain Atlas RNA-seq data, the estimated proportions are overall very similar between the "mckenzie" and "kelley" data sets.
ct_res_mckenzie = brainCells(aba_marker_expression, nMarker = 50, data_set = "mckenzie", species = "human") ct_res_kelley = brainCells(aba_marker_expression, nMarker = 50, data_set = "kelley") cell_types_compare = colnames(ct_res_kelley) for(i in 1:length(cell_types_compare)){ cor_res = cor.test(ct_res_mckenzie[ , cell_types_compare[i]], ct_res_kelley[ , cell_types_compare[i]], method = "spearman") df_compare_cor = data.frame(Cell = cell_types_compare[i], Rho = cor_res$estimate, PVal = cor_res$p.value) if(i ==1) df_compare_cor_tot = df_compare_cor if(i > 1) df_compare_cor_tot = rbind(df_compare_cor_tot, df_compare_cor)} kable(df_compare_cor_tot)
This alternative data set also offers marker genes derived from several specific brain regions:
print(unique(unlist(lapply(strsplit(unique(kelley_df_brain$cell)[-c(1, 2, 3, 4)], "_"), "[[", 1))))
If you have access to your own marker genes, you can use the findCells function instead. This has the same functionality otherwise; brainCells is simply a wrapper function for users who want to use the brain cell type marker genes that are provided by BRETIGEA. Note the format of the markers data frame: you must have one column with the gene symbol, named markers, and one column with the corresponding cell type, named cell.
str(markers_df_brain) ct_res = findCells(aba_marker_expression, markers = markers_df_brain, nMarker = 50) kable(head(ct_res))
BRETIGEA also offers users the ability to adjust their original gene expression matrices for the estimated cell type proportion estimates, in order to deconvolute the signal.
brain_cells_adjusted = adjustBrainCells(aba_marker_expression, nMarker = 50, species = "combined") expression_data_adj = brain_cells_adjusted$expression
Note that adjustBrainCells is a wrapper function to adjustCells and if you have your own markers (e.g., for a non-brain data set), then you can use that interface instead for deconvolution of more general cell types.
As you can see, following adjustment, there is no longer a correlation between the RNA expression of the microglia marker gene AIF1 and its encoded protein IHC quantification (IBA1), nor between the RNA and protein expression of the astrocyte marker gene GFAP. (Note there is a non-significant trend towards a residual correlation here, which may be because GFAP is not a perfect marker of astrocyte proportion in this data set, but instead varies across samples based on disease state, region, and other factors).
cor.test(as.numeric(aba_marker_expression["AIF1", ]), as.numeric(aba_pheno_data$ihc_iba1_ffpe), method = "spearman") cor.test(expression_data_adj["AIF1", ], as.numeric(aba_pheno_data$ihc_iba1_ffpe), method = "spearman") cor.test(as.numeric(aba_marker_expression["GFAP", ]), as.numeric(aba_pheno_data$ihc_gfap_ffpe), method = "spearman") cor.test(expression_data_adj["GFAP", ], as.numeric(aba_pheno_data$ihc_gfap_ffpe), method = "spearman")
If you have any problems with or questions about using this package, please open an issue on Github or contact the package maintainer.
References
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.