Graphical introduction to HISTA with a tabbed menu bar.
Default tab with interactive menu for SDA components and gene exploration.
Table summarizing SDA components and observations.
Heatmaps identifying signature patterns for positive and negative cell scores.
Gene expression boxplots across pathologies with Wilcoxon testing.
Gene expression boxplots across cell types.
Gene expression on t-SNE projection, subset by cell type.
Scores of a searched component, mapped on defined cell types.
Scores of a searched component, mapped on 2D projection.
Metadata mapped on 2D projection, subset by cell type.
Heatmap of gene-gene correlations within selected cells.
Heatmap of component-component correlation based on top-loaded genes.
Pseudotime trajectory visualizing cell scores of SDA components across germ cells.
Visualize gene expression across pseudotime for a searched gene.
Index of SDA components pertaining to germ cells and observations.
Hypergeometric testing to identify components enriched with a set of genes.
Statistical approach to identify key components enriched or depleted of a gene set.
Explore Differentially Expressed (DE) Gene tables derived in the original analysis.
Explore long non-coding RNAs (lncRNAs) associated with a specific component.
Analysis limited to somatic cells with a new Klinefelter scRNA-Seq dataset.
Validation of Leydig cell findings with a new scRNA-Seq dataset.
The Home page of HISTA provides a graphical introduction to navigating HISTA. On the left is a tabbed menu bar to navigate the main features of HISTA, described in detail below.
The 'Main' tab is the primary landing page when HISTA is launched. At the center-top, two information boxes present background details for the selected SDA component and gene, accessible through the interactive menu on the left panel of this tab. The menu offers various parameters to modify and customize the displayed content.
Starting from the top and progressing downwards on the page:
The following selection, through a provided radio button, allows the user to choose which pre-processed t-SNE plots to display. Options include:
Visualization Options:
The final elements in the interactive menu include buttons for downloading top-loaded gene lists for export and manually navigating the SDA components.
Figures Below Input Section:
Chromosome location highlighting the loading weight of each gene relative to its position across the chromosomes.
Bottom of the Tab:
A table summarizes the SDA components and our observations for each. To curate this table, we performed iterative rounds of analysis on each component; the table is a condensed form of our SDA findings.
Two heatmaps identify signature (barcode) patterns that annotate each component quantitatively: one for positive cell scores and one for negative cell scores. A chi-squared analysis of the number of positively or negatively scored cells per component identifies where these cells are enriched.
When contrasting by pathology (CNT, INF1, INF2, KS, JUV), an extra dimension is added to the enrichment analysis. The pair-wise hierarchical clustering, which can be toggled using the provided radio buttons, identifies similar enrichment patterns for positively or negatively scored cells. Users can choose from available metadata options via radio buttons, such as cell types, donors, pathology, etc.
The columns of the heatmap represent the SDA components, while the rows correspond to the experimental conditions. By defining thresholds on the cell score matrix, the number of cells scored positively or negatively is enumerated and subjected to a chi-squared test. The heatmap shows the transformed residuals, which highlight enrichment or depletion. Pairwise hierarchical clustering is then applied so that the most similar components and samples cluster together.
An interesting result of this analysis supports previous findings that INF2, the patient with secondary azoospermia, is more similar to the adult controls than INF1, the idiopathic azoospermia patient.
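The thresholding and chi-squared steps described above can be sketched as follows. This is a minimal illustration, not HISTA's implementation: the function names, the threshold value, and the example counts are hypothetical, and Pearson residuals are used as one common choice because the exact residual transformation is not specified here. The sketch assumes every row and column of the count table has a nonzero total.

```python
from math import sqrt

def count_scored_cells(scores, labels, threshold, positive=True):
    """Count cells per label scoring above +threshold (or below -threshold)."""
    counts = {}
    for score, label in zip(scores, labels):
        counts.setdefault(label, 0)
        hit = score > threshold if positive else score < -threshold
        if hit:
            counts[label] += 1
    return counts

def pearson_residuals(table):
    """Chi-squared statistic and Pearson residuals for a count table.

    table: list of rows (conditions), each a list of counts (components).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            r = (observed - expected) / sqrt(expected)
            res_row.append(r)
            chi2 += r * r
        residuals.append(res_row)
    return chi2, residuals

# Hypothetical counts: rows are two conditions, columns are two components.
chi2, residuals = pearson_residuals([[30, 10], [10, 30]])
```

Positive residuals mark cells of a condition that are over-represented in a component, negative residuals mark depletion; these are the values a heatmap of this kind would display.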
Gene expression boxplots depict the expression profile of a searched gene across different pathologies (CNT, INF1, INF2, KS, JUV). Each boxplot is accompanied by a Wilcoxon rank-sum test, which assesses whether the gene's expression distribution differs between the experimental conditions available in HISTA (e.g., CNT, KS, JUV) and supplies the reported p-value.
Additionally, users can subset the analysis by cell type using the available radio buttons. For example, searching for the XIST gene across all cell types can reveal its significant enrichment in Klinefelter Syndrome (KS) (p < 2.22e-16 compared to controls). Selecting a specific cell type, such as Sertoli cells (SC), refines the comparison, and the significant enrichment may be lost, as reported previously [1].
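The rank-sum comparison behind these boxplots can be sketched as below. This is a hedged, standard-library illustration: `rank_sum_test` and the example values are hypothetical, and the p-value uses the normal approximation with a tie correction and no continuity correction, so it may differ from the values displayed in HISTA.

```python
from math import sqrt, erfc

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns the U statistic for group x and the approximate p-value.
    """
    pooled = sorted((value, group) for group, values in enumerate((x, y))
                    for value in values)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0      # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                        # all values identical
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2.0))

# Hypothetical expression values for one gene in two conditions.
control = [1, 2, 3, 4, 5]
klinefelter = [6, 7, 8, 9, 10]
u_stat, p_value = rank_sum_test(control, klinefelter)
```

Subsetting by cell type corresponds to filtering both input lists to the cells of that type before calling the function.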
Gene expression boxplots illustrate the expression pattern of a searched gene across various cell types, with the option to subset by pathology (All, CNT, INF1, INF2, KS, JUV). This tab allows users to quantify the expression of a specific gene across all available cell types.
Unlike the per-pathology boxplots, this figure omits statistics to keep it simple. It still supports quick observations, for instance that the Sertoli cell marker SOX9 is exclusively expressed in Sertoli cells, and gives a rapid view of how gene expression is distributed across the available cell types.
This tab displays the gene expression of a searched gene, batch-corrected and mapped onto the t-SNE projection. Users have the flexibility to subset the analysis by cell type. Additional radio buttons offer the choice of displaying the 2D plot on various scopes of the data, such as t-SNE or UMAP.
This visualization provides insights into the spatial distribution of gene expression in a 2D projection, allowing users to observe how the expression of a specific gene varies across different cell types. The option to select different 2D plots enhances the exploration of the gene expression landscape in the context of the chosen dimensionality reduction method.
This tab visualizes scores of a searched component, mapped onto defined cell types, with the option to subset donor sets. It facilitates exploration into the scoring patterns of each component relative to the specified cell types.
Combined with the Gene Expression per Cell Type (Boxplot) feature, users can deeply investigate each cell type. The parallel visualization of cell scores and gene expression allows for a comprehensive understanding of how the searched component behaves across different cell types and donor sets.
This tab presents scores of a searched component mapped onto the 2D (t-SNE or UMAP) projection, offering the option to subset by cell type. The purpose is to provide a more in-depth visualization of cell scores projected onto the pre-computed 2D representation.
Users can subset the figure by cell type using the available radio buttons. For instance, examination of SDA component 1 reveals that the highest absolute scores are concentrated in the spermatid population of germ cells. Zooming in on these cells allows for a closer observation of the distinct banding pattern, where late and early spermatids show positive scores while the spermatids between them exhibit negative scores.
Digging into the gene loadings of this component unveils specific gene regulation patterns that explain the observed banding. For example, the top positively loaded genes include SPRR4 and PRM1, while the top negatively loaded genes consist of FSCN3 and PRM3, supporting the regulation of spermiogenesis as spermatids conclude their maturation.
This tab allows users to map available metadata (selectable) onto a 2D projection, with the option to subset by cell type. It enables users to create a parallel visualization to the focused t-SNE or UMAP of the cell score or gene expression 2D tabs. The inclusion of metadata provides a closer look at the origin and background of each cell, enhancing the understanding of the characteristics associated with different cell types.
This tab allows users to input a set of genes and select a cell type via radio buttons. The visualization consists of a heatmap displaying gene-gene correlations within the selected cells. Users can explore the relationships and patterns of expression among the specified set of genes within the chosen cell type.
The top-loaded genes of each component underpin much of its interpretation. This tab offers a way to explore and evaluate the relationships between components, restricted to a chosen number of top genes.
To begin, select a component of interest using numeric input. Then, use the slider to choose how many top genes to include in the correlation analysis. The result is visualized in a heatmap illustrating component-component correlations. For convenience, the top-loaded genes used in the analysis are displayed.
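One way to picture this workflow is the sketch below. It is an assumption-laden illustration rather than HISTA's code: it assumes that "top genes" means the genes with the largest absolute loading on the selected component, and that components are compared by Pearson correlation of their loadings over those genes. The function names and the toy loadings are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def top_gene_correlations(loadings, component, n_top):
    """Correlate every component with `component` over its top-loaded genes.

    loadings: dict mapping component name -> dict of gene -> loading.
    """
    reference = loadings[component]
    top_genes = sorted(reference, key=lambda g: abs(reference[g]),
                       reverse=True)[:n_top]
    ref_vector = [reference[g] for g in top_genes]
    correlations = {
        name: pearson(ref_vector, [vector[g] for g in top_genes])
        for name, vector in loadings.items()
    }
    return top_genes, correlations

# Toy loadings for three hypothetical components over four genes.
loadings = {
    "c1": {"A": 3.0, "B": -2.0, "C": 0.1, "D": 1.0},
    "c2": {"A": 6.0, "B": -4.0, "C": 5.0, "D": 2.0},
    "c3": {"A": -3.0, "B": 2.0, "C": 0.0, "D": -1.0},
}
top_genes, correlations = top_gene_correlations(loadings, "c1", n_top=3)
```

Changing `n_top` plays the role of the slider: a small value compares components only on their strongest genes, while a larger value brings weaker loadings into the correlation.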
In this tab, a pseudotime trajectory is inferred on the t-SNE 2D projection of germ cells. This trajectory provides order to cells driven by the transcriptomic landscape, parallel to the known spermatogenesis trajectory.
The cell scores of SDA components (y-axis) related to germ cells are plotted across cells ordered by pseudotime (x-axis). Commonly observed "wave" patterns translate to transcriptional kinetics of that component relative to the affected cells.
Additionally, users have the ability to select available metadata, aiding in the identification of correlating scoring patterns within the pseudotime trajectory.
This tab enables users to type in a gene of interest, visualizing the expression wave across our defined pseudotime for germ cells. The plotted expression wave provides insights into how the gene's expression changes over the trajectory of pseudotime.
Furthermore, users can select metadata options to visualize any differential expression patterns associated with the specified gene across the pseudotime trajectory. This feature enhances the exploration of gene expression dynamics within the context of pseudotime in germ cells.
This tab presents an index of the SDA components related to germ cells along with a summary of observations. A key feature of this table is the order assigned to these components relative to the cells, representing stages of spermatogenesis, where they score with the greatest magnitude.
The ranking is determined by identifying the main peak (maximum) of the density curve fitted to the scatter of scores (y-axis) by pseudotime (x-axis), using a peak-finding algorithm. The position of this maximum defines the rank. For example, the third-ranked SDA component, SDA #149, splits early spermatogonial cells into three sections, where the intermediate section is scored negatively opposite to the others. Further details are provided in the vignettes section.
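The idea of ranking components by where they peak along pseudotime can be illustrated with a deliberately simplified sketch. Here a binned mean of absolute scores stands in for the fitted density curve and the peak-finding algorithm, which are not specified in detail; the function names, the bin count, and the toy scores are all hypothetical, and pseudotime is assumed to span a nonzero range.

```python
def peak_position(pseudotime, scores, n_bins=10):
    """Pseudotime at the centre of the bin with the largest mean |score|."""
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, s in zip(pseudotime, scores):
        b = min(int((t - lo) / width), n_bins - 1)  # clamp the last edge
        sums[b] += abs(s)
        counts[b] += 1
    means = [sums[b] / counts[b] if counts[b] else 0.0 for b in range(n_bins)]
    best = max(range(n_bins), key=lambda b: means[b])
    return lo + (best + 0.5) * width

def rank_components(pseudotime, scores_by_component, n_bins=10):
    """Order component names by the pseudotime of their main peak."""
    peaks = {name: peak_position(pseudotime, scores, n_bins)
             for name, scores in scores_by_component.items()}
    return sorted(peaks, key=peaks.get)

# Toy data: 100 cells ordered by pseudotime, three hypothetical components.
pseudotime = [float(t) for t in range(100)]
scores_by_component = {
    "late": [5.0 if t >= 90 else 0.0 for t in pseudotime],
    "early": [5.0 if t < 10 else 0.0 for t in pseudotime],
    "mid": [5.0 if 40 <= t < 50 else 0.0 for t in pseudotime],
}
ranking = rank_components(pseudotime, scores_by_component)
```

A smoothed curve with a proper peak finder would be less sensitive to the bin width than this binned stand-in, but the ranking logic, sorting components by the position of their main peak, is the same.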
This tab performs hypergeometric testing (K=150) given a set of genes to identify components that are highly enriched with those genes. The analysis considers an adjusted p-value less than 0.01 for determining significant enrichments. However, the fold enrichment can also provide valuable insights even in the absence of statistical significance.
Note that this statistical approach is most suitable for sets with fewer than ~30 genes and is particularly informative for sets with more than 3 genes.
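The enrichment test for a single component can be sketched as follows. This is a minimal illustration under stated assumptions: K=150 is read here as the number of top-loaded genes taken per component, the universe size in the example is hypothetical, and the multiple-testing adjustment applied across components is left out because the adjustment method is not stated.

```python
from math import comb

def hypergeom_enrichment(n_universe, n_top, set_size, overlap):
    """Upper-tail hypergeometric p-value and fold enrichment.

    n_universe: number of genes considered in total (assumed value).
    n_top:      top-loaded genes per component (K in the text).
    set_size:   number of genes in the user's gene set.
    overlap:    gene-set members found among the top-loaded genes.
    """
    denominator = comb(n_universe, set_size)
    upper = min(n_top, set_size)
    numerator = sum(
        comb(n_top, k) * comb(n_universe - n_top, set_size - k)
        for k in range(overlap, upper + 1)
    )
    p_value = numerator / denominator
    expected_overlap = set_size * n_top / n_universe
    return p_value, overlap / expected_overlap

# Hypothetical example: 10 of 20 query genes fall in a component's top 150
# genes, out of an assumed universe of 20,000 genes.
p_value, fold = hypergeom_enrichment(20000, 150, 20, 10)
```

The fold enrichment is the observed overlap divided by the overlap expected by chance, which is why it can remain informative for small gene sets even when the p-value does not pass the significance threshold.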
This tab employs a statistical approach to identify key components, particularly suitable for larger sets of genes. The default input is loaded with antisense genes. The tab includes several figures:
The distribution is layered over a second distribution derived from an equal-length random sample of genes (computed dynamically).
Ranking of Components:
This ranking provides insights into components that are more or less enriched than the average for the input gene set.
Correlation Heatmap:
The DEgenes tab enables users to explore the Differentially Expressed (DE) Gene tables derived from the original analysis. Key features include:
The top drop-down menu allows users to select model conditions for comparison (e.g., control vs infertiles - CNTL_vs_INF1, CNTL_vs_INF2, CNTL_vs_KS, etc.).
Unsupervised Clustering DE Selection:
The lncRNAs tab, described in this manuscript's long non-coding RNA (lncRNA) vignette, offers insights into lncRNAs. Key features include:
Users can explore lncRNAs by entering the component number of interest in the input box.
Venn Diagram:
At the top, a Venn diagram illustrates the overlap between Ensembl lncRNA annotated genes and all genes found in HISTA.
Distribution Analysis:
A comparison is made with a random equal-length set of genes.
Component Sorting:
The "Soma only W. LN19" tab provides supporting material for the KS manuscript [1], focusing on the new Klinefelter scRNA-Seq data from Laurentino et al. (2019). Key points include:
As with the existing KS donors, there were no germ cells in this new patient. Therefore, the analysis focuses exclusively on somatic cells.
Sertoli Cell Clusters:
The largest SC subcluster is mostly derived from LN19.
Leydig Cell Subtypes:
LN19 LC are predominantly found in PLC and MLC clusters.
Distinct Cell Cluster:
The "Zhao et al. Validation" tab provides validation support for the KS manuscript [1] based on the 2021 scRNA-Seq dataset by Zhao et al.
If you have one or more genes and want to learn more about them within HISTA, follow these steps:
To examine the expression of each gene, search for the gene in the "Main" tab.
2D Expression Plot (t-SNE or UMAP) in Main Tab:
This information helps identify gene sets that correlate with the gene of interest.
Gene Expression Per Pathology Tab:
This tab allows you to specify a particular cell type or analyze expression across all cells.
Gene Expression Per Cell Type Tab:
Explore expression across cell types within specific pathologies using the 'Gene Expression Per Cell Type' tab.
Pseudotime Expr Tab:
If the gene is expressed in germ cells, check the 'Pseudotime Expr' tab to observe expression patterns across pseudotime.
Enrichment Analysis Tab:
Search these components for annotations and additional correlating genes.
Top Loaded Components Tab:
These steps allow comprehensive exploration of gene expression patterns and relationships within HISTA.
Suppose you're interested in understanding the differential expression patterns across various pathologies and cell types in HISTA. Here's a step-by-step guide:
Explore the 2D (t-SNE or UMAP) expression plot to identify potential genes of interest.
Gene Expression Per Pathology Tab:
Use the Wilcoxon rank-sum statistics to assess the differential expression of genes across different pathologies.
Gene Expression Per Cell Type Tab:
This provides a detailed view of how genes vary across different cell types within selected pathologies.
Pseudotime Expr Tab:
Observe how gene expression changes across pseudotime, gaining insights into developmental trajectories.
Enrichment Analysis Tab:
Identify components enriched with specific gene sets, helping to understand their functional relevance.
Top Loaded Components Tab:
By following these steps, you can gain a comprehensive understanding of how genes are expressed and regulated across different conditions and cell types in HISTA.
Let's say you are interested in understanding how the expression of certain genes evolves across pseudotime, representing the progression of spermatogenesis. Here's how you can explore this within HISTA:
Start by searching for your genes of interest in the 'Main' tab to observe their expression patterns in the 2D (t-SNE or UMAP) projection.
Head to the 'Pseudotime Expr' tab to visualize the expression pattern of these genes along the pseudotime trajectory.
To gain a comprehensive view, consider using the 'Gene Expression Per Pathology' tab to evaluate differential expression across different pathologies.
If you have a set of genes, the 'Enrichment Analysis' tab can help identify which SDA components are enriched with these genes, providing deeper insights into their functional relevance.