In huayc09/SeuratExtend: SeuratExtend

Facilitate Gene Naming Conversions
Compute Statistics Grouped by Clusters
Assess Proportion of Positive Cells in Clusters
Run Standard Seurat Pipeline

Facilitate Gene Naming Conversions {#facilitate-gene-naming-conversions}

Introduction

Gene naming conventions vary significantly between organisms and databases, a common challenge in scRNA-seq data analysis. SeuratExtend includes several functions that convert between human and mouse gene symbols and Ensembl IDs, and between human and mouse homologous gene symbols. These functions draw on biomaRt for the underlying mappings but improve reliability and performance by storing the most commonly used databases locally, which removes the need for an internet connection and avoids the frequent instability of biomaRt queries.

Functions for Gene Naming Conversions

The functions provided for these conversions are:

HumanToMouseGenesymbol
MouseToHumanGenesymbol
EnsemblToGenesymbol
GenesymbolToEnsembl

These functions share a similar usage pattern, as detailed below using HumanToMouseGenesymbol as an example.

Getting Started with Examples

First, let's retrieve a few human gene symbols from a dataset as an example:

library(Seurat)
library(SeuratExtend)

human_genes <- VariableFeatures(pbmc)[1:6]
print(human_genes)

Default Usage

By default, HumanToMouseGenesymbol returns a data frame showing how human gene symbols (HGNC) match with mouse gene symbols (MGI):

HumanToMouseGenesymbol(human_genes)

This table indicates that not all human genes have direct mouse homologs, and some human genes may correspond to multiple mouse genes.

Simplified Output

If you prefer a simpler vector output without the matching details:

HumanToMouseGenesymbol(human_genes, match = FALSE)

One-to-One Correspondence

For cases where you require a one-to-one correspondence:

HumanToMouseGenesymbol(human_genes, keep.seq = TRUE)
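
The three output shapes described above can be pictured with a small base-R sketch. It does not call SeuratExtend: the lookup table and gene names below are hypothetical placeholders, and keeping the first match for a gene with several homologs is an assumption made for illustration, not a documented rule of the package.

```r
# Hypothetical lookup table; names are placeholders, not real homolog pairs.
lookup <- data.frame(
  human = c("GENE1", "GENE2", "GENE2", "GENE4"),
  mouse = c("Gene1", "Gene2a", "Gene2b", "Gene4"),
  stringsAsFactors = FALSE
)
query <- c("GENE1", "GENE2", "GENE3")  # GENE3 has no entry in the table

# 1. Matching table: one row per human-mouse pair, so GENE2 appears twice
#    and GENE3 does not appear at all.
match_table <- lookup[lookup$human %in% query, ]

# 2. Plain vector of every matched mouse symbol, without the pairing details.
mouse_all <- unique(match_table$mouse)

# 3. One-to-one result: same length and order as the query.
#    match() returns the first hit, and NA where there is none.
mouse_one <- setNames(lookup$mouse[match(query, lookup$human)], query)

print(match_table)
print(mouse_all)
print(mouse_one)
```

The one-to-one form is the one to use when the result must line up with the input, for example when renaming the rows of a matrix.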

Converting Gene Expression Matrices

These functions can also directly convert human gene expression matrices to their mouse counterparts:

# Create an example gene expression matrix
human_matr <- GetAssayData(pbmc)[human_genes, 1:4]
print(human_matr)

# Convert to a mouse gene expression matrix
HumanToMouseGenesymbol(human_matr)

Other Gene Naming Conversion Functions

The usage patterns for the other conversion functions in SeuratExtend, such as MouseToHumanGenesymbol, GenesymbolToEnsembl, and EnsemblToGenesymbol, are similar to those already discussed. These functions also leverage local databases to enhance performance and reliability but provide options to use online databases via biomaRt if necessary.

Here are some examples demonstrating the use of other gene naming conversion functions:

# Converting mouse gene symbols to human
mouse_genes <- c("Cd14", "Cd3d", "Cd79a")
MouseToHumanGenesymbol(mouse_genes, match = FALSE)

# Converting human gene symbols to Ensembl IDs
human_genes <- c("PPBP", "LYZ", "S100A9", "IGLL5", "GNLY", "FTL")
GenesymbolToEnsembl(human_genes, spe = "human", keep.seq = TRUE)

# Converting mouse gene symbols to Ensembl IDs
GenesymbolToEnsembl(mouse_genes, spe = "mouse", keep.seq = TRUE)

# Converting Ensembl IDs to human gene symbols
EnsemblToGenesymbol(c("ENSG00000163736", "ENSG00000090382"), spe = "human", keep.seq = TRUE)

# Converting Ensembl IDs to mouse gene symbols
EnsemblToGenesymbol(c("ENSMUSG00000051439", "ENSMUSG00000032094"), spe = "mouse", keep.seq = TRUE)

Using Online Resources

While SeuratExtend typically uses localized databases for conversions, you have the option to directly fetch results from biomaRt databases if required. This can be useful when working with less common genes or newer annotations not yet available in the local database:

# Fetching Ensembl IDs for human genes directly from biomaRt
GenesymbolToEnsembl(human_genes, spe = "human", local.mode = FALSE, keep.seq = TRUE)

Converting UniProt IDs to Gene Symbols

In addition to facilitating gene symbol and Ensembl ID conversions between human and mouse, SeuratExtend also includes functionality to convert UniProt IDs, which are widely used in proteomic databases, to gene symbols. This can be particularly useful when integrating proteomic and genomic data or when working with databases that use UniProt identifiers.

The function UniprotToGenesymbol in SeuratExtend provides a straightforward way to translate UniProt IDs into gene symbols. This function supports both human and mouse species, accommodating research that spans multiple types of biological data. Here's how you can convert UniProt IDs to gene symbols for both human and mouse:

# Converting UniProt IDs to human gene symbols
UniprotToGenesymbol(c("Q8NF67", "Q9NPB9"), spe = "human")

# Converting UniProt IDs to mouse gene symbols
UniprotToGenesymbol(c("Q9R1C8", "Q9QY84"), spe = "mouse")

Compute Statistics Grouped by Clusters {#compute-statistics-grouped-by-clusters}

The CalcStats function from the SeuratExtend package provides a comprehensive approach to compute various statistics, such as mean, median, z-scores, or LogFC, for genomic data. This function can handle data stored in Seurat objects or standard matrices, allowing for versatile analyses tailored to single-cell datasets.

Whether you're analyzing genes or pathways, CalcStats simplifies the task by computing statistics for selected features across different cell groups or clusters.

Using a Seurat Object

Begin by selecting a subset of features, such as genes. For this example, let's pick the first 20 variable features from a Seurat object:

library(Seurat)
library(SeuratExtend)

genes <- VariableFeatures(pbmc)[1:20]

Using CalcStats, compute your desired metric, like z-scores, for each feature across different cell clusters:

genes.zscore <- CalcStats(pbmc, features = genes, method = "zscore", group.by = "cluster")
head(genes.zscore)
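
As a rough illustration of what a per-cluster z-score can mean, the base-R sketch below averages each feature within each cluster and then standardizes those averages across clusters. The toy matrix and the names `expr`, `cluster`, `cluster_mean` and `cluster_z` are made up for this example, and whether CalcStats computes exactly this quantity is an assumption.

```r
# Toy expression matrix: 3 features x 12 cells (values are made up).
expr <- rbind(
  gene1 = c(0, 1, 0, 1, 4, 5, 4, 5, 2, 2, 2, 2),
  gene2 = c(3, 3, 4, 4, 0, 0, 1, 1, 0, 1, 0, 1),
  gene3 = c(1, 0, 0, 1, 1, 2, 1, 2, 6, 5, 6, 5)
)
colnames(expr) <- paste0("cell", 1:12)

# Cluster label for each cell (four cells per cluster).
cluster <- factor(rep(c("A", "B", "C"), each = 4))

# Mean of each feature within each cluster (features x clusters).
cluster_mean <- t(apply(expr, 1, function(x) tapply(x, cluster, mean)))

# Standardize each feature across clusters: subtract the mean of the
# cluster means and divide by their standard deviation.
cluster_z <- t(scale(t(cluster_mean)))

print(cluster_mean)
print(round(cluster_z, 2))
```

Each row of `cluster_z` has mean 0 and standard deviation 1, so the heatmap colors show in which clusters a feature is relatively high or low rather than its absolute level.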

Display the computed statistics using a heatmap:

Heatmap(genes.zscore, lab_fill = "zscore")

Select more genes and retain the top 4 genes of each cluster, sorted by p-value. This can be a convenient method to display the top marker genes of each cluster:

genes <- VariableFeatures(pbmc)
genes.zscore <- CalcStats(
  pbmc, features = genes, method = "zscore", group.by = "cluster", 
  order = "p", n = 4)
Heatmap(genes.zscore, lab_fill = "zscore")

Using Matrices as Input

CalcStats also accepts a plain matrix as input. For instance, you might perform a gene set enrichment analysis (GSEA) using the Hallmark 50 gene sets and obtain the AUCell matrix (rows represent pathways, columns represent cells):

pbmc <- GeneSetAnalysis(pbmc, genesets = hall50$human)
matr <- pbmc@misc$AUCell$genesets

Using the matrix, compute the z-scores for the genesets across various cell clusters:

gsea.zscore <- CalcStats(matr, f = pbmc$cluster, method = "zscore")

Present the z-scores using a heatmap:

Heatmap(gsea.zscore, lab_fill = "zscore")

Assess Proportion of Positive Cells in Clusters {#assess-proportion-of-positive-cells-in-clusters}

This section describes how to utilize the feature_percent function in the SeuratExtend package to determine the proportion of positive cells within specified clusters or groups based on defined criteria. This function is particularly useful for identifying the expression levels of genes or other features within subpopulations of cells in scRNA-seq datasets.

Basic Usage

To calculate the proportion of positive cells for the top 5 variable features in a Seurat object:

library(SeuratExtend)

genes <- VariableFeatures(pbmc)[1:5]

# Default usage
proportions <- feature_percent(pbmc, feature = genes)
print(proportions)

This will return a matrix where rows are features and columns are clusters, showing the proportion of cells in each cluster where the feature's expression is above the default threshold (0).
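
The quantity described above can be reproduced on a toy matrix in base R. This is a minimal sketch of the idea, not the package's implementation; `prop_positive`, `expr` and `cluster` are hypothetical names introduced here.

```r
# Toy expression matrix: 2 features x 8 cells (values are made up).
expr <- rbind(
  geneA = c(0, 3, 1, 0, 0, 0, 2, 5),
  geneB = c(1, 1, 0, 4, 0, 0, 0, 3)
)
cluster <- factor(c("c1", "c1", "c1", "c1", "c2", "c2", "c2", "c2"))

# Proportion of cells per group whose expression is strictly above `above`.
# The mean of a logical vector is the fraction of TRUE values.
prop_positive <- function(expr, groups, above = 0) {
  t(apply(expr > above, 1, function(pos) tapply(pos, groups, mean)))
}

print(prop_positive(expr, cluster))             # threshold 0
print(prop_positive(expr, cluster, above = 2))  # stricter threshold
```

Raising the threshold can only keep or lower each proportion, since fewer cells count as positive.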

Adjusting the Expression Threshold

To count a cell as positive only if its expression is above a value of 2:

proportions_above_2 <- feature_percent(pbmc, feature = genes, above = 2)
print(proportions_above_2)

Targeting Specific Clusters

To calculate proportions for only a subset of clusters:

proportions_subset <- feature_percent(pbmc, feature = genes, ident = c("B cell", "CD8 T cell"))
print(proportions_subset)

Grouping by Different Metadata

If you wish to group cells by a variable other than the default cluster identities:

proportions_by_ident <- feature_percent(pbmc, feature = genes, group.by = "orig.ident")
print(proportions_by_ident)

Proportions Relative to Total Cell Numbers

To also report the proportion of positive cells across all selected clusters combined:

proportions_total <- feature_percent(pbmc, feature = genes, total = TRUE)
print(proportions_total)

Logical Output for Expression

For scenarios where you need a logical output indicating whether at least a given proportion of cells (e.g., 20%) express the feature:

expressed_logical <- feature_percent(pbmc, feature = genes, if.expressed = TRUE, min.pct = 0.2)
print(expressed_logical)

Run Standard Seurat Pipeline {#run-standard-seurat-pipeline}

The RunBasicSeurat function in the SeuratExtend package automates a standard Seurat pipeline for single-cell RNA sequencing data analysis. It covers steps such as normalization, PCA, and clustering, and optionally corrects batch effects using Harmony. This automation is designed to streamline the analysis process, making it more efficient and reproducible.

Overview of the Pipeline

The Seurat pipeline typically includes the following steps, which are all encapsulated within the RunBasicSeurat function:

Calculating Percent Mitochondrial Content: Identifying and filtering cells based on the proportion of mitochondrial genes, which is a common quality control metric.
Normalization: Scaling data to account for cell-specific differences in library size.
PCA: Performing principal component analysis to reduce dimensionality and highlight the major sources of variation.
Clustering: Grouping cells based on their gene expression profiles to identify distinct cell types or states.
UMAP Visualization: Projecting the high-dimensional data into two dimensions for visualization.
Batch Integration (Optional): Using Harmony to correct for batch effects, so that variation driven by technical differences between batches is reduced.

For a comprehensive tutorial on the standard Seurat workflow, refer to the official Seurat PBMC tutorial.
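
For orientation, the steps listed above correspond roughly to the following sequence of Seurat calls. This is a sketch of the manual workflow, not the body of RunBasicSeurat: the wrapper name `run_standard_steps` is hypothetical, and the thresholds, `dims` and `resolution` values are the ones used in the Seurat PBMC tutorial, which may differ from the defaults of RunBasicSeurat.

```r
# Hypothetical wrapper around the standard Seurat steps.
# `seu` is assumed to be a Seurat object holding raw human counts.
run_standard_steps <- function(seu) {
  # Percent mitochondrial content (human mitochondrial genes start with "MT-")
  seu[["percent.mt"]] <- Seurat::PercentageFeatureSet(seu, pattern = "^MT-")

  # Quality-control filtering on feature counts and mitochondrial content
  seu <- subset(
    seu,
    subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5
  )

  # Normalization, variable feature selection, scaling, and PCA
  seu <- Seurat::NormalizeData(seu)
  seu <- Seurat::FindVariableFeatures(seu)
  seu <- Seurat::ScaleData(seu)
  seu <- Seurat::RunPCA(seu)

  # Optional batch correction would go here, e.g. with harmony::RunHarmony(),
  # after which reduction = "harmony" is passed to the calls below.

  # Graph-based clustering and UMAP on the first 10 principal components
  seu <- Seurat::FindNeighbors(seu, dims = 1:10)
  seu <- Seurat::FindClusters(seu, resolution = 0.5)
  seu <- Seurat::RunUMAP(seu, dims = 1:10)
  seu
}
```

Calling `run_standard_steps()` on a Seurat object requires the Seurat package to be installed; defining the function does not.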

Using `RunBasicSeurat`

Below are examples demonstrating how to use the RunBasicSeurat function to process scRNA-seq data:

library(SeuratExtend)

# Run the full pipeline with forced normalization and default parameters
pbmc <- RunBasicSeurat(pbmc, force.Normalize = TRUE)

# Visualize the clusters using DimPlot2
DimPlot2(pbmc, group.by = "cluster")

Parameters and Customization

The function allows for extensive customization of each step through various parameters:

spe: Specifies the species (human or mouse) for mitochondrial calculations.
nFeature_RNA.min and nFeature_RNA.max: Define the range of RNA features considered for each cell.
percent.mt.max: Sets the maximum allowed mitochondrial gene expression percentage.
dims: Determines the number of dimensions used in PCA and neighbor finding.
resolution: Adjusts the granularity of the clustering algorithm.
reduction: Chooses the dimensional reduction technique, with options for PCA or Harmony.
harmony.by: Specifies the metadata column for batch correction when using Harmony.
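
A call that sets several of these parameters at once might look like the sketch below. The parameter names are the ones listed above, but every value is illustrative rather than a recommended default. The format of `dims` and the spelling of the Harmony option for `reduction` are assumptions, and `run_custom_pipeline` is a hypothetical wrapper name.

```r
# Hypothetical wrapper; all values are illustrative, not recommended defaults.
run_custom_pipeline <- function(seu) {
  SeuratExtend::RunBasicSeurat(
    seu,
    spe = "human",               # species used for mitochondrial calculations
    nFeature_RNA.min = 500,      # lower bound on detected features per cell
    nFeature_RNA.max = 6000,     # upper bound on detected features per cell
    percent.mt.max = 10,         # maximum mitochondrial percentage
    dims = 1:15,                 # assumed to take a vector of PC indices
    resolution = 0.8,            # higher values give finer clusters
    reduction = "harmony",       # assumed spelling of the Harmony option
    harmony.by = "orig.ident"    # metadata column identifying the batch
  )
}
```

With SeuratExtend installed, `pbmc <- run_custom_pipeline(pbmc)` would then run the pipeline with these settings.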

Conditional Execution

RunBasicSeurat intelligently decides whether to re-run certain steps based on parameter changes or previous executions:

force.* Parameters: Each force parameter (e.g., force.Normalize, force.RunPCA) overrides the function's internal checks, ensuring that specific steps are executed regardless of prior results. This feature is particularly useful when parameters are adjusted or when updates to the dataset require reanalysis.

Conclusion

The RunBasicSeurat function simplifies the execution of a comprehensive scRNA-seq data analysis pipeline, incorporating features such as conditional execution and batch effect correction. It lets users process their data efficiently while keeping the flexibility to adapt the analysis to specific requirements.

sessionInfo()

huayc09/SeuratExtend documentation built on July 5, 2025, 1:41 a.m.
