Introduction to Cell Type Enrichment Analysis with xCell 2.0

Cell type enrichment analysis and cellular deconvolution, are essential for understanding the heterogeneity of complex tissues in bulk transcriptomics data. xCell2 is an R package that developed upon the original xCell methodology (Aran, et al 2017), offering improved algorithms and enhanced performance. The key advancement in xCell 2.0 is its genericity - users can now utilize any reference, including single-cell RNA-Seq data, to train an xCell2 reference object for analysis.

This package is particularly useful for researchers working with bulk transcriptomics data who want to infer the cellular composition of their samples. By leveraging reference data from various sources, xCell 2.0 offers a flexible and powerful tool for understanding the cellular heterogeneity in complex tissues.

This vignette provides an overview of the package’s features and step-by-step guidance on: - Preparing input data - Generating custom xCell2 reference objects - Performing cell type enrichment analysis - Interpreting results and best practices

Whether you are new to cell type deconvolution or an experienced bioinformatician, this guide will help you leverage xCell2 effectively in your research.

Installation

From Bioconductor (Coming Soon)

To install xCell2 from Bioconductor, use:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
 install.packages("BiocManager")
}
BiocManager::install("xCell2")

From GitHub (Development Version)

To install the development version from GitHub, use:

if (!requireNamespace("devtools", quietly = TRUE)) {
 install.packages("devtools")
}
devtools::install_github("AlmogAngel/xCell2")

Dependencies

xCell2 relies on several Bioconductor packages. Most dependencies will be automatically installed.

After installation, load the package with:

library(xCell2)

Creating a Custom Reference with xCell2Train

One of the key features of xCell2 is the ability to create custom reference objects tailored to your specific research needs. This section will guide you through the process of generating a custom xCell2 reference object using the xCell2Train function.

Why Create a Custom Reference?

Creating a custom reference allows you to: - Incorporate cell types specific to your research area - Use the latest single-cell RNA-seq data as a reference - Adapt the tool for non-standard organisms or tissues

Preparing the Input Data

Before using xCell2Train, you need to prepare two key inputs: 1. Reference Gene Expression Matrix: - Can be generated from various platforms: microarray, bulk RNA-Seq, or single-cell RNA-Seq - Genes should be in rows, samples/cells in columns - Should be normalized to both gene length and library size. Could be in either linear or logarithmic space.

  1. Labels Data Frame: This data frame must contain information about each sample/cell in your reference. It should have four columns:
  2. "ont": Cell type ontology (e.g., "CL:0000545" or NA if not applicable)
  3. "label": Cell type name (e.g., "T-helper 1 cell")
  4. "sample": Sample/cell identifier matching column names in the reference matrix
  5. "dataset": Source dataset or subject identifier

When using a SummarizedExperiment or SingleCellExperiment object, the gene expression matrix should be stored in the "counts" slot of the assays component, and the sample metadata (equivalent to the "labels" data frame) should be stored in colData.

Example: Using DICE Dataset

Let's walk through an example using a subset of the Database of Immune Cell Expression (DICE):

# Load the demo data
data(dice_demo_ref, package = "xCell2")

# Extract reference matrix
dice_ref <- as.matrix(dice_demo_ref@assays@data$logcounts)
colnames(dice_ref) <- make.unique(colnames(dice_ref))

# Prepare labels data frame
dice_labels <- as.data.frame(dice_demo_ref@colData)
dice_labels$ont <- NA
dice_labels$sample <- colnames(dice_ref)
dice_labels$dataset <- "DICE"

Assigning Cell Type Ontology (optional but Recommended)

You can skip the following step if: - You don't want to use ontology to avoid cell type dependencies (not recommended) - You are sure that there are no cell type dependencies in your reference

To improve the quality of your custom reference, assign cell type ontologies using a controlled vocabulary:

dice_labels[dice_labels$label == "B cells", ]$ont <- "CL:0000236"
dice_labels[dice_labels$label == "Monocytes", ]$ont <- "CL:0000576"
dice_labels[dice_labels$label == "NK cells", ]$ont <- "CL:0000623"
dice_labels[dice_labels$label == "T cells, CD8+", ]$ont <- "CL:0000625"
dice_labels[dice_labels$label == "T cells, CD4+", ]$ont <- "CL:0000624"
dice_labels[dice_labels$label == "T cells, CD4+, memory", ]$ont <- "CL:0000897"

Use the xCell2GetLineage function to check cell type dependencies:

xCell2::xCell2GetLineage(labels = dice_labels, outFile = "demo_dice_dep.tsv")

Open demo_dice_dep.tsv and verify the lineage assignments. Note that "T cells, CD4+, memory" assigned as a descendant of "T cells, CD4+"

Generating the xCell2 Reference Object

With our inputs prepared, we can now create the xCell2 reference object. Simply run the following command:

set.seed(123) # (optional) For reproducibility
DICE_demo.xCell2Ref <- xCell2::xCell2Train(
 ref = dice_ref,
 labels = dice_labels,
 refType = "rnaseq"
)

Note that we set seed for reproducibility as generating pseudo-bulk samples from scRNA-Seq reference based on random sampling of cells.

Key Parameters of xCell2Train: - ref: Your prepared reference gene expression matrix or a SummarizedExperiment / SingleCellExperiment object - labels: The labels data frame you created (not needed if ref is a SummarizedExperiment or SingleCellExperiment object, as it should already be included in colData) (default: NULL) - refType: Type of reference data ("rnaseq", "array", or "sc") - useOntology: Whether to use ontological integration (default: TRUE) - numThreads: Number of threads for parallel processing

For a full list of parameters and their descriptions, refer to the xCell2Train function documentation.

Sharing Your Custom xCell2 Reference Object

Sharing your custom xCell2 reference object with the scientific community can greatly benefit researchers working in similar fields. Here’s how you can contribute:

  1. Save Your Reference Object Save your newly generated xCell2 reference object:
save(DICE_demo.xCell2Ref, file = "DICE_demo.xCell2Ref.rda")
  1. Prepare Your Reference for Sharing Ensure your reference includes:
  2. A clear description of the source data
  3. Any preprocessing steps applied
  4. The version of xCell2 used to generate it

  5. Upload to the xCell2 References Repository

  6. Visit the xCell2 References Repository
  7. Navigate to the "references" directory
  8. Click "Add file" > "Upload files" and select your .rda file

  9. Update the README To help others understand and use your reference:

  10. Return to the main page of the xCell2 References Repository
  11. Open the README.md file
  12. Click the pencil icon (Edit this file) in the top right corner
  13. Add an entry to the references table

  14. Submit a Pull Request

  15. Scroll to the bottom of the page
  16. Select "Create a new branch for this commit and start a pull request"
  17. Click "Propose changes"
  18. On the next page, click "Create pull request"

The repository maintainers will review your submission and merge it if everything is in order. By sharing your custom reference, you contribute to the growth and improvement of cell type enrichment analysis across the scientific community. Your work could help researchers in related fields achieve more accurate and relevant results in their studies.

Next Steps

After creating your custom reference, you can use it for cell type enrichment analysis with xCell2Analysis. We'll cover this in the next section. Remember, creating a robust reference is crucial for accurate results. Take time to ensure your input data is high-quality and properly annotated.

Using Pre-trained xCell2 References

xCell2 offers pre-trained reference objects that can be easily downloaded and used for your analysis. These references cover various tissue types and are based on well-curated datasets.

| Dataset | Study | Species | Normalization | nSamples/Cells | nCellTypes | Platform | Tissues | |---------|-------|---------|---------------|----------------|------------|----------|---------| | BlueprintEncode | Martens JHA and Stunnenberg HG (2013), The ENCODE Project Consortium (2012), Aran D (2019) | Homo Sapiens | TPM | 259 | 43 | RNA-seq | Mixed | | ImmGenData | The Immunological Genome Project Consortium (2008), Aran D (2019) | Mus Musculus | RMA | 843 | 19 | Microarray | Immune/Blood | | Immune Compendium | Zaitsev A (2022) | Homo Sapiens | TPM | 3626 | 40 | RNA-seq | Immune/Blood | | LM22 | Chen B (2019) | Homo Sapiens | RMA | 113 | 22 | Microarray | Mixed | | MouseRNAseqData | Benayoun B (2019) | Mus Musculus | TPM | 358 | 18 | RNA-seq | Mixed | | Pan Cancer | Nofech-Mozes I (2023) | Homo Sapiens | Counts | 25084 | 29 | scRNA-seq | Tumor | | Tabula Muris Blood | The Tabula Muris Consortium (2018) | Mus Musculus | Counts | 11145 | 6 | scRNA-seq | Bone Marrow, Spleen, Thymus | | Tabula Sapiens Blood | The Tabula Sapiens Consortium (2022) | Homo Sapiens | Counts | 11921 | 18 | scRNA-seq | Blood, Lymph_Node, Spleen, Thymus, Bone Marrow | | TME Compendium | Zaitsev A (2022) | Homo Sapiens | TPM | 8146 | 25 | RNA-seq | Tumor |

You can also quick access popular pre-trained references that are available within the xCell2 package:

data(BlueprintEncode.xCell2Ref)

Or download a pre-trained reference directly within R using the download.file() function:

# Set the URL of the pre-trained reference
ref_url <- "https://dviraran.github.io/xCell2refs/references/BlueprintEncode.xCell2Ref.rds"
# Set the local filename to save the reference
local_filename <- "BlueprintEncode.xCell2Ref.rds"
# Download the file
download.file(ref_url, local_filename, mode = "wb")
# Load the downloaded reference
BlueprintEncode.xCell2Ref <- readRDS(local_filename)

Remember to choose a reference that's appropriate for your specific tissue type and experimental context. The choice of reference can impact your results, so it's important to select one that closely matches your biological system.

Performing Cell Type Enrichment Analysis with xCell2Analysis

After creating or obtaining an xCell2 reference object, the next step is to use it for cell type enrichment analysis on your bulk RNA-seq data. This section will guide you through using the xCell2Analysis function and interpreting its results.

Preparing Your Data

Before running the analysis, ensure you have: 1. An xCell2 reference object 2. A bulk gene expression matrix to analyze

For this example, we'll use a pre-loaded demo reference and a sample bulk expression dataset:

# Load the demo reference object
data(DICE_demo.xCell2Ref, package = "xCell2")
# Load a sample bulk expression dataset
data(mix_demo, package = "xCell2")

Running xCell2Analysis

Now, let's perform the cell type enrichment analysis:

xcell2_results <- xCell2::xCell2Analysis(
 mix = mix_demo,
 xcell2object = DICE_demo.xCell2Ref
)

Key Parameters: - mix: Your bulk mixture gene expression data (genes in rows, samples in columns) - xcell2object: An S4 object of class xCell2Object (your reference) - minSharedGenes: Minimum fraction of shared genes required (default: 0.9) - spillover: Whether to use spillover correction (default: TRUE) - numThreads: Number of threads for parallel processing (default: 1)

For a full list of parameters and their descriptions, refer to the xCell2Analysis function documentation.

Interpreting the Results

The xCell2Analysis function returns a matrix of cell type enrichment scores: - Rows represent cell types - Columns represent samples from your input mixture - Higher scores indicate a stronger presence of that cell type in the sample

Important considerations: - Scores are relative, not absolute proportions - Compare scores across samples to identify differences in cell type composition - Consider the biological context of your samples when interpreting results

Further Analysis

Once you have your xCell2 results, you can: - Correlate cell type enrichment scores with clinical or experimental variables - Perform differential enrichment analysis between sample groups - Use the scores as features for machine learning models

Remember, xCell2 provides estimates of relative cell type abundance. For absolute quantification, additional experimental validation may be necessary.

Troubleshooting

If you encounter issues: - Ensure your input data is properly formatted - Check that your mix and reference use the same gene annotation system - Try adjusting the minSharedGenes parameter if many genes are missing

For more detailed troubleshooting, refer to the package documentation or seek help on the xCell2 GitHub issues page.

Citing xCell2

If you use xCell2 in your research, please cite: Angel A, Naom L, Nabel-Levy S, Aran D. xCell 2.0: Robust Algorithm for Cell Type Proportion Estimation Predicts Response to Immune Checkpoint Blockade. bioRxiv 2024.

Referece

R Session Info

sessionInfo()


AlmogAngel/xCell2 documentation built on Oct. 14, 2024, 4:51 a.m.