xCell2Train: Train Custom xCell2 Reference Object

View source: R/xCell2Train.R

xCell2Train    R Documentation

Train Custom xCell2 Reference Object

Description

This function creates a custom reference object for xCell2Analysis, enabling cell type enrichment analysis. The reference can be built from RNA-Seq, microarray, or scRNA-Seq data and can come from various tissues and organisms.

Usage

xCell2Train(
  ref,
  mix = NULL,
  labels = NULL,
  refType,
  lineageFile = NULL,
  BPPARAM = BiocParallel::SerialParam(),
  useOntology = TRUE,
  returnSignatures = FALSE,
  returnAnalysis = FALSE,
  useSpillover = TRUE,
  spilloverAlpha = 0.5,
  minPbCells = 30,
  minPbSamples = 10,
  minScGenes = 10000
)

Arguments

ref

A reference gene expression matrix (genes in rows, samples/cells in columns) or a SummarizedExperiment/SingleCellExperiment object with expression data in the assays slot.

Valid Assays:

"tpm"

Transcripts Per Million (recommended for RNA-Seq).

"logcounts"

Log-transformed normalized counts.

"normcounts"

Normalized counts.

"counts"

Raw counts (required for microarray references).

Notes:

  • If multiple assays exist, "tpm" is prioritized.

  • For microarray data, the "counts" assay must be used.
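As a minimal sketch of the object-based input described above, the following builds a toy SummarizedExperiment whose expression data sits in an assay named "tpm". The matrix values, sample names, and cell type labels are made up for illustration. It also assumes that the colData columns mirror the columns of the labels data frame (ont, label, sample, dataset), so check this against your installed version of the package. The block is skipped if SummarizedExperiment is not installed.

```r
# Toy TPM-like matrix: 3 genes x 2 samples (values are invented).
tpm_mat <- matrix(
  c(10, 0, 5,
     8, 1, 6),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("s1", "s2"))
)

toy_se <- NULL
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  # Sample metadata; the column names are an assumption that mirrors `labels`.
  sample_info <- S4Vectors::DataFrame(
    ont = c(NA_character_, NA_character_),
    label = c("B cell", "T cell"),
    sample = colnames(tpm_mat),
    dataset = "toy",
    row.names = colnames(tpm_mat)
  )
  # Naming the assay "tpm" makes it the prioritized assay when several exist.
  toy_se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(tpm = tpm_mat),
    colData = sample_info
  )
  print(SummarizedExperiment::assayNames(toy_se))
}
```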

mix

A bulk mixture gene expression matrix (genes in rows, samples in columns). Optional in general, but required if returnAnalysis is set to TRUE, because it is the input to the enrichment analysis.

labels

A data frame with the following columns:

  • "ont": The cell type ontology ID (e.g., "CL:0000545"). Set to NA if not available. Ontology IDs can be looked up in the EBI Ontology Lookup Service (OLS) or with the ontologyIndex package.

  • "label": The cell type name (e.g., "T-helper 1 cell").

  • "sample": The sample or cell identifier, matching column names in the reference matrix.

  • "dataset": The dataset source for each sample. If not applicable, use a constant value for all samples.

This parameter is unnecessary if ref is a SummarizedExperiment or SingleCellExperiment object, because the same metadata is then expected in colData.
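The four columns described above can be assembled and checked with base R before training. This is a small sketch: the matrix, the sample names, the "B cell" label, and the "toy" dataset name are invented for illustration, and the checks at the end are a suggested sanity test rather than something the package is known to require in this exact form.

```r
# Toy reference matrix: 3 genes x 4 samples (values are invented).
ref_mat <- matrix(
  c(5, 0, 2,
    4, 1, 3,
    0, 7, 1,
    1, 6, 0),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("s1", "s2", "s3", "s4"))
)

# One row per column of ref_mat, with the four required columns.
toy_labels <- data.frame(
  ont = c("CL:0000545", "CL:0000545", NA, NA),  # NA where no ontology ID is known
  label = c("T-helper 1 cell", "T-helper 1 cell", "B cell", "B cell"),
  sample = colnames(ref_mat),                    # must match the matrix column names
  dataset = "toy",                               # constant when there is a single source
  stringsAsFactors = FALSE
)

# Sanity checks before handing the data frame to xCell2Train.
required_cols <- c("ont", "label", "sample", "dataset")
stopifnot(all(required_cols %in% colnames(toy_labels)))
stopifnot(setequal(toy_labels$sample, colnames(ref_mat)))
stopifnot(!anyDuplicated(toy_labels$sample))
```

Running checks like these first tends to surface mismatched or duplicated sample identifiers earlier than a failed training run would.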

refType

The type of reference data: "rnaseq" for RNA-Seq, "array" for microarray, or "sc" for scRNA-Seq.

lineageFile

Path to a manually curated cell type lineage file generated with xCell2GetLineage (optional).

BPPARAM

A BiocParallelParam instance that determines the parallelization strategy. Default is BiocParallel::SerialParam().
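For illustration, a parallel back-end can be created with BiocParallel and passed through BPPARAM. The sketch below assumes two workers are available on the machine. SnowParam is used because it works across platforms, and other BiocParallelParam classes should be usable in the same way. The names my_ref and my_labels in the commented call are hypothetical placeholders, not objects shipped with the package.

```r
bp <- NULL
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # Two worker processes; adjust to the number of cores you can spare.
  bp <- BiocParallel::SnowParam(workers = 2)
  print(BiocParallel::bpnworkers(bp))

  # Hypothetical call (my_ref and my_labels are placeholders):
  # my_xcell2_ref <- xCell2::xCell2Train(
  #   ref = my_ref, labels = my_labels, refType = "rnaseq", BPPARAM = bp
  # )
}
```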

useOntology

A Boolean indicating whether to use ontological integration for cell type dependencies (default: TRUE). Lineage relationships are determined using the Cell Ontology (CL). Users can refine these dependencies with xCell2GetLineage and provide them via the lineageFile parameter.

returnSignatures

A Boolean indicating whether to return only the cell type signatures (default: FALSE).

returnAnalysis

A Boolean indicating whether to return xCell2Analysis results instead of a reference object (default: FALSE). Requires mix.

useSpillover

A Boolean indicating whether to apply spillover correction during analysis; used only when returnAnalysis is TRUE (default: TRUE). Spillover correction enhances the specificity of enrichment scores by accounting for overlaps between cell types.

spilloverAlpha

Numeric value controlling spillover correction strength (default: 0.5). Lower values apply weaker correction, while higher values apply stronger correction.

minPbCells

Minimum number of cells in a pseudo-bulk sample for scRNA-Seq references (default: 30).

minPbSamples

Minimum number of pseudo-bulk samples for scRNA-Seq references (default: 10).

minScGenes

Minimum number of genes for pseudo-bulk samples for scRNA-Seq references (default: 10000).
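To make the term "pseudo-bulk sample" concrete, the sketch below sums the counts of several cells of the same type into one column. It is a generic illustration in base R, not the procedure xCell2 uses internally. The helper make_pseudobulk and its cells_per_sample argument are hypothetical, and cells_per_sample is only loosely analogous to minPbCells.

```r
set.seed(1)

# Toy single-cell count matrix: 5 genes x 12 cells, 6 cells per type.
counts <- matrix(
  rpois(5 * 12, lambda = 3),
  nrow = 5,
  dimnames = list(paste0("Gene", 1:5), paste0("cell", 1:12))
)
cell_type <- rep(c("B cell", "T cell"), each = 6)

# Hypothetical helper: sum cells of the same type in groups of fixed size.
make_pseudobulk <- function(counts, cell_type, cells_per_sample) {
  out <- list()
  for (ct in unique(cell_type)) {
    idx <- which(cell_type == ct)
    # Split this type's cells into consecutive groups.
    groups <- split(idx, ceiling(seq_along(idx) / cells_per_sample))
    for (g in seq_along(groups)) {
      # Drop groups that did not reach the requested size.
      if (length(groups[[g]]) < cells_per_sample) next
      out[[paste(ct, g, sep = "_pb")]] <-
        rowSums(counts[, groups[[g]], drop = FALSE])
    }
  }
  do.call(cbind, out)
}

pb <- make_pseudobulk(counts, cell_type, cells_per_sample = 3)
print(dim(pb))
```

With thresholds such as minPbCells and minPbSamples, cell types that have too few cells to form enough pseudo-bulk samples may not be representable in the reference, so it is worth tabulating cells per type before training.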

Details

Ontological Integration: Ontological integration (useOntology) leverages hierarchical cell type relationships to ensure biologically meaningful signatures. Dependencies can be refined using xCell2GetLineage, which generates lineage files for manual review.

Spillover Correction: Spillover correction enhances the specificity of enrichment scores by reducing overlaps between related cell types. Use the spilloverAlpha parameter to tune the strength of correction.

Contribute Your xCell2 Reference Object: Users are encouraged to share their reference objects via the xCell2 Reference Repository.

Value

An xCell2Object containing:

  • signatures: Cell type-specific gene signatures.

  • dependencies: Lineage-based dependencies.

  • params: Linear transformation parameters.

  • spill_mat: A spillover correction matrix.

  • genes_used: Genes used for training.

Author(s)

Almog Angel and Dvir Aran

See Also

xCell2Analysis, for enrichment analysis. xCell2GetLineage, for refining cell type dependencies.

Examples

library(xCell2)

# Load the demo reference and extract its log-transformed expression matrix
data(dice_demo_ref, package = "xCell2")
dice_ref <- SummarizedExperiment::assay(dice_demo_ref, "logcounts")
colnames(dice_ref) <- make.unique(colnames(dice_ref))

# Build the labels data frame from the sample metadata;
# ont, sample, and dataset are filled in here
dice_labels <- as.data.frame(SummarizedExperiment::colData(dice_demo_ref))
dice_labels$ont <- NA
dice_labels$sample <- colnames(dice_ref)
dice_labels$dataset <- "DICE"

# Train the custom reference object
DICE.xCell2Ref <- xCell2::xCell2Train(ref = dice_ref, labels = dice_labels, refType = "rnaseq")


AlmogAngel/xCell2 documentation built on Jan. 3, 2025, 2:03 a.m.