CellDEEP Quick Start
CellDEEP: Cell DiffErential Expression by Pooling

What CellDEEP does

CellDEEP reduces the sparsity of scRNA-seq data by pooling cells into pseudocells before differential expression (DE) testing.
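To make the idea concrete, here is a minimal base R sketch of how summing counts over small groups of cells lowers the fraction of zeros. It is a toy illustration, not CellDEEP's own code: the simulated matrix, the grouping in `pool_id`, and all variable names are assumptions made for this example.

```r
# Toy illustration only: base R, not CellDEEP's implementation.
set.seed(1)

# Simulate a sparse genes x cells count matrix (20 genes, 12 cells).
counts <- matrix(
  rpois(20 * 12, lambda = 0.3), nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:12))
)

# Hypothetical grouping: every 3 consecutive cells form one pseudocell.
pool_id <- rep(paste0("pseudocell", 1:4), each = 3)

# rowsum() sums rows by group, so transpose to cells x genes and back.
pooled <- t(rowsum(t(counts), group = pool_id))

# Fraction of zero entries before and after pooling.
sparsity_before <- mean(counts == 0)
sparsity_after  <- mean(pooled == 0)
c(before = sparsity_before, after = sparsity_after)
```

A pooled entry is zero only when every cell in the pool has a zero for that gene, so the pooled matrix can never be sparser than the original.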

Load package and example data

library(CellDEEP)
data("sim")

Step 1: Run DE directly with FindMarker.CellDEEP

FindMarker.CellDEEP includes metadata preparation internally. Key parameters to set:

- group_id, sample_id, cluster_id: metadata column names in your Seurat object
- ident.1, ident.2: the two groups to compare
- cell_selection: how to select cells for pooling ("kmean" or "random")
- readcounts: how to aggregate counts in pooled cells ("sum" or "mean")
- min_cells_per_subgroup: minimum number of cells required in each sample-cluster subgroup for pooling

de.test <- FindMarker.CellDEEP(
  sim,
  group_id = "Status",
  sample_id = "DonorID",
  cluster_id = "cluster_id",
  Pool = TRUE,
  test.use = "wilcox",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  cell_selection = "random",
  readcounts = "sum",
  logfc.threshold = 0.25,
  ident.1 = "Case",
  ident.2 = "Control"
)

Step 2: Pool cells only (optional)

Use these functions if you want pooled objects without running DE immediately.

min_cells_per_subgroup means the minimum number of cells required in each sample_id x cluster_id subgroup before pooling is performed.
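The subgroup threshold can be pictured with a small base R sketch that counts cells per sample_id x cluster_id combination and flags the subgroups that reach the minimum. The metadata values below are made up, and the sketch does not show how CellDEEP treats subgroups that fall below the threshold.

```r
# Toy metadata; column names follow the standardized fields, values are made up.
meta <- data.frame(
  sample_id  = c("D1", "D1", "D1", "D1", "D2", "D2", "D2"),
  cluster_id = c("A",  "A",  "A",  "B",  "A",  "A",  "B")
)

# Number of cells in every sample_id x cluster_id subgroup.
subgroup_size <- table(meta$sample_id, meta$cluster_id)

# Hypothetical threshold, mirroring the min_cells_per_subgroup argument.
min_cells_per_subgroup <- 2

# TRUE where a subgroup has enough cells to meet the threshold.
eligible <- subgroup_size >= min_cells_per_subgroup
eligible
```

Running a tabulation like this on your own metadata before pooling is a quick way to see how many subgroups a given threshold would affect.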

Pooling functions use standardized metadata fields (sample_id, group_id, cluster_id), so prepare once before pooling:

pool_input <- prepare_data(
  sim,
  sample_id = "DonorID",
  group_id = "Status",
  cluster_id = "cluster_id"
)
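As a rough mental model of what "standardized metadata fields" means, the sketch below renames user-specific columns of a plain data frame to sample_id, group_id, and cluster_id. This is an assumption about the concept only: it works on a toy data frame rather than a Seurat object, and `field_map` and `standardized` are hypothetical names, not part of CellDEEP.

```r
# Toy metadata with user-specific column names (values are made up).
meta <- data.frame(
  DonorID    = c("D1", "D1", "D2"),
  Status     = c("Case", "Case", "Control"),
  cluster_id = c("A", "B", "A")
)

# Standardized field name -> user column name (same pairs as in the call above).
field_map <- c(
  sample_id  = "DonorID",
  group_id   = "Status",
  cluster_id = "cluster_id"
)

# Select the user columns, then rename them to the standardized field names.
standardized <- meta[, field_map, drop = FALSE]
names(standardized) <- names(field_map)
standardized
```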

K-means pooling

pooled_kmean <- CellDEEP.Kmean(
  pool_input,
  readcounts = "sum",
  n_cells = 3,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_kmean
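For intuition about k-means-based pooling in general, the following base R sketch clusters the cells of one toy subgroup with `stats::kmeans()` and sums the counts within each cluster. It is not CellDEEP.Kmean's implementation. The choice of `k`, the log transform, and all variable names are assumptions for illustration.

```r
# Toy illustration of k-means pooling; not CellDEEP.Kmean's implementation.
set.seed(42)

# Toy genes x cells matrix for a single subgroup: 10 genes, 9 cells.
counts <- matrix(
  rpois(10 * 9, lambda = 2), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("cell", 1:9))
)

n_cells <- 3                          # target number of cells per pseudocell
k <- ceiling(ncol(counts) / n_cells)  # assumed number of pseudocells

# Cluster cells (rows after transposing) on log-transformed counts.
km <- kmeans(t(log1p(counts)), centers = k, nstart = 10)

# Sum the raw counts of the cells assigned to each cluster.
pooled <- t(rowsum(t(counts), group = km$cluster))
colnames(pooled) <- paste0("pseudocell", colnames(pooled))
dim(pooled)
```

Plain k-means does not guarantee equally sized clusters, so in this sketch the pseudocells can contain more or fewer than `n_cells` cells.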

Random pooling

pooled_random <- CellDEEP.Random(
  pool_input,
  readcounts = "sum",
  n_cells = 5,
  min_cells_per_subgroup = 1,
  assay_name = "RNA"
)
pooled_random
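Random pooling, and the difference between the "sum" and "mean" aggregation modes, can be sketched in base R as follows. This is an illustrative assumption about the general technique, not CellDEEP.Random's code. In particular, how the package handles leftover cells that do not fill a complete pool is not shown here.

```r
# Toy illustration of random pooling; not CellDEEP.Random's implementation.
set.seed(7)

# Toy genes x cells matrix for a single subgroup: 10 genes, 11 cells.
counts <- matrix(
  rpois(10 * 11, lambda = 2), nrow = 10,
  dimnames = list(paste0("gene", 1:10), paste0("cell", 1:11))
)

n_cells <- 5  # target number of cells per pseudocell

# Shuffle the cells, then cut the shuffled order into chunks of n_cells.
shuffled <- sample(ncol(counts))
pool_id <- integer(ncol(counts))
pool_id[shuffled] <- ceiling(seq_along(shuffled) / n_cells)

# "sum": add up the counts of all cells in a pool.
pooled_sum <- t(rowsum(t(counts), group = pool_id))

# "mean": divide each pooled column by the number of cells in that pool.
pool_size <- as.vector(table(pool_id))
pooled_mean <- sweep(pooled_sum, 2, pool_size, "/")
pool_size
```

With 11 cells and `n_cells = 5`, this sketch leaves a final pool with a single cell, which shows why a minimum subgroup size matters in practice.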

Note on DE results: if no genes pass the adjusted p-value filter in this small example dataset, try a larger dataset or set full_list = TRUE.