View source: R/scIdent_functions.R

scIdent R Documentation

scIdent: Deconvolve Pseudobulk Samples from Single-Cell RNA-seq Data with MIXTURE

Description

scIdent: Deconvolve Pseudobulk Samples from Single-Cell RNA-seq Data with MIXTURE

Usage

scIdent(
  SeuObj,
  clusters_metadata = NULL,
  pseudobulk = 1,
  pct = 1,
  ms_treshold = 0.09,
  sgmtx = LM22,
  cores = 10L,
  SeuratAssay = "RNA"
)

Arguments

SeuObj

A Seurat object containing single-cell RNA-seq data.

clusters_metadata

Name of the column in 'SeuObj@meta.data' that holds the cluster assignment of each cell. Default is 'NULL', which uses the 'seurat_clusters' column of 'meta.data'.

pseudobulk

Number of pseudobulk samples to generate for each cluster. Default is 1.

pct

Proportion of cells to randomly select for each pseudobulk sample. Used when 'pseudobulk' is greater than 1, in which case it must be between 0.1 and 0.9. Default is 1.

ms_treshold

Minimum similarity threshold for marker gene selection. Default is 0.09.

sgmtx

A matrix where rows are genes and columns are cell types, used as the molecular signature for deconvolution. Default is 'LM22'.

cores

Number of CPU cores to use for computation. Default is 10. On Windows, it must be set to 1.

SeuratAssay

The assay to use from the Seurat object. Default is "RNA".
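
The arguments above can be combined as in the following sketch. It is illustrative only: 'seu' is a hypothetical Seurat object, the values chosen for 'pseudobulk' and 'pct' are arbitrary, and the call assumes that 'scIdent' is exported by an installed MIXTURE package. The call is guarded so that the snippet does nothing when either assumption does not hold.

```r
# Respect the Windows restriction on 'cores' noted above.
# The value 2L for other systems is an arbitrary choice for this sketch.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# 'seu' is a hypothetical Seurat object. The call runs only if it exists
# and the MIXTURE package is installed.
if (requireNamespace("MIXTURE", quietly = TRUE) && exists("seu")) {
  res <- MIXTURE::scIdent(
    SeuObj            = seu,
    clusters_metadata = "seurat_clusters",
    pseudobulk        = 5,      # several samples per cluster
    pct               = 0.8,    # within the documented 0.1 to 0.9 range
    ms_treshold       = 0.09,   # spelled as in the function signature
    cores             = n_cores,
    SeuratAssay       = "RNA"
  )
  # 'sgmtx' is left at its default ('LM22').
  head(res$clust_idents)
}
```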

Details

'scIdent' aggregates the raw counts of the cells in each cluster into pseudobulk samples and deconvolves them with MIXTURE, using any molecular signature ('LM22' by default). Users can specify the number of pseudobulk samples per cluster and the percentage of cells included in each one. Generating multiple pseudobulk samples makes it possible to assess the reliability of the deconvolution estimates by examining how the predictions vary across samples.
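
The aggregation step can be pictured with a minimal base R sketch. This is not the package's implementation: the count matrix is made up, 'make_pseudobulk' is a hypothetical helper, and rounding the number of sampled cells down is an assumption.

```r
set.seed(1)  # make the random cell sampling reproducible

# Toy raw-count matrix: 4 genes x 10 cells (all values are made up).
counts <- matrix(rpois(40, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))
clusters <- rep(c("0", "1"), each = 5)  # cluster label of each cell

# Hypothetical helper: build 'n' pseudobulk samples for one cluster by
# summing the raw counts of a random fraction 'pct' of its cells.
make_pseudobulk <- function(counts, cells, n = 1, pct = 1) {
  k <- max(1, floor(length(cells) * pct))  # cells drawn per sample
  vapply(seq_len(n), function(i) {
    picked <- if (pct < 1) cells[sample.int(length(cells), k)] else cells
    rowSums(counts[, picked, drop = FALSE])
  }, numeric(nrow(counts)))
}

# One genes x samples matrix per cluster.
pb <- lapply(split(seq_along(clusters), clusters), function(cells) {
  make_pseudobulk(counts, cells, n = 3, pct = 0.8)
})
```

Each element of 'pb' holds three pseudobulk profiles for one cluster. Deconvolving every column separately is what would yield a set of estimates whose spread can then be inspected.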

Value

A list containing the following data frames:

clust_idents

A data frame with one row per analyzed cluster. Includes a column "MS_cluster" (1 if the cluster is composed of cell types in the molecular signature, 0 otherwise), and columns "ident_1", "ident_2", "ident_3" for the top three absolute coefficients of identified cell types in the cluster.

clust_abs

Absolute coefficients estimated for each cluster. If 'pseudobulk' > 1, the median of the absolute coefficients calculated for the cluster is reported.

clust_props

Normalized coefficients, interpretable as proportions.

sigOverlap

Percentage of genes in the molecular signature with more than 0 counts in each pseudobulk sample.

statAbs

If 'pseudobulk' > 1, a data frame holding the median, IQR, first and third quartiles, minimum, and maximum of the absolute coefficients for cell types with a median higher than 'ms_treshold'.
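
To make the returned quantities more concrete, the sketch below computes analogous summaries in base R from made-up numbers. It is not the package's code: the coefficient values, cell type labels, and gene names are invented, and rescaling the medians to sum to 1 is only one plausible reading of "normalized".

```r
# Toy absolute coefficients for one cluster: 3 cell types x 4 pseudobulk
# samples (values are made up for illustration).
abs_coef <- matrix(
  c(0.50, 0.60, 0.40, 0.50,
    0.10, 0.20, 0.10, 0.20,
    0.00, 0.05, 0.00, 0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("B cells", "T cells", "NK cells"), paste0("rep", 1:4))
)
ms_treshold <- 0.09  # spelled as in the scIdent() signature

# Median across samples, analogous to 'clust_abs' when pseudobulk > 1.
med <- apply(abs_coef, 1, median)

# Rescale so the values sum to 1, analogous to 'clust_props' (assumption).
props <- med / sum(med)

# Per-cell-type summary, kept only where the median exceeds the threshold,
# analogous to 'statAbs'.
stats <- data.frame(
  median = med,
  IQR    = apply(abs_coef, 1, IQR),
  Q1     = apply(abs_coef, 1, quantile, probs = 0.25),
  Q3     = apply(abs_coef, 1, quantile, probs = 0.75),
  min    = apply(abs_coef, 1, min),
  max    = apply(abs_coef, 1, max)
)
stats <- stats[stats$median > ms_treshold, ]

# Share of signature genes with non-zero counts in one pseudobulk sample,
# analogous to 'sigOverlap' (gene names and counts are made up).
sig_genes <- c("CD19", "CD3E", "NKG7", "MS4A1")
pb_counts <- c(CD19 = 12, CD3E = 0, NKG7 = 3, GAPDH = 40)
overlap <- 100 * mean(sig_genes %in% names(pb_counts)[pb_counts > 0])
```

In this toy case only the first two cell types pass the threshold, so 'stats' has two rows, and half of the signature genes are detected in the pseudobulk sample.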


elmerfer/MIXTURE documentation built on Aug. 20, 2024, 8:03 p.m.