clustify: Compare scRNA-seq data to reference data.
In rnabioco/clustifyr: Classifier for Single-cell RNA-seq Using Cell Clusters

clustify

R Documentation

Compare scRNA-seq data to reference data.

Description

Compare scRNA-seq data to reference data.

Usage

clustify(input, ...)

## Default S3 method:
clustify(
  input,
  ref_mat,
  metadata = NULL,
  cluster_col = NULL,
  query_genes = NULL,
  n_genes = 1000,
  per_cell = FALSE,
  n_perm = 0,
  compute_method = "spearman",
  pseudobulk_method = "mean",
  verbose = TRUE,
  lookuptable = NULL,
  rm0 = FALSE,
  obj_out = TRUE,
  seurat_out = obj_out,
  vec_out = FALSE,
  rename_prefix = NULL,
  threshold = "auto",
  low_threshold_cell = 0,
  exclude_genes = c(),
  if_log = TRUE,
  organism = "hsapiens",
  plot_name = NULL,
  rds_name = NULL,
  expand_unassigned = FALSE,
  ...
)

## S3 method for class 'Seurat'
clustify(
  input,
  ref_mat,
  cluster_col = NULL,
  query_genes = NULL,
  n_genes = 1000,
  per_cell = FALSE,
  n_perm = 0,
  compute_method = "spearman",
  pseudobulk_method = "mean",
  use_var_genes = TRUE,
  dr = "umap",
  obj_out = TRUE,
  seurat_out = obj_out,
  vec_out = FALSE,
  threshold = "auto",
  verbose = TRUE,
  rm0 = FALSE,
  rename_prefix = NULL,
  exclude_genes = c(),
  metadata = NULL,
  organism = "hsapiens",
  plot_name = NULL,
  rds_name = NULL,
  expand_unassigned = FALSE,
  ...
)

## S3 method for class 'SingleCellExperiment'
clustify(
  input,
  ref_mat,
  cluster_col = NULL,
  query_genes = NULL,
  per_cell = FALSE,
  n_perm = 0,
  compute_method = "spearman",
  pseudobulk_method = "mean",
  use_var_genes = TRUE,
  dr = "umap",
  obj_out = TRUE,
  seurat_out = obj_out,
  vec_out = FALSE,
  threshold = "auto",
  verbose = TRUE,
  rm0 = FALSE,
  rename_prefix = NULL,
  exclude_genes = c(),
  metadata = NULL,
  organism = "hsapiens",
  plot_name = NULL,
  rds_name = NULL,
  expand_unassigned = FALSE,
  ...
)

Arguments

`input`	single-cell expression matrix or Seurat object
`...`	additional arguments to pass to compute_method function
`ref_mat`	reference expression matrix
`metadata`	cell cluster assignments, supplied as a vector or data.frame. If data.frame is supplied then `cluster_col` needs to be set. Not required if running correlation per cell.
`cluster_col`	column in metadata that contains cluster ids per cell. Will default to first column of metadata if not supplied. Not required if running correlation per cell.
`query_genes`	A vector of genes of interest to compare. If NULL, then common genes between the expr_mat and ref_mat will be used for comparision.
`n_genes`	number of genes limit for Seurat variable genes, by default 1000, set to 0 to use all variable genes (generally not recommended)
`per_cell`	if true run per cell, otherwise per cluster.
`n_perm`	number of permutations, set to 0 by default
`compute_method`	method(s) for computing similarity scores
`pseudobulk_method`	method used for summarizing clusters, options are mean (default), median, truncate (10% truncated mean), or trimean, max, min
`verbose`	whether to report certain variables chosen and steps
`lookuptable`	if not supplied, will look in built-in table for object parsing
`rm0`	consider 0 as missing data, recommended for per_cell
`obj_out`	whether to output object instead of cor matrix
`seurat_out`	output cor matrix or called seurat object (deprecated, use obj_out instead)
`vec_out`	only output a result vector in the same order as metadata
`rename_prefix`	prefix to add to type and r column names
`threshold`	identity calling minimum correlation score threshold, only used when obj_out = TRUE
`low_threshold_cell`	option to remove clusters with too few cells
`exclude_genes`	a vector of gene names to throw out of query
`if_log`	input data is natural log, averaging will be done on unlogged data
`organism`	for GO term analysis, organism name: human - 'hsapiens', mouse - 'mmusculus'
`plot_name`	name for saved pdf, if NULL then no file is written (default)
`rds_name`	name for saved rds of rank_diff, if NULL then no file is written (default)
`expand_unassigned`	test all ref clusters for unassigned results
`use_var_genes`	if providing a seurat object, use the variable genes (stored in seurat_object@var.genes) as the query_genes.
`dr`	stored dimension reduction

Value

single cell object with identity assigned in metadata, or matrix of correlation values, clusters from input as row names, cell types from ref_mat as column names

Examples

# Annotate a matrix and metadata
clustify(
    input = pbmc_matrix_small,
    metadata = pbmc_meta,
    ref_mat = cbmc_ref,
    query_genes = pbmc_vargenes,
    cluster_col = "RNA_snn_res.0.5",
    verbose = TRUE
)

# Annotate using a different method
clustify(
    input = pbmc_matrix_small,
    metadata = pbmc_meta,
    ref_mat = cbmc_ref,
    query_genes = pbmc_vargenes,
    cluster_col = "RNA_snn_res.0.5",
    compute_method = "cosine"
)

# Annotate a SingleCellExperiment object
sce <- sce_pbmc()
clustify(
    sce,
    cbmc_ref,
    cluster_col = "clusters",
    obj_out = TRUE,
    per_cell = FALSE,
    dr = "umap"
)

# Annotate a Seurat object
so <- so_pbmc()
clustify(
    so,
    cbmc_ref,
    cluster_col = "seurat_clusters",
    obj_out = TRUE,
    per_cell = FALSE,
    dr = "umap"
)

# Annotate (and return) a Seurat object per-cell
clustify(
    input = so,
    ref_mat = cbmc_ref,
    cluster_col = "seurat_clusters",
    obj_out = TRUE,
    per_cell = TRUE,
    dr = "umap"
)

rnabioco/clustifyr documentation built on June 11, 2025, 5:38 a.m.