iterative_differential_clustering: iterative_differential_clustering

View source: R/IDclust.R

iterative_differential_clustering    R Documentation

iterative_differential_clustering

Description

Main function of the IDclust package. Given a SingleCellExperiment pre-processed with ChromSCape, it finds biologically relevant clusters by iteratively re-clustering and re-processing clusters. At each iteration, subclusters that have enough significantly enriched features compared to the other subclusters are defined as 'true' subclusters. The others are assigned to their parent cluster. The algorithm stops when no more 'true' subclusters are found.

This method ensures that each cluster found in this unsupervised way has significant biological differences, based on the user-defined thresholds.

Usage

iterative_differential_clustering(object, ...)

## Default S3 method:
iterative_differential_clustering(
  object,
  output_dir = "./",
  plotting = TRUE,
  saving = TRUE,
  n_dims = 50,
  dim_red = "PCA",
  vizualization_dim_red = "UMAP",
  processing_function = processing_ChromSCape,
  min.pct = NULL,
  differential_function = differential_ChromSCape,
  logFC.th = log2(2),
  qval.th = 0.01,
  min_frac_cell_assigned = 0.1,
  limit = 5,
  starting.k = 100,
  starting.resolution = 0.1,
  resolution = 0.1,
  max_k = 50,
  k_percent = 0.1,
  FP_linear_model = NULL,
  color = NULL,
  d = 10,
  swapExperiment = NULL,
  force_initial_clustering = TRUE,
  verbose = TRUE,
  ...
)

## S3 method for class 'Seurat'
iterative_differential_clustering(
  object,
  output_dir = "./",
  plotting = TRUE,
  saving = TRUE,
  n_dims = 50,
  dim_red = "pca",
  vizualization_dim_red = "umap",
  processing_function = processing_Seurat,
  differential_function = differential_edgeR_pseudobulk_LRT,
  logFC.th = log2(2),
  qval.th = 0.01,
  min_frac_cell_assigned = 0.1,
  limit = 5,
  starting.resolution = 0.1,
  starting.k = 100,
  resolution = 0.1,
  max_k = 50,
  k_percent = 0.1,
  color = NULL,
  force_initial_clustering = TRUE,
  verbose = TRUE,
  ...
)

Arguments

object

A SingleCellExperiment object pre-processed with ChromSCape (default method), or a Seurat object pre-processed with Seurat (Seurat method).

...

Additional parameters passed to the differential_function. See differential_edgeR_pseudobulk_LRT() for more information on additional parameters for the default function.

output_dir

The output directory in which to plot and save objects.

plotting

A logical specifying whether to save the plots or not.

saving

A logical specifying whether to save the data or not.

n_dims

An integer specifying the number of first dimensions to keep in the dimensionality reduction step.

dim_red

The name of the slot in Seurat::Reductions(object) in which to save the dimensionality reduction at each step.

vizualization_dim_red

The name of the slot used for plotting. Must be a valid slot present in Seurat::Reductions(object).

processing_function

A function that re-processes the subset of clusters at each step. It must take as input a Seurat object and the dim_red and n_dims parameters, and return a Seurat object containing a cell embedding. See processing_Seurat for the default function.

min.pct

A numeric between 0 and 1 specifying the minimum fraction of active cells in a cluster for a feature to be defined as a marker. Defaults to NULL, in which case the 70th percentile of global activation is used as the minimal percentage of activation for the differential analysis. Increasing this value decreases the number of differential features.
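
The NULL fallback described above can be illustrated with a minimal base R sketch. It assumes that "global activation" means, for each feature, the fraction of all cells in which that feature is active; the helper default_min_pct and the toy matrix are hypothetical and are not part of IDclust.

```r
# Toy binary activity matrix: 200 features (rows) x 50 cells (columns).
set.seed(1)
activity <- matrix(rbinom(200 * 50, size = 1, prob = 0.2),
                   nrow = 200, ncol = 50)

# Hypothetical helper (not an IDclust function): assumed fallback for
# min.pct = NULL, taken as the 70th percentile of per-feature activation.
default_min_pct <- function(activity) {
  # Fraction of cells in which each feature is active.
  frac_active <- rowMeans(activity > 0)
  unname(stats::quantile(frac_active, probs = 0.70))
}

min_pct <- default_min_pct(activity)
min_pct
```

Under this assumption, only features active in at least min_pct of the cells of a cluster would be eligible as markers, which is why a larger value yields fewer differential features.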

differential_function

A function that takes as input a SingleCellExperiment object and the parameters passed in ..., and returns a data.frame containing the significantly differential features for each cluster. See differential_edgeR_pseudobulk_LRT for the default function.

logFC.th

A numeric specifying the log2 fold change of activation above/below which a feature is considered significantly differential. Passed to the differential_function.

qval.th

A numeric specifying the adjusted p-value below which a feature is considered significantly differential. Passed to the differential_function.

min_frac_cell_assigned

A numeric between 0 and 1 specifying the minimum fraction of the total cells in the SingleCellExperiment object that needs to be assigned. If a lower proportion is assigned, all cells are assigned to the cluster of origin.
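
A minimal sketch of this rule in base R follows. The helper resolve_assignment, the "parent:child" label format and the choice of denominator (the cells of the cluster being re-clustered) are assumptions made for illustration only; the package may implement the check differently.

```r
# Hypothetical helper (not an IDclust function).
resolve_assignment <- function(sub_labels, true_subclusters, parent_label,
                               min_frac_cell_assigned = 0.1) {
  # Cells that fall in a subcluster accepted as 'true'.
  assigned <- sub_labels %in% true_subclusters
  # Too few cells assigned: everything goes back to the cluster of origin.
  if (mean(assigned) < min_frac_cell_assigned) {
    return(rep(parent_label, length(sub_labels)))
  }
  # Otherwise keep the accepted subclusters, the rest stay in the parent.
  ifelse(assigned, paste(parent_label, sub_labels, sep = ":"), parent_label)
}

# 100 cells, of which only 5 belong to the accepted subcluster "2".
labels <- c(rep("1", 95), rep("2", 5))
reverted <- resolve_assignment(labels, true_subclusters = "2",
                               parent_label = "A")
kept <- resolve_assignment(labels, true_subclusters = c("1", "2"),
                           parent_label = "A")
table(reverted)
table(kept)
```

In the first call only 5% of the cells are assigned, which is below the 0.1 threshold, so every cell keeps the parent label.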

limit

An integer specifying the minimum number of significantly enriched / depleted features required in order for a subcluster to be called a 'true' subcluster.
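
How logFC.th, qval.th and limit might combine can be sketched as follows. The table layout and column names (cluster, logFC, qval) are hypothetical, and the exact comparison operators used by the package are an assumption.

```r
# Hypothetical differential table: one row per (subcluster, feature) test.
diff_table <- data.frame(
  cluster = rep(c("A1", "A2", "A3"), each = 8),
  logFC   = c(rep(1.5, 6), 0.2, 0.1,         # A1: 6 enriched features
              rep(-1.2, 5), 0.3, 0.0, 0.1,   # A2: 5 depleted features
              1.4, 1.1, rep(0.1, 6)),        # A3: few strong features
  qval    = c(rep(0.001, 6), 0.5, 0.6,
              rep(0.005, 5), 0.4, 0.9, 0.7,
              0.002, 0.2, rep(0.8, 6))
)

logFC.th <- log2(2)
qval.th  <- 0.01
limit    <- 5

# A feature counts when it passes both thresholds (enriched or depleted).
significant <- abs(diff_table$logFC) > logFC.th & diff_table$qval < qval.th

# Number of significant features per subcluster, compared against limit.
n_signif <- tapply(significant, diff_table$cluster, sum)
is_true_subcluster <- n_signif >= limit
is_true_subcluster
```

In this toy table, A1 and A2 reach the limit of 5 features, while A3 has a single feature passing both thresholds and would be merged back into its parent cluster.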

starting.k

An integer specifying the number of nearest neighbors to use for the Louvain clustering of the first iteration. It is recommended to set it quite high in order to have few starting clusters.

starting.resolution

A numeric specifying the resolution to use for the Louvain clustering of the first iteration. It is recommended to set it quite low in order to have few starting clusters.

resolution

A numeric specifying the resolution to use for the Louvain clustering at each iteration.

max_k

An integer specifying the maximum number of nearest neighbors to use for the Louvain clustering at each iteration. This k is reduced with the number of cells, to a minimum of k = 5.

k_percent

A numeric between 0 and 1 representing the fraction of cells used to calculate the k for the KNN graph calculation of the clustering.
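
One plausible reading of how max_k and k_percent interact is sketched below. The helper choose_k, the rounding and the order of the bounds are assumptions; only the defaults (max_k = 50, k_percent = 0.1) and the floor of k = 5 come from this page.

```r
# Hypothetical helper (not an IDclust function): k scales with the number
# of cells, is capped at max_k and never drops below min_k.
choose_k <- function(n_cells, k_percent = 0.1, max_k = 50, min_k = 5) {
  k <- round(k_percent * n_cells)
  max(min_k, min(max_k, k))
}

choose_k(1000)  # large cluster: capped at max_k
choose_k(200)   # medium cluster: k_percent * n_cells
choose_k(20)    # small cluster: raised to the minimum k
```

With such a rule, small clusters are re-clustered with a correspondingly small neighborhood, so that the KNN graph does not connect nearly every cell to every other cell.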

FP_linear_model

Optional. A linear model (see stats::lm()) of the number of false positives expected for a given cluster size. The lm_list list of linear models provided in this package gives default values across multiple bin sizes (see calculate_FDR_scEpigenomics).

color

Set of colors to use for coloring the clusters. This must contain enough colors for each cluster (minimum 20 colors, but at least 100 colors is recommended, depending on the dataset).
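
A palette of the recommended size can be built with base R, for instance as below. This sketch assumes that color accepts a plain character vector of color codes, which this page does not state explicitly; the palette choice is arbitrary.

```r
# Build 100 colors from a qualitative HCL palette (requires R >= 3.6).
n_colors <- 100
cluster_colors <- grDevices::hcl.colors(n_colors, palette = "Dark 3")

head(cluster_colors)
length(cluster_colors)
```

Assuming a character vector is accepted, it could then be supplied as color = cluster_colors in the call to iterative_differential_clustering().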

swapExperiment

A character specifying an alternative experiment (see SingleCellExperiment::altExp()) to switch for differential analysis. The processing will be done in the main experiment while the differential analysis will be done in the alternative experiment.

force_initial_clustering

A logical specifying whether to force the initial number of clusters to be between 2 and 6. This avoids too high a number of initial clusters, which would be equivalent to a classical Louvain clustering.

verbose

A logical specifying whether to print progress messages.

Details

The default differential analysis used is the ChromSCape::differential_activation() function. This function compares the percentage of active cells in the cluster versus the rest of the cells and performs a Chi-squared test to calculate p-values.
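
The kind of comparison described above can be illustrated for a single feature with base R. This is a sketch using stats::chisq.test() on made-up counts, not the ChromSCape implementation, which may differ in how it builds the table or corrects the p-values.

```r
# Toy counts for one feature.
active_in  <- 40; n_in  <- 100   # cluster of interest
active_out <- 30; n_out <- 400   # all remaining cells

# 2 x 2 contingency table: group (cluster / rest) by state (active / inactive).
counts <- matrix(
  c(active_in,  n_in  - active_in,
    active_out, n_out - active_out),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("cluster", "rest"),
                  state = c("active", "inactive"))
)

# Percentage of active cells in each group.
pct_active <- 100 * counts[, "active"] / rowSums(counts)

# Chi-squared test of independence between group and activation state.
test <- stats::chisq.test(counts)
pct_active
test$p.value
```

Repeating such a test for every feature gives one p-value per feature and cluster, which would then need to be adjusted for multiple testing before being compared to qval.th.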

Value

The SingleCellExperiment object with the assignment of cells to clusters. If saving is TRUE, also saves the list of differential analyses, differential analysis summaries and embeddings for each re-clustered cluster. If runFDR is TRUE, also saves the list of FDR for each re-clustered cluster.

The Seurat object with the assignment of cells to clusters. If saving is TRUE, also saves the list of differential analyses, differential analysis summaries and embeddings for each re-clustered cluster.

Examples

# Clustering of Seurat scRNA object (Paired-Tag)
if(requireNamespace("Seurat", quietly=TRUE)){

data("Seu", package = "IDclust")
set.seed(47)
Seu = iterative_differential_clustering(Seu, saving = FALSE, plotting = FALSE,
                                        logFC.th = 0.2, qval.th = 0.1)

}

# Clustering of scExp scH3K27ac object (Paired-Tag)
if(requireNamespace("ChromSCape", quietly=TRUE)){

data("scExp", package = "IDclust")
set.seed(47)
scExp = iterative_differential_clustering(scExp, saving = FALSE, plotting = FALSE,
                                          logFC.th = 0.5, qval.th = 0.01)

}

vallotlab/IDclust documentation built on Feb. 16, 2023, 8:58 a.m.