cluster_analysis: Common clustering analysis steps

View source: R/analysis.R

cluster_analysis    R Documentation

Common clustering analysis steps

Description

This function implements the analysis steps for performing clustering on a Seurat object. These include:

1. finding neighbours in a lower dimensional space (defined by the 'cluster_reduction' parameter);
2. obtaining clusters;
3. identifying marker genes (NOTE: to speed up re-analysis, it first checks whether a file with marker genes is already present; if so, it reads the file instead of calling FindAllMarkers);
4. generating plots, including a heatmap with (scaled) expression of marker genes in each cluster, marker gene expression on feature plots (e.g. in UMAP space, defined by the 'plot_reduction' parameter), and dot / feature plots with pre-computed module scores for each cluster (assumes the 'module_score_analysis' function has been run first). This last step can be useful for lineage annotation.
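As a rough orientation, the first three steps can be sketched with the underlying Seurat calls. This is only an illustrative sketch, not the package's actual implementation: the wrapper name sketch_cluster_steps is hypothetical, and the real function may order or parameterise these calls differently.

```r
# Hypothetical helper (not part of SeuratPipe): a minimal sketch of
# steps 1-3 using standard Seurat functions.
sketch_cluster_steps <- function(seu, dims = 1:20, res = 0.1,
                                 cluster_reduction = "pca", seed = 1) {
  # Step 1: build the nearest-neighbour graph in the chosen reduced space
  seu <- Seurat::FindNeighbors(seu, reduction = cluster_reduction, dims = dims)
  # Step 2: cluster cells at a single resolution
  seu <- Seurat::FindClusters(seu, resolution = res, random.seed = seed)
  # Step 3: identify marker genes for every cluster
  markers <- Seurat::FindAllMarkers(seu, only.pos = TRUE, min.pct = 0.25,
                                    logfc.threshold = 0.5)
  list(seu = seu, markers = markers)
}
```

The function is only defined here, not called; running it requires a Seurat object that already contains the requested reduction.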

Usage

cluster_analysis(
  seu,
  dims = 1:20,
  res = seq(0.1, 0.1, by = 0.1),
  logfc.threshold = 0.5,
  min.pct = 0.25,
  only.pos = TRUE,
  topn_genes = 10,
  diff_cluster_pct = 0.1,
  pval_adj = 0.05,
  plot_dir = NULL,
  plot_cluster_markers = TRUE,
  modules_group = NULL,
  cluster_reduction = "pca",
  plot_reduction = "umap",
  max.cutoff = "q98",
  min.cutoff = NA,
  seed = 1,
  force_reanalysis = TRUE,
  label = TRUE,
  label.size = 8,
  legend.position = "right",
  pt.size = 1.4,
  cont_col_pal = NULL,
  discrete_col_pal = NULL,
  fig.res = 200,
  heatmap_downsample_cols = NULL,
  cont_alpha = c(0.1, 0.9),
  discrete_alpha = 0.9,
  pt.size.factor = 1.1,
  spatial_col_pal = "inferno",
  crop = FALSE,
  plot_spatial_markers = FALSE,
  spatial_legend_position = "top",
  ...
)
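
A call might look like the sketch below. The wrapper name run_cluster_analysis and the argument out_dir are hypothetical, and the example assumes that 'seu' already contains 'harmony' and 'umap' reductions; only argument names listed in the usage above are passed.

```r
# Hypothetical wrapper for illustration; `seu` is assumed to be an
# integrated Seurat object with 'harmony' and 'umap' reductions.
run_cluster_analysis <- function(seu, out_dir) {
  SeuratPipe::cluster_analysis(
    seu = seu,
    dims = 1:30,                    # dimensions of 'harmony' to use
    res = seq(0.1, 0.6, by = 0.1),  # scan several clustering resolutions
    cluster_reduction = "harmony",
    plot_reduction = "umap",
    plot_dir = out_dir,             # plots are saved in this directory
    force_reanalysis = FALSE
  )
}
```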

Arguments

seu

Seurat object (required).

dims

Vector denoting the dimensions to use for nearest neighbors and clustering (from the 'cluster_reduction' parameter below).

res

Vector with clustering resolutions (e.g. seq(0.1, 0.6, by = 0.1)).
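
For illustration, the default value in the usage, seq(0.1, 0.1, by = 0.1), evaluates to a single resolution, whereas the example grid yields six. The variable names below are only for this sketch.

```r
# Default from the function signature: a single resolution of 0.1
res_default <- seq(0.1, 0.1, by = 0.1)
# Example grid from the argument description: 0.1, 0.2, ..., 0.6
res_grid <- seq(0.1, 0.6, by = 0.1)

print(res_default)
print(res_grid)
```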

logfc.threshold

Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.

min.pct

Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.

only.pos

Only return positive markers (TRUE by default).

topn_genes

Number of top marker genes per cluster to use for plots (heatmap and feature plots). Default is 10.

diff_cluster_pct

Retain marker genes per cluster if their pct.1 - pct.2 > diff_cluster_pct, i.e. they show cluster-specific expression. Set to -Inf to disable this additional filtering.

pval_adj

Adjusted p-value threshold to consider marker genes per cluster.
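
The marker filtering controlled by 'diff_cluster_pct', 'pval_adj' and 'topn_genes' can be sketched on a toy table in base R. The values below are made up; the column names follow the output of Seurat's FindAllMarkers, and the use of a strict '<' for the p-value threshold and of avg_log2FC for ranking are assumptions of this sketch, not confirmed details of the package.

```r
# Toy marker table mimicking FindAllMarkers output (all values are made up)
markers <- data.frame(
  gene       = c("G1", "G2", "G3", "G4", "G5", "G6"),
  cluster    = c("0", "0", "0", "1", "1", "1"),
  avg_log2FC = c(2.0, 1.5, 0.8, 1.2, 0.9, 3.0),
  pct.1      = c(0.90, 0.60, 0.55, 0.80, 0.40, 0.95),
  pct.2      = c(0.20, 0.55, 0.10, 0.30, 0.38, 0.10),
  p_val_adj  = c(1e-10, 1e-4, 0.20, 1e-6, 1e-3, 1e-12),
  stringsAsFactors = FALSE
)

diff_cluster_pct <- 0.1
pval_adj <- 0.05
topn_genes <- 1

# Keep genes with cluster-specific detection and significant adjusted p-value
keep <- (markers$pct.1 - markers$pct.2) > diff_cluster_pct &
  markers$p_val_adj < pval_adj
filtered <- markers[keep, ]

# Rank within each cluster by fold change and take the top genes
filtered <- filtered[order(filtered$cluster, -filtered$avg_log2FC), ]
top <- do.call(rbind, lapply(split(filtered, filtered$cluster),
                             head, n = topn_genes))
print(top$gene)
```

Here G2 and G5 are dropped because their detection rates barely differ between groups, and G3 is dropped on its adjusted p-value.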

plot_dir

Directory to save generated plots. If NULL, plots are not saved.

plot_cluster_markers

Logical, whether to create feature plots with 'topn_genes' cluster markers. Added mostly to reduce number of files (and size) in analysis folders. Default is TRUE.

modules_group

Group of modules (named list of lists) storing features (e.g. genes) to compute module score for each identified cluster. This step can be useful for annotating the different clusters by saving dot plots for each group. Assumes that we already have computed the modules e.g. by calling the 'module_score_analysis' function. If 'plot_dir' is NULL, no plots will be generated.

cluster_reduction

Dimensionality reduction to use for performing clustering. Default is 'pca'; set to 'harmony' if data integration was performed.

plot_reduction

Dimensionality reduction to use for plotting functions. Default is 'umap'.

max.cutoff

Vector of maximum cutoff values for each feature; may specify a quantile in the form 'q##', where '##' is the quantile (e.g. 'q1', 'q10').

min.cutoff

Vector of minimum cutoff values for each feature; may specify a quantile in the form 'q##', where '##' is the quantile (e.g. 'q1', 'q10').
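
To make the 'q##' notation concrete, the sketch below resolves a quantile cutoff on a toy expression vector and clips values above it. The helper resolve_cutoff is hypothetical and written in base R; it illustrates the idea of a quantile cutoff rather than reproducing Seurat's internal code.

```r
# Hypothetical helper: turn a cutoff such as "q98" into a numeric value
resolve_cutoff <- function(cutoff, values) {
  if (is.character(cutoff) && grepl("^q[0-9]{1,3}$", cutoff)) {
    prob <- as.numeric(sub("^q", "", cutoff)) / 100
    return(unname(quantile(values, probs = prob, na.rm = TRUE)))
  }
  as.numeric(cutoff)  # numeric cutoffs (or NA) pass through unchanged
}

expr <- c(0, 0, 1, 2, 3, 50)        # toy expression values with one outlier
max_val <- resolve_cutoff("q98", expr)
clipped <- pmin(expr, max_val)      # cap values at the 98th percentile
print(clipped)
```

Capping at a high quantile keeps a single extreme cell from compressing the colour scale for all other cells.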

seed

Set a random seed, for reproducibility.

force_reanalysis

Logical. If the cluster marker genes file exists and force_reanalysis = FALSE, read cluster markers from the file. Otherwise, run identification of cluster markers. Added for computing time efficiency purposes.
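
The caching behaviour outlined in the Description (read the marker file when it is present instead of recomputing) can be sketched in base R as follows. The names load_or_compute_markers, compute_fn and toy_compute are hypothetical, and the CSV file format is an assumption of this sketch.

```r
# Hypothetical helper: reuse a cached marker file unless re-analysis is forced
load_or_compute_markers <- function(marker_file, force_reanalysis, compute_fn) {
  if (!force_reanalysis && file.exists(marker_file)) {
    return(read.csv(marker_file, stringsAsFactors = FALSE))
  }
  markers <- compute_fn()  # stands in for the expensive marker identification
  write.csv(markers, marker_file, row.names = FALSE)
  markers
}

# Toy stand-in for marker identification that counts how often it is called
n_calls <- 0
toy_compute <- function() {
  n_calls <<- n_calls + 1
  data.frame(gene = c("G1", "G2"), cluster = c(0L, 1L))
}

marker_file <- tempfile(fileext = ".csv")
m1 <- load_or_compute_markers(marker_file, FALSE, toy_compute)  # no file yet: computes
m2 <- load_or_compute_markers(marker_file, FALSE, toy_compute)  # file exists: reads it
m3 <- load_or_compute_markers(marker_file, TRUE, toy_compute)   # forced: recomputes
print(n_calls)
```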

label

Whether to label the clusters in 'plot_reduction' space.

label.size

Sets size of labels.

legend.position

Position of legend, default "right" (set to "none" for clean plot).

pt.size

Adjust point size for plotting.

cont_col_pal

Continuous colour palette to use, default "RdYlBu".

discrete_col_pal

Discrete colour palette to use, default is Hue palette (hue_pal) from 'scales' package.

fig.res

Figure resolution in ppi (see 'png' function).

heatmap_downsample_cols

If numeric, it will downsample the columns of the heatmap plot, so a large specific cluster doesn't dominate the heatmap.

cont_alpha

(Spatial) Controls opacity of spots. Provide as a vector specifying the min and max range of values (between 0 and 1).

discrete_alpha

(Spatial) Controls opacity of spots. Provide a single alpha value.

pt.size.factor

(Spatial) Scale the size of the spots.

spatial_col_pal

(Spatial) Continuous colour palette to use from viridis package to colour spots on tissue, default "inferno".

crop

(Spatial) Crop the plot in to focus on spots that passed QC. Set to FALSE to show entire background image.

plot_spatial_markers

(Spatial) Logical, whether to create spatial feature plots with expression of individual genes.

spatial_legend_position

(Spatial) Position of legend for spatial plots, default "top" (set to "none" for clean plot).

...

Additional named parameters passed to Seurat analysis and plotting functions, such as FindClusters, FindAllMarkers, DimPlot and FeaturePlot.

Value

Updated Seurat object with clustered cells.

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk


andreaskapou/SeuratPipe documentation built on Nov. 22, 2022, 4:16 p.m.