run_cluster_pipeline: Pipeline for clustering analysis

View source: R/pipeline.R

Description

This function wraps the most common Seurat analysis pipeline for cell type identification. Its main steps are:
1. Data processing, e.g. normalisation.
2. Running PCA.
3. Running UMAP and clustering.
Analysis outputs are stored in corresponding directories under out_dir. Note that this pipeline requires a single Seurat object (i.e. a single sample).

Usage

run_cluster_pipeline(
  seu_obj,
  out_dir,
  npcs = c(50),
  ndims = c(30),
  res = seq(0.1, 0.3, by = 0.1),
  modules_group = NULL,
  metadata_to_plot = c("sample", "condition"),
  qc_to_plot = NULL,
  logfc.threshold = 0.5,
  min.pct = 0.25,
  only.pos = TRUE,
  topn_genes = 10,
  diff_cluster_pct = 0.1,
  pval_adj = 0.05,
  pcs_to_remove = NULL,
  plot_cluster_markers = TRUE,
  max.cutoff = "q98",
  min.cutoff = NA,
  n_hvgs = 3000,
  seed = 1,
  label = TRUE,
  label.size = 8,
  pt.size = 1.4,
  fig.res = 200,
  cont_col_pal = NULL,
  discrete_col_pal = NULL,
  cont_alpha = c(0.1, 0.9),
  discrete_alpha = 0.9,
  pt.size.factor = 1.1,
  spatial_col_pal = "inferno",
  crop = FALSE,
  plot_spatial_markers = FALSE,
  ...
)

Arguments

seu_obj

Seurat object (required).

out_dir

Output directory for storing analysis results.

npcs

Number of principal components to compute; can be a vector, e.g. c(50, 70).

ndims

Number of top PCA dimensions used for UMAP and clustering; can be a vector, e.g. c(50, 70).

res

Vector with clustering resolutions (e.g. seq(0.1, 0.6, by = 0.1)).
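As a rough illustration of how the vector-valued npcs, ndims and res arguments might combine, the sketch below enumerates every setting. It assumes that each npcs/ndims pair is analysed in turn and that clustering is run at every resolution; the loop order is an assumption, not taken from the package source, and the ndims <= npcs check is a hypothetical sanity filter (UMAP and clustering cannot use more PCs than were computed).

```r
# Hypothetical enumeration of the settings explored for vector-valued
# npcs, ndims and res (assumes a full cross of all values).
npcs  <- c(50, 70)
ndims <- c(30, 60)
res   <- seq(0.1, 0.3, by = 0.1)

settings <- expand.grid(res = res, ndims = ndims, npcs = npcs)

# Drop combinations that would request more dimensions than computed PCs
# (here ndims = 60 with npcs = 50).
settings <- settings[settings$ndims <= settings$npcs, ]
nrow(settings)
```

Under these assumptions, 3 x 2 x 2 = 12 combinations are generated and the 3 with ndims = 60, npcs = 50 are dropped, leaving 9.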

modules_group

Groups of modules (a named list of lists) storing features (e.g. genes) for which a module score is computed in each identified cluster. This step can help annotate the clusters, since dot and feature plots are saved for each group.
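A minimal sketch of the shape modules_group is expected to take, assuming each module is a character vector of gene symbols. The group names, module names and genes below are illustrative placeholders, not recommended marker sets.

```r
# Hypothetical modules_group: a named list (one entry per group) of
# named lists (one entry per module), each holding a vector of genes.
modules_group <- list(
  immune = list(
    T_cells = c("CD3D", "CD3E", "CD2"),
    B_cells = c("CD79A", "MS4A1")
  ),
  stromal = list(
    Fibroblasts = c("COL1A1", "COL1A2", "DCN")
  )
)

# Basic structural checks before passing it to run_cluster_pipeline():
# named groups, named modules within each group, character gene vectors.
stopifnot(
  !is.null(names(modules_group)),
  all(vapply(modules_group, function(g) !is.null(names(g)), logical(1))),
  all(vapply(unlist(modules_group, recursive = FALSE), is.character, logical(1)))
)
```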

metadata_to_plot

Vector of metadata column names to plot; these must be present in the meta.data slot of the Seurat object.

qc_to_plot

Vector of QC metric names to plot; these must be present in the meta.data slot of the Seurat object.

logfc.threshold

Limit testing to genes that show, on average, at least an X-fold difference (log-scale) between the two groups of cells, where X is logfc.threshold.
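Because the threshold is on a log scale, it can help to convert it back to a plain fold change. Assuming the log2 scale that Seurat v4 and later use by default for avg_log2FC (Seurat v3 reported the natural log), the default of 0.5 corresponds to roughly a 1.41-fold difference:

```r
logfc.threshold <- 0.5

# Seurat >= 4 default: log2 fold change.
fold_log2 <- 2^logfc.threshold
# Seurat 3: natural-log fold change.
fold_ln <- exp(logfc.threshold)

round(c(log2 = fold_log2, ln = fold_ln), 2)
```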

min.pct

Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.

only.pos

Only return positive markers (TRUE by default).

topn_genes

Number of top marker genes per cluster to use for plotting (heatmaps and feature plots); default is 10.

diff_cluster_pct

Retain marker genes per cluster only if pct.1 - pct.2 > diff_cluster_pct, i.e. they show cluster-specific expression. Set to -Inf to disable this additional filtering.

pval_adj

Adjusted p-value threshold to consider marker genes per cluster.
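The sketch below shows one way the diff_cluster_pct, pval_adj and topn_genes settings could be applied to a marker table. It assumes a data frame with the columns returned by Seurat's FindAllMarkers (gene, cluster, avg_log2FC, pct.1, pct.2, p_val_adj); the values are made up, and ranking the top genes by avg_log2FC is an assumption, so the package's actual filtering may differ.

```r
# Toy marker table with FindAllMarkers-style columns (illustrative values).
markers <- data.frame(
  gene       = c("G1", "G2", "G3", "G4", "G5", "G6"),
  cluster    = factor(c(0, 0, 0, 1, 1, 1)),
  avg_log2FC = c(2.1, 1.5, 0.9, 3.0, 1.2, 0.8),
  pct.1      = c(0.90, 0.60, 0.80, 0.95, 0.70, 0.50),
  pct.2      = c(0.20, 0.55, 0.30, 0.10, 0.40, 0.45),
  p_val_adj  = c(1e-10, 1e-4, 0.20, 1e-20, 0.01, 0.03),
  stringsAsFactors = FALSE
)

diff_cluster_pct <- 0.1
pval_adj         <- 0.05
topn_genes       <- 2

# Keep markers that are significant and cluster-specific.
keep <- markers$p_val_adj < pval_adj &
        (markers$pct.1 - markers$pct.2) > diff_cluster_pct
filtered <- markers[keep, ]

# Take the top `topn_genes` markers per cluster, ranked by avg_log2FC.
top_markers <- do.call(rbind, lapply(split(filtered, filtered$cluster), function(df) {
  head(df[order(df$avg_log2FC, decreasing = TRUE), ], topn_genes)
}))
top_markers$gene
```

Here G2 and G6 fail the pct.1 - pct.2 filter and G3 fails the p-value filter. Setting diff_cluster_pct to -Inf makes the pct condition always TRUE, so only the adjusted p-value filter would apply.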

pcs_to_remove

PCs to remove before clustering, e.g. because they correlate with technical or batch effects. If NULL, all PCs are used.
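Assuming that the removed PCs are simply dropped from the first ndims components before UMAP and clustering (the package might instead keep ndims components after removal), the dimensions actually used could be computed as in this sketch:

```r
ndims <- 30
pcs_to_remove <- c(1, 4)   # e.g. PCs correlated with a batch covariate

# NULL means no PCs are removed; otherwise drop the listed PC indices.
dims_used <- if (is.null(pcs_to_remove)) {
  seq_len(ndims)
} else {
  setdiff(seq_len(ndims), pcs_to_remove)
}
length(dims_used)
```

Such an index vector would then be the natural value for the dims argument of Seurat functions such as FindNeighbors and RunUMAP.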

plot_cluster_markers

Logical, whether to create feature plots with the 'topn_genes' cluster markers. Provided mainly to reduce the number (and size) of files in the analysis folders. Default is TRUE.

max.cutoff

Maximum cutoff values for plotting each continuous feature, e.g. gene expression levels. May specify a quantile in the form of 'q##', where '##' is the quantile (e.g. 'q1', 'q10').

min.cutoff

Minimum cutoff values for plotting each continuous feature, e.g. gene expression levels. May specify a quantile in the form of 'q##', where '##' is the quantile (e.g. 'q1', 'q10').
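The 'q##' notation follows the convention of Seurat's FeaturePlot, where 'q98' means the 98th percentile of a feature's values, and an NA cutoff means the data's own minimum or maximum is used. A minimal sketch of how a cutoff could be resolved and applied, using a hypothetical helper resolve_cutoff() that is not part of the package:

```r
# Hypothetical helper: turn a cutoff spec ("q##", a number, or NA) into a value.
resolve_cutoff <- function(cutoff, x, default) {
  if (is.na(cutoff)) return(default)
  if (is.character(cutoff) && grepl("^q[0-9.]+$", cutoff)) {
    prob <- as.numeric(sub("^q", "", cutoff)) / 100
    return(unname(quantile(x, probs = prob)))
  }
  as.numeric(cutoff)
}

x <- c(0:99, 1000)  # toy expression values with one outlier

max_val <- resolve_cutoff("q98", x, default = max(x))  # 98th percentile
min_val <- resolve_cutoff(NA, x, default = min(x))     # no lower cutoff

# Clip values so the outlier no longer dominates the colour scale.
clipped <- pmin(pmax(x, min_val), max_val)
range(clipped)
```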

n_hvgs

Number of highly variable genes (HVGs) to compute, which will be used as input to PCA.

seed

Set a random seed, for reproducibility.

label

Whether to label the clusters in 'plot_reduction' space.

label.size

Size of the cluster labels.

pt.size

Adjust point size for plotting.

fig.res

Figure resolution in ppi (see 'png' function).

cont_col_pal

Continuous colour palette to use; if NULL (default), "RdYlBu" is used.

discrete_col_pal

Discrete colour palette to use; if NULL (default), the hue palette (hue_pal) from the 'scales' package is used.

cont_alpha

(Spatial) Controls the opacity of spots when plotting continuous features. Provide a vector specifying the min and max of the range of values (between 0 and 1).

discrete_alpha

(Spatial) Controls the opacity of spots when plotting discrete features. Provide a single alpha value.

pt.size.factor

(Spatial) Scale the size of the spots.

spatial_col_pal

(Spatial) Continuous colour palette from the viridis package, used to colour spots on the tissue; default "inferno".

crop

(Spatial) Crop the plot to focus on the plotted spots that passed QC. Set to FALSE to show the entire background image.

plot_spatial_markers

(Spatial) Logical, whether to create spatial feature plots with expression of individual genes.

...

Additional named parameters passed to Seurat functions.

Value

An updated Seurat object. If multiple npcs and ndims values are given, only the object from the last setting is returned. All analysis results are also saved to disk.

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk


andreaskapou/SeuratPipe documentation built on Nov. 22, 2022, 4:16 p.m.