run_harmony_pipeline: Pipeline for Harmony integration
In andreaskapou/SeuratPipe: Packaging common Seurat analysis tasks

run_harmony_pipeline

R Documentation

Description

This function implements all the analysis steps for performing data integration using Harmony. These include: (1) data processing, e.g. normalisation and PCA; (2) running Harmony; (3) performing UMAP and clustering after data integration. Analysis outputs are stored in corresponding directories.
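The three steps can be outlined with standard Seurat and harmony calls. This is an illustrative sketch only, not SeuratPipe's actual implementation: the helper name `harmony_steps_sketch` is hypothetical, the use of log-normalisation is an assumption, and the mapping of `batch_id` to `group.by.vars` follows the argument descriptions on this page.

```r
# Hypothetical helper (not part of SeuratPipe); assumes the Seurat and
# harmony packages are installed.
harmony_steps_sketch <- function(seu, batch_id = "sample", npcs = 50,
                                 ndims = 30, res = 0.1, n_hvgs = 3000,
                                 seed = 1) {
  # 1. Data processing: normalisation, HVG selection, scaling and PCA
  seu <- Seurat::NormalizeData(seu)
  seu <- Seurat::FindVariableFeatures(seu, nfeatures = n_hvgs)
  seu <- Seurat::ScaleData(seu)
  seu <- Seurat::RunPCA(seu, npcs = npcs, seed.use = seed)

  # 2. Harmony integration on the PCA embedding; the corrected embedding
  #    is stored as the "harmony" reduction by default
  seu <- harmony::RunHarmony(seu, group.by.vars = batch_id)

  # 3. UMAP and clustering on the top Harmony dimensions
  dims <- seq_len(ndims)
  seu <- Seurat::RunUMAP(seu, reduction = "harmony", dims = dims,
                         seed.use = seed)
  seu <- Seurat::FindNeighbors(seu, reduction = "harmony", dims = dims)
  seu <- Seurat::FindClusters(seu, resolution = res, random.seed = seed)
  seu
}
```

The real pipeline additionally writes plots and marker tables to `out_dir`, which this outline omits.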

Usage

run_harmony_pipeline(
  seu_obj,
  out_dir,
  batch_id = "sample",
  npcs = c(50),
  ndims = c(30),
  res = seq(0.1, 0.3, by = 0.1),
  modules_group = NULL,
  metadata_to_plot = c("sample", "condition"),
  qc_to_plot = NULL,
  logfc.threshold = 0.5,
  min.pct = 0.25,
  only.pos = TRUE,
  topn_genes = 10,
  diff_cluster_pct = 0.1,
  pval_adj = 0.05,
  pcs_to_remove = NULL,
  obj_filename = "seu_harmony",
  force_reanalysis = TRUE,
  plot_cluster_markers = TRUE,
  max.cutoff = "q98",
  min.cutoff = NA,
  n_hvgs = 3000,
  max.iter.harmony = 50,
  seed = 1,
  label = TRUE,
  label.size = 8,
  pt.size = 1.4,
  fig.res = 200,
  cont_col_pal = NULL,
  discrete_col_pal = NULL,
  cont_alpha = c(0.1, 0.9),
  discrete_alpha = 0.9,
  pt.size.factor = 1.1,
  spatial_col_pal = "inferno",
  crop = FALSE,
  plot_spatial_markers = FALSE,
  ...
)
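A typical call might look like the sketch below. The object `seu_list`, the output path and the gene symbols are placeholders, and the two-level layout of `modules_group` (groups containing named modules) is one reading of "named list of lists"; adjust it to your data.

```r
# Hypothetical settings; metadata column names and gene symbols are placeholders.
harmony_args <- list(
  out_dir = file.path(tempdir(), "harmony"),
  batch_id = "sample",
  npcs = c(50),
  ndims = c(30),
  res = seq(0.1, 0.3, by = 0.1),
  modules_group = list(
    lineage = list(t_cell = c("CD3D", "CD3E"), b_cell = c("MS4A1", "CD79A"))
  ),
  metadata_to_plot = c("sample", "condition"),
  force_reanalysis = FALSE
)

# `seu_list` is assumed to be a list of Seurat objects created beforehand,
# so the call is skipped when it is not defined in the session.
if (exists("seu_list") && requireNamespace("SeuratPipe", quietly = TRUE)) {
  seu <- do.call(
    SeuratPipe::run_harmony_pipeline,
    c(list(seu_obj = seu_list), harmony_args)
  )
}
```

Keeping the settings in a list makes it easy to store them alongside the results for later reference.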

Arguments

`seu_obj`	Seurat object or list of Seurat objects (required).
`out_dir`	Output directory for storing analysis results.
`batch_id`	Name of the batch variable to remove with data integration (required). Can also be a vector if multiple batch variables are present. Each name should be a column in the Seurat 'meta.data'. Default is "sample". This parameter is called 'group.by.vars' in Harmony.
`npcs`	Number of principal components, can be a vector e.g. c(50, 70).
`ndims`	Number of top Harmony dimensions to use for UMAP and clustering, can be a vector e.g. c(50, 70).
`res`	Vector with clustering resolutions (e.g. seq(0.1, 0.6, by = 0.1)).
`modules_group`	Group of modules (named list of lists) storing features (e.g. genes) to compute module score for each identified cluster. This step can be useful for annotating the different clusters by saving dot/feature plots for each group.
`metadata_to_plot`	Vector with metadata names to plot. They should be present in the meta.data slot of the Seurat object.
`qc_to_plot`	Vector with QC names to plot. They should be present in the meta.data slot of the Seurat object.
`logfc.threshold`	Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.
`min.pct`	Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.
`only.pos`	Only return positive markers (TRUE by default).
`topn_genes`	Top cluster marker genes to use for plotting (in heatmap and feature plots), default is 10.
`diff_cluster_pct`	Retain marker genes per cluster if their `pct.1 - pct.2 > diff_cluster_pct`, i.e. they show cluster specific expression. Set to -Inf, to ignore this additional filtering.
`pval_adj`	Adjusted p-value threshold to consider marker genes per cluster.
`pcs_to_remove`	Which PCs should be removed prior to running Harmony. Possibly due to being correlated with technical/batch effects. If NULL, all PCs are used.
`obj_filename`	Filename of the stored Seurat object, default 'seu_harmony'. Number of PCs will be added to the filename automatically.
`force_reanalysis`	Logical, if intermediate object 'seu_harmony_<>.rds' exists and force_reanalysis = FALSE, read object instead of re-running Harmony integration. Useful for saving computing time.
`plot_cluster_markers`	Logical, whether to create feature plots with 'topn_genes' cluster markers. Added mostly to reduce number of files (and size) in analysis folders. Default is TRUE.
`max.cutoff`	Maximum cutoff values for plotting each continuous feature, e.g. gene expression levels. May specify quantile in the form of 'q##' where '##' is the quantile (eg, 'q1', 'q10').
`min.cutoff`	Minimum cutoff values for plotting each continuous feature, e.g. gene expression levels. May specify quantile in the form of 'q##' where '##' is the quantile (eg, 'q1', 'q10').
`n_hvgs`	Number of highly variable genes (HVGs) to compute, which will be used as input to PCA.
`max.iter.harmony`	Maximum number of iterations for Harmony integration.
`seed`	Set a random seed, for reproducibility.
`label`	Whether to label the clusters in 'plot_reduction' space.
`label.size`	Sets size of labels.
`pt.size`	Adjust point size for plotting.
`fig.res`	Figure resolution in ppi (see 'png' function).
`cont_col_pal`	Continuous colour palette to use, default "RdYlBu".
`discrete_col_pal`	Discrete colour palette to use, default is Hue palette (hue_pal) from 'scales' package.
`cont_alpha`	(Spatial) Controls opacity of spots. Provide as a vector specifying the min and max range of values (between 0 and 1).
`discrete_alpha`	(Spatial) Controls opacity of spots. Provide a single alpha value.
`pt.size.factor`	(Spatial) Scale the size of the spots.
`spatial_col_pal`	(Spatial) Continuous colour palette to use from viridis package to colour spots on tissue, default "inferno".
`crop`	(Spatial) Crop the plot to focus on the spots that passed QC. Set to FALSE to show entire background image.
`plot_spatial_markers`	(Spatial) Logical, whether to create spatial feature plots with expression of individual genes.
`...`	Additional named parameters passed to Seurat's or Harmony functions.
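The marker-related arguments `diff_cluster_pct`, `pval_adj` and `topn_genes` can be illustrated on a toy table. The sketch below uses the column names returned by Seurat's `FindAllMarkers()`; the data are made up, and both the strict inequalities and the ranking by `avg_log2FC` are assumptions rather than confirmed package behaviour.

```r
# Toy marker table with FindAllMarkers()-style columns (values are made up).
markers <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  cluster = c("0", "0", "0", "1", "1"),
  avg_log2FC = c(2.0, 1.5, 0.8, 1.2, 0.9),
  pct.1 = c(0.90, 0.60, 0.50, 0.80, 0.40),
  pct.2 = c(0.10, 0.55, 0.20, 0.30, 0.10),
  p_val_adj = c(1e-10, 1e-4, 0.20, 1e-6, 0.01)
)
diff_cluster_pct <- 0.1
pval_adj <- 0.05
topn_genes <- 1

# Keep genes with cluster-specific expression and a significant adjusted p-value.
keep <- (markers$pct.1 - markers$pct.2) > diff_cluster_pct &
  markers$p_val_adj < pval_adj
filtered <- markers[keep, ]

# Rank within each cluster (assumed: by average log2 fold change) and
# take the top `topn_genes` genes per cluster for plotting.
filtered <- filtered[order(filtered$cluster, -filtered$avg_log2FC), ]
top <- do.call(rbind, lapply(split(filtered, filtered$cluster),
                             head, n = topn_genes))
```

Here `g2` is dropped because its `pct.1 - pct.2` difference is only 0.05, and `g3` is dropped because its adjusted p-value exceeds the threshold. Setting `diff_cluster_pct <- -Inf` disables the first condition.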

Value

An updated Seurat object. Note that if multiple npcs and ndims are given, only the last setting will be returned. All analysis results are also stored on disk.
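The interaction between multiple settings, `force_reanalysis` and the returned value can be pictured with a small stand-in loop. Everything here is hypothetical: `run_settings_sketch` and its `analyse` argument replace the real integration step, and the `_npcs<n>.rds` file naming is an assumption, since the page only states that the number of PCs is added to the filename.

```r
# Hypothetical stand-in for the pipeline's outer loop over `npcs`.
run_settings_sketch <- function(out_dir, npcs = c(50, 70),
                                obj_filename = "seu_harmony",
                                force_reanalysis = TRUE,
                                analyse = function(n) list(npcs = n)) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  obj <- NULL
  for (n in npcs) {
    # Assumed naming scheme for the stored intermediate object
    rds <- file.path(out_dir, paste0(obj_filename, "_npcs", n, ".rds"))
    if (!force_reanalysis && file.exists(rds)) {
      obj <- readRDS(rds)   # reuse the stored object
    } else {
      obj <- analyse(n)     # re-run the analysis for this setting
      saveRDS(obj, rds)     # every setting is still written to disk
    }
  }
  obj  # only the object from the last setting is returned
}

out <- run_settings_sketch(file.path(tempdir(), "harmony_sketch"))
out$npcs
```

With `npcs = c(50, 70)` the returned object corresponds to 70 PCs, while the object for 50 PCs remains available on disk. To work with an earlier setting, read its stored file with `readRDS()` or call the pipeline with that single setting.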