run_interaction_analysis: Run (differential) intercellular communication analysis

View source: R/interaction_analysis.R

run_interaction_analysis    R Documentation

Description

Perform (differential) cell type to cell type communication analysis from a Seurat object, using an internal database of ligand-receptor interactions (LRIs). It infers biologically relevant cell-cell interactions (CCIs) and how they change between two conditions of interest. Over-representation analysis is automatically performed to determine dominant differential signals at the level of the genes, cell types, GO Terms and KEGG Pathways.

Usage

run_interaction_analysis(
  seurat_object,
  LRI_species,
  seurat_celltype_id,
  seurat_condition_id,
  iterations = 1000,
  scdiffcom_object_name = "scDiffCom_object",
  seurat_assay = "RNA",
  seurat_slot = "data",
  log_scale = FALSE,
  score_type = "geometric_mean",
  threshold_min_cells = 5,
  threshold_pct = 0.1,
  threshold_quantile_score = 0.2,
  threshold_p_value_specificity = 0.05,
  threshold_p_value_de = 0.05,
  threshold_logfc = log(1.5),
  return_distributions = FALSE,
  seed = 42,
  verbose = TRUE
)

Arguments

seurat_object

Seurat object that must contain normalized data and relevant meta.data columns (see below). Gene names must be MGI (mouse) or HGNC (human) approved symbols.

LRI_species

Either "mouse", "human" or "rat". Selects the LRI database to use and must correspond to the species of seurat_object.

seurat_celltype_id

Name of the meta.data column in seurat_object that contains cell-type annotations (e.g. "CELL_TYPE").

seurat_condition_id

List that contains information regarding the two conditions on which to perform differential analysis. Must contain the following three named items:

  1. column_name: name of the meta.data column in seurat_object that indicates the condition on each cell (e.g. "AGE")

  2. cond1_name: name of the first condition (e.g. "YOUNG")

  3. cond2_name: name of the second condition (e.g. "OLD")

Can also be set to NULL to only perform a detection analysis (see Details).

iterations

Number of permutations used in the statistical analysis. The default (1000) is a good compromise for an exploratory analysis, giving reasonably accurate p-values in a short time. For more accurate p-values, we recommend using 10000 iterations and running the analysis in parallel (see Details). Can also be set to 0 for debugging, to quickly return partial results without statistical significance.

scdiffcom_object_name

Name of the scDiffCom S4 object that will be returned ("scDiffCom_object" by default).

seurat_assay

Assay of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.

seurat_slot

Slot of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.

log_scale

When FALSE (the default, recommended), the analysis uses normalized values that are not in log scale: the extracted data are expm1()-transformed to undo the log1p transformation. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.

score_type

Metric used to compute cell-cell interaction (CCI) scores. Can be either "geometric_mean" (default) or "arithmetic_mean". It is strongly recommended to use the geometric mean, especially when performing differential analysis. The arithmetic mean might be used when only performing a detection analysis or when the results are to be compared with those of another package.
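The difference between the two metrics can be illustrated with a minimal base R sketch. It assumes, which the text above does not state explicitly, that a score combines the average expression of a ligand in the emitter cell type with that of its receptor in the receiver cell type. The helper cci_score and all values are hypothetical and are not part of the package.

```r
# Hypothetical helper (not the package's implementation): combine the
# average ligand and receptor expression of a simple one-ligand,
# one-receptor pair into a single score.
cci_score <- function(ligand_expr, receptor_expr,
                      score_type = c("geometric_mean", "arithmetic_mean")) {
  score_type <- match.arg(score_type)
  if (score_type == "geometric_mean") {
    sqrt(ligand_expr * receptor_expr)
  } else {
    (ligand_expr + receptor_expr) / 2
  }
}

# Ligand highly expressed, receptor absent
geo <- cci_score(8, 0, "geometric_mean")   # 0
ari <- cci_score(8, 0, "arithmetic_mean")  # 4
```

Under this assumption, the geometric mean drops to zero as soon as one partner is absent, whereas the arithmetic mean still reports a sizeable score.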

threshold_min_cells

Minimal number of cells - of a given cell type and condition - required to express a gene for this gene to be considered expressed in the corresponding cell type. As a side effect, cell types with fewer cells than this threshold are removed from the analysis. Set to 5 by default.

threshold_pct

Minimal fraction of cells - of a given cell type and condition - required to express a gene for this gene to be considered expressed in the corresponding cell type. Set to 0.1 by default.
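How threshold_min_cells and threshold_pct might act together can be sketched in base R. The sketch assumes that both conditions must hold and that a cell "expresses" a gene when its normalized value is above zero; neither point is spelled out above, and is_gene_expressed is a hypothetical helper.

```r
# Hypothetical helper illustrating the two expression thresholds.
# Assumption: a cell expresses a gene when its normalized value is > 0,
# and both thresholds must be satisfied.
is_gene_expressed <- function(expr_values,
                              threshold_min_cells = 5,
                              threshold_pct = 0.1) {
  n_expressing <- sum(expr_values > 0)
  pct_expressing <- n_expressing / length(expr_values)
  n_expressing >= threshold_min_cells && pct_expressing >= threshold_pct
}

# 100 cells of one cell type and condition, 6 of which express the gene (6%)
few_cells <- c(rep(1.5, 6), rep(0, 94))
# 40 cells, 6 of which express the gene (15%)
enough_cells <- c(rep(1.5, 6), rep(0, 34))

res_few <- is_gene_expressed(few_cells)        # FALSE: fails threshold_pct
res_enough <- is_gene_expressed(enough_cells)  # TRUE: passes both thresholds
```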

threshold_quantile_score

Threshold value used in conjunction with threshold_p_value_specificity to establish if a CCI is considered "detected". The default (0.2) indicates that CCIs whose score lies among the lowest 20% of scores are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).

threshold_p_value_specificity

Threshold value used in conjunction with threshold_quantile_score to establish if a CCI is considered "detected". CCIs with a (BH-adjusted) specificity p-value above the threshold (0.05 by default) are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).
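The interplay of threshold_quantile_score and threshold_p_value_specificity can be sketched as follows. This is an illustration on made-up numbers, not the package's code: it assumes the quantile is taken over all CCI scores at once and that the comparisons are strict for the score and inclusive for the p-value.

```r
# Hypothetical CCI scores and raw specificity p-values
scores   <- c(0.1, 0.5, 1.2, 2.0, 3.5)
p_values <- c(0.001, 0.20, 0.004, 0.03, 0.001)

threshold_quantile_score <- 0.2
threshold_p_value_specificity <- 0.05

# Benjamini-Hochberg adjustment with stats::p.adjust
p_adj <- p.adjust(p_values, method = "BH")

# Score below which the lowest 20% of scores fall
score_cutoff <- quantile(scores, probs = threshold_quantile_score,
                         names = FALSE)

# A CCI is kept only if it passes both filters
is_detected <- scores > score_cutoff &
  p_adj <= threshold_p_value_specificity
# FALSE FALSE TRUE TRUE TRUE
```

The first CCI is discarded because of its low score and the second because of its adjusted p-value.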

threshold_p_value_de

Threshold value used in conjunction with threshold_logfc to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with a (BH-adjusted) differential p-value above the threshold (0.05 by default) are not considered to change significantly. Can be modified without the need to re-perform the permutation analysis (see Details).

threshold_logfc

Threshold value used in conjunction with threshold_p_value_de to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with an absolute logFC below the threshold (log(1.5) by default) are considered "FLAT". Can be modified without the need to re-perform the permutation analysis (see Details).

return_distributions

FALSE by default. If TRUE, the distributions obtained from the permutation test are returned alongside the other results. May be used for testing or benchmarking purposes. Can only be enabled when iterations is less than 1000 (that is, below its default value) in order to avoid out-of-memory issues.

seed

Set a random seed (42 by default) to obtain reproducible results.

verbose

If TRUE (default), print progress messages.

Details

The primary use of this function (and of the package) is to perform differential intercellular communication analysis. However, it is also possible to perform only a detection analysis (by setting seurat_condition_id to NULL), e.g. to infer cell-cell interactions from a dataset whose cells are not assigned to conditions.

By convention, when performing differential analysis, LOGFC values are computed as log(score(cond2_name)/score(cond1_name)). In other words, "UP"-regulated CCIs have a larger score in cond2_name.
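The sign convention, together with threshold_logfc and threshold_p_value_de, can be illustrated on made-up numbers. Only the labels "UP" and "FLAT" appear in this page; "DOWN" and "NOT_SIGNIFICANT", as well as the order in which the checks are applied, are assumptions made for the sketch.

```r
# Hypothetical CCI scores in the two conditions and BH-adjusted p-values
score_cond1 <- c(1.0, 2.0, 4.0, 1.0)  # e.g. "YOUNG"
score_cond2 <- c(2.0, 1.0, 4.4, 3.0)  # e.g. "OLD"
p_adj_de    <- c(0.01, 0.02, 0.01, 0.30)

threshold_logfc <- log(1.5)
threshold_p_value_de <- 0.05

# Convention from the text: log(score(cond2_name) / score(cond1_name))
logfc <- log(score_cond2 / score_cond1)

# Assumed labelling scheme; "DOWN" and "NOT_SIGNIFICANT" are hypothetical
regulation <- ifelse(
  abs(logfc) < threshold_logfc, "FLAT",
  ifelse(p_adj_de > threshold_p_value_de, "NOT_SIGNIFICANT",
         ifelse(logfc > 0, "UP", "DOWN"))
)
# "UP" "DOWN" "FLAT" "NOT_SIGNIFICANT"
```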

Parallel computing. If possible, it is recommended to run this function in parallel in order to speed up the analysis for large datasets and/or to obtain more accurate p-values by setting a higher number of iterations. This is as simple as loading the future package and setting an appropriate plan (see also our vignette).
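A possible way to set this up is sketched below. The wrapper run_parallel_analysis and the number of workers are hypothetical; the column and condition names are borrowed from the Examples section and would need to match your own meta.data. The sketch requires the future and scDiffCom packages to be installed.

```r
# Sketch only: `run_parallel_analysis` is a hypothetical wrapper.
# Requires the future and scDiffCom packages when it is called.
run_parallel_analysis <- function(seurat_object, n_workers = 4) {
  # Set a multisession plan and restore the previous plan on exit
  old_plan <- future::plan(future::multisession, workers = n_workers)
  on.exit(future::plan(old_plan), add = TRUE)

  scDiffCom::run_interaction_analysis(
    seurat_object = seurat_object,
    LRI_species = "mouse",
    seurat_celltype_id = "cell_type",
    seurat_condition_id = list(
      column_name = "age_group",
      cond1_name = "YOUNG",
      cond2_name = "OLD"
    ),
    iterations = 10000  # higher accuracy, as recommended for parallel runs
  )
}
```

Restoring the previous plan with on.exit() keeps the parallel setting from leaking into the rest of the session; setting the plan once at the top of a script is an equally valid choice.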

Data extraction. The UMI or read counts matrix is extracted from the assay seurat_assay and the slot seurat_slot. By default, it is assumed that seurat_object contains log1p-transformed normalized data in the slot "data" of its assay "RNA". If log_scale is FALSE (as recommended), the data are expm1() transformed in order to recover normalized values not in log scale.
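The effect of the expm1() transformation can be checked in base R. The values below are made up; the sketch only shows that expm1() undoes log1p(), which is what allows normalized values to be recovered from a log-normalized slot.

```r
# Hypothetical normalized expression values for one gene across five cells
normalized <- c(0, 1, 2.5, 10, 100)

# What a log-normalized "data" slot would typically hold
log_normalized <- log1p(normalized)

# Transformation applied when log_scale = FALSE
recovered <- expm1(log_normalized)

# TRUE: the original values are recovered up to floating-point error
round_trip_ok <- isTRUE(all.equal(recovered, normalized))
```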

Modifying filtering parameters (differential analysis only). As long as the slot cci_table_raw of the returned scDiffCom object is not erased, filtering parameters can be modified to recompute the slots cci_table_detected and ora_table, without re-performing the time-consuming permutation analysis. This may be useful if one wants a fast way to analyze how the results behave as a function of, say, different LOGFC thresholds. In practice, this can be done by calling the functions FilterCCI or RunORA (see also our vignette).
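Why keeping the raw table makes re-filtering cheap can be shown with a base R analogy. The data frame cci_raw is a made-up stand-in for the raw CCI table and count_non_flat is a hypothetical helper; the sketch does not call FilterCCI or RunORA and does not reproduce their behaviour.

```r
# Hypothetical stand-in for a raw CCI table: one row per CCI,
# with the LOGFC already computed by the (expensive) permutation step
cci_raw <- data.frame(
  cci   = c("A_B", "A_C", "B_C", "C_A"),
  logfc = c(0.9, -0.5, 0.3, -1.2)
)

# Cheap re-filtering step: nothing is recomputed from the expression data
count_non_flat <- function(raw_table, threshold_logfc) {
  sum(abs(raw_table$logfc) >= threshold_logfc)
}

# Number of non-"FLAT" CCIs for three candidate LOGFC thresholds
thresholds <- c(log(1.2), log(1.5), log(2))
n_non_flat <- vapply(thresholds, count_non_flat, integer(1),
                     raw_table = cci_raw)
# 4 3 2
```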

Value

An S4 object of class scDiffCom-class.

Examples

## Not run: 
run_interaction_analysis(
  seurat_object = seurat_sample_tms_liver,
  LRI_species = "mouse",
  seurat_celltype_id = "cell_type",
  seurat_condition_id = list(
    column_name = "age_group",
    cond1_name = "YOUNG",
    cond2_name = "OLD"
  )
)

## End(Not run)
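A detection-only run, as described in Details, might look like the following sketch. The wrapper run_detection_only is hypothetical, and "cell_type" is the column name used in the example above; the sketch requires scDiffCom when it is called.

```r
# Sketch only: `run_detection_only` is a hypothetical wrapper.
run_detection_only <- function(seurat_object) {
  scDiffCom::run_interaction_analysis(
    seurat_object = seurat_object,
    LRI_species = "mouse",
    seurat_celltype_id = "cell_type",
    seurat_condition_id = NULL  # no conditions: detection analysis only
  )
}
```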

scDiffCom documentation built on Nov. 4, 2023, 1:06 a.m.