View source: R/import_fxns.R

create_domino    R Documentation

Create a domino object and prepare it for network construction

Description

This function reads in a receptor-ligand signaling database, cell-level features (e.g. output from pySCENIC), z-scored single-cell data, and cluster IDs for the single-cell data. It calculates a correlation matrix between receptors and the other features (transcription factor module scores if using pySCENIC) and finds features enriched by cluster. It returns a domino object prepared for build_domino, which calculates a signaling network.

Usage

create_domino(
  signaling_db,
  features,
  ser = NULL,
  counts = NULL,
  z_scores = NULL,
  clusters = NULL,
  use_clusters = TRUE,
  df = NULL,
  gene_conv = NULL,
  verbose = TRUE,
  use_complexes = TRUE,
  rec_min_thresh = 0.025,
  remove_rec_dropout = TRUE,
  tf_selection_method = "clusters",
  tf_variance_quantile = 0.5
)

Arguments

signaling_db

Path to the signaling database directory. The directory must include genes.csv, proteins.csv, interactions.csv, and complexes.csv formatted according to cellphonedb2 syntax.

features

Either a path to a csv containing cell-level features of interest (e.g. the auc matrix from pySCENIC) or a named matrix with cells as columns and features as rows.

ser

A Seurat object containing scaled RNA expression data in the RNA assay slot and cluster identity. Either a ser object OR z_scores and clusters must be provided. If ser is present, z_scores and clusters will be ignored.

counts

The counts matrix for the data. If a Seurat object is provided this will be ignored. This is only used to threshold receptors on dropout.

z_scores

A matrix containing z-scored expression data for all cells, with cells as columns and features as rows. Either z_scores and clusters must be provided OR a ser object. If ser is present, z_scores and clusters will be ignored.

clusters

A named factor containing the cluster assignment of each cell, with cell identifiers as names. Either clusters and z_scores OR ser must be provided. If ser is present, z_scores and clusters will be ignored.

use_clusters

Boolean indicating whether to use the clusters from a Seurat object. If a Seurat object is not provided then this parameter is ignored.

df

Optional. Either a path to the discovered motifs from pySCENIC as a csv file or a data frame following the format of df.csv from pySCENIC.

gene_conv

Optional. Vector of length two containing some combination of 'ENSMUSG', 'ENSG', 'MGI', or 'HGNC', where the first element is the current gene format in the database and the second is the gene format in the data set. If present, the function will use biomaRt to convert the database to the data set's gene format.

verbose

Boolean indicating whether or not to print progress during computation.

use_complexes

Boolean indicating whether to use receptor/ligand complexes in the receptor-ligand signaling database. This may lead to problems if genes that are shared across many functionally different signaling complexes are highly expressed or correlated with features in your data set.

rec_min_thresh

Minimum fraction of cells in which a receptor must be expressed. Default is 0.025, i.e. 2.5 percent of all cells in the data set. This is important when calculating correlations to connect receptors to transcription activation. If this threshold is too low, correlation calculations will proceed with very few cells with non-zero expression.

remove_rec_dropout

Boolean indicating whether to remove receptors with zero expression counts when calculating correlations. This can reduce false-positive correlations when receptors have high dropout rates.

tf_selection_method

Method used to select target transcription factors. If 'clusters', differential expression for clusters will be calculated. If 'variable', the most variable transcription factors will be selected. If 'all', all transcription factors in the feature matrix will be used. Default is 'clusters'. Note that if you wish to use clusters for intercellular signaling downstream, you MUST choose 'clusters'.

tf_variance_quantile

What proportion of variable features to take if using variance to threshold features. Default is 0.5. Higher numbers will keep more features. Ignored if tf_selection_method is not 'variable'.
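
The input shapes described for features, z_scores, and clusters can be sketched in base R. This is an illustrative toy example, not code from the package: the gene, cell, and feature names, the count values, and the simple log-and-scale transformation are all assumptions made for the example.

```r
# Toy counts: 3 genes x 4 cells (all names and values are hypothetical).
counts <- matrix(
  c(0, 2, 5, 1,
    3, 0, 0, 4,
    1, 1, 2, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# z_scores: features as rows, cells as columns.
# scale() standardizes columns, so transpose before and after
# to standardize each gene across cells.
z_scores <- t(scale(t(log1p(counts))))

# clusters: a named factor whose names are the cell identifiers.
clusters <- factor(c("T_cell", "T_cell", "B_cell", "B_cell"))
names(clusters) <- colnames(z_scores)

# features: a named matrix with features as rows and cells as columns.
# A table stored with cells as rows has to be transposed first.
features_by_cell <- matrix(
  c(0.10, 0.40,
    0.20, 0.35,
    0.05, 0.50,
    0.30, 0.25),
  nrow = 4, byrow = TRUE,
  dimnames = list(colnames(counts), c("TF1_module", "TF2_module"))
)
features <- t(features_by_cell)
```

The point of the sketch is the orientation: all three objects share the same cell identifiers, as column names for the matrices and as names for the factor.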

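To make the rec_min_thresh description concrete, the base R sketch below computes the fraction of cells with non-zero counts for each receptor and compares it with the threshold. It only illustrates the idea; it is an assumption about how such a filter can be written, not the package's internal code, and the receptor names and counts are made up.

```r
rec_min_thresh <- 0.025

# Hypothetical counts for two receptors across 100 cells.
rec_counts <- rbind(
  RecA = c(rep(1, 10), rep(0, 90)),  # detected in 10 of 100 cells
  RecB = c(rep(1, 2),  rep(0, 98))   # detected in 2 of 100 cells
)

# Fraction of cells in which each receptor has a non-zero count.
frac_expressing <- rowMeans(rec_counts > 0)

# Receptors detected in too few cells leave little data for a correlation.
passes <- frac_expressing >= rec_min_thresh
```

In this toy case RecA clears the default threshold and RecB does not, because only 2 percent of cells have non-zero counts for it.
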
Value

A domino object.
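
A call that supplies z-scored data and cluster labels instead of a Seurat object might look like the sketch below. The file paths are hypothetical placeholders, the package namespace domino is assumed from the repository name, and z_scores, clusters, and counts are assumed to already exist in the session. The call is only attempted when the package and the input files are available.

```r
# Hypothetical paths; replace with real locations.
db_path       <- "path/to/signaling_db"   # holds genes.csv, proteins.csv,
                                          # interactions.csv, complexes.csv
features_path <- "path/to/features.csv"   # cell-level features, e.g. from pySCENIC

# Only run when the package is installed and the inputs exist.
can_run <- requireNamespace("domino", quietly = TRUE) &&
  dir.exists(db_path) &&
  file.exists(features_path)

if (can_run) {
  # z_scores, clusters, and counts are assumed to be defined beforehand.
  dom <- domino::create_domino(
    signaling_db        = db_path,
    features            = features_path,
    counts              = counts,
    z_scores            = z_scores,
    clusters            = clusters,
    rec_min_thresh      = 0.025,
    tf_selection_method = "clusters"
  )
} else {
  message("Skipping create_domino(): package or input files not available.")
}
```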


Chris-Cherry/domino documentation built on Dec. 9, 2024, 12:28 a.m.