navigate_space: Perform spatial prioritization of the response to a...

View source: R/navigate_space.R

navigate_space    R Documentation

Perform spatial prioritization of the response to a biological perturbation

Description

Prioritize spatial locations involved in a complex biological process by training a machine-learning model to predict sample labels (e.g., disease vs. control, treated vs. untreated, or time post-stimulus), and evaluate the performance of the model in cross-validation.

Usage

navigate_space(
  input,
  meta = NULL,
  coords = NULL,
  k = 50,
  label_col = "label",
  coord_cols = c("coord_x", "coord_y"),
  n_subsamples = 50,
  subsample_size = 20,
  folds = 3,
  var_quantile = 0.5,
  feature_perc = 0.5,
  n_threads = 32,
  show_progress = T,
  augur_mode = c("default", "velocity"),
  classifier = c("rf", "lr"),
  rf_params = list(trees = 100, mtry = 2, min_n = NULL, importance = "accuracy"),
  lr_params = list(mixture = 1, penalty = "auto")
)

Arguments

input

a matrix, data frame, or Seurat object containing gene expression values (genes in rows, spatial barcodes in columns) and, optionally, metadata about each spatial barcode

meta

optionally, a data frame containing metadata about the input gene-by-barcode matrix, at minimum containing the label associated with each barcode (e.g., group, disease, timepoint); can be left as NULL if input is a Seurat object

coords

optionally, a data frame containing the spatial coordinates for each barcode in the input gene-by-barcode matrix; can be left as NULL if input is a Seurat object

k

the number of spatial nearest-neighbors to use in the AUC calculation; defaults to 50

label_col

the column of the meta data frame, or the metadata container in the Seurat object, that contains condition labels (e.g., disease, timepoint) for each barcode in the gene-by-barcode expression matrix; defaults to "label"

coord_cols

the names of the columns in the coords data frame, or the metadata container in the Seurat object, that contain the coordinates of each spatial barcode in the gene-by-barcode expression matrix; defaults to c("coord_x", "coord_y")

n_subsamples

the number of times to repeat the cross-validation procedure for each barcode; defaults to 50. Set to 0 to omit subsampling altogether and calculate performance on the entire dataset; this may introduce bias due to cell type or label class imbalance. When augur_mode = "permute", values less than 100 are replaced with a default of 500.

subsample_size

the number of barcodes to randomly sample from among the k nearest neighbors in each iteration of the cross-validation procedure; defaults to 20 and cannot be greater than k
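To make the relationship between k and subsample_size concrete, the following base R sketch picks the k spatially nearest barcodes around one focal barcode and then draws a single subsample from them. It is an illustration under stated assumptions, not Magellan's internal code: the toy coordinates, the use of Euclidean distance, and the choice to count the focal barcode as its own neighbor are all hypothetical.

```r
set.seed(1)

# Hypothetical toy coordinates for 200 barcodes; the column names follow the
# coord_cols default.
coords <- data.frame(
  coord_x = runif(200),
  coord_y = runif(200),
  row.names = paste0("barcode_", seq_len(200))
)

k <- 50
subsample_size <- 20
stopifnot(subsample_size <= k)  # documented constraint

# Euclidean distance from one focal barcode to every barcode (assumed metric).
focal <- "barcode_1"
d <- sqrt((coords$coord_x - coords[focal, "coord_x"])^2 +
          (coords$coord_y - coords[focal, "coord_y"])^2)
names(d) <- rownames(coords)

# The k nearest barcodes; the focal barcode is included here, which is an
# assumption about the neighborhood definition.
neighbors <- names(sort(d))[seq_len(k)]

# One subsample: subsample_size barcodes drawn without replacement.
subsample <- sample(neighbors, subsample_size)
```

With n_subsamples = 50, a draw like the last line would be repeated 50 times per barcode, each followed by cross-validation.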

folds

the number of folds of cross-validation to run; defaults to 3

var_quantile

the quantile of highly variable genes to retain using the variable gene filter (select_variance); defaults to 0.5

feature_perc

the proportion of genes that are randomly selected as features for input to the classifier in each subsample using the random gene filter (select_random); defaults to 0.5
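The two gene filters can be pictured as a variance cutoff followed by a random draw. The base R sketch below is a simplified stand-in, not the actual select_variance or select_random implementation: it assumes that var_quantile is a lower cutoff on per-gene variance and uses a toy count matrix with hypothetical gene and barcode names.

```r
set.seed(1)

# Hypothetical toy expression matrix: 100 genes (rows) x 30 barcodes (columns).
expr <- matrix(rpois(100 * 30, lambda = 5), nrow = 100,
               dimnames = list(paste0("gene_", 1:100),
                               paste0("barcode_", 1:30)))

var_quantile <- 0.5
feature_perc <- 0.5

# Step 1 (stand-in for the variable gene filter): keep genes whose variance
# is at or above the var_quantile quantile of all gene variances.
gene_var <- apply(expr, 1, var)
variable_genes <- names(gene_var)[gene_var >= quantile(gene_var, var_quantile)]

# Step 2 (stand-in for the random gene filter): draw a random fraction of the
# remaining genes to use as classifier features in one subsample.
n_keep <- ceiling(length(variable_genes) * feature_perc)
features <- sample(variable_genes, n_keep)
```

Under these assumptions, the defaults leave roughly a quarter of the input genes as features in each subsample. With augur_mode = "velocity", feature selection is disabled, so neither step would apply.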

n_threads

the number of threads to use for parallelization; defaults to 32.

show_progress

if TRUE, display a progress bar for the analysis with estimated time remaining; defaults to TRUE

augur_mode

one of "default" or "velocity"; defaults to "default". Setting augur_mode = "velocity" disables feature selection, on the assumption that feature selection has already been performed by the RNA velocity procedure that produced the input matrix

classifier

the classifier to use in calculating area under the curve, one of "rf" (random forest) or "lr" (logistic regression); defaults to "rf", which is the recommended setting

rf_params

for classifier == "rf", a list of parameters for the random forest models, containing the following items (see rand_forest from the parsnip package):

"mtry"

the number of features randomly sampled at each split in the random forest classifier; defaults to 2

"trees"

the number of trees in the random forest classifier; defaults to 100

"min_n"

the minimum number of observations to split a node in the random forest classifier; defaults to NULL

"importance"

the method of calculating feature importances to use; defaults to "accuracy"; can also specify "gini"

lr_params

for classifier == "lr", a list of parameters for the logistic regression models, containing the following items (see logistic_reg from the parsnip package):

"mixture"

the proportion of L1 regularization in the model; defaults to 1

"penalty"

the total amount of regularization in the model; defaults to "auto", which uses cv.glmnet to set the penalty
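This page does not say whether navigate_space fills in entries missing from a partially specified rf_params or lr_params list, so a cautious approach is to start from the documented defaults and override only the entries of interest. The sketch below does this with base R's modifyList; the particular override values are arbitrary examples.

```r
# Defaults copied from the Usage section.
rf_defaults <- list(trees = 100, mtry = 2, min_n = NULL, importance = "accuracy")
lr_defaults <- list(mixture = 1, penalty = "auto")

# Override selected entries; modifyList() leaves the other entries untouched.
rf_params <- modifyList(rf_defaults, list(trees = 500, importance = "gini"))
lr_params <- modifyList(lr_defaults, list(mixture = 0.5))  # mix of L1 and L2
```

The resulting lists can then be passed as the rf_params or lr_params argument, together with the matching classifier value.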

Details

If a Seurat object is provided as input, Magellan will use the default assay (i.e., whatever GetAssayData returns) as the expression matrix. To use a different assay, provide the expression matrix and metadata separately, using the input and meta arguments. Magellan will also assume that the coordinates of the spatial barcodes can be found in input@images$slice1@coordinates. To override this, provide the count matrix, metadata, and coordinates separately, using the input, meta, and coords arguments.
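As a sketch of the non-Seurat route, the example below builds the three inputs by hand and passes them separately. The toy data, gene and barcode names, and labels are hypothetical, and it is an assumption that the rows of meta and coords must follow the column order of the expression matrix. The call itself runs only if the Magellan package is installed.

```r
set.seed(1)
barcodes <- paste0("barcode_", 1:120)

# Gene-by-barcode expression matrix (genes in rows, barcodes in columns).
expr <- matrix(rpois(200 * 120, lambda = 3), nrow = 200,
               dimnames = list(paste0("gene_", 1:200), barcodes))

# Metadata: one row per barcode, with the condition label in the default
# label_col, "label".
meta <- data.frame(label = rep(c("control", "treated"), each = 60),
                   row.names = barcodes)

# Coordinates: one row per barcode, using the default coord_cols names.
coords <- data.frame(coord_x = runif(120), coord_y = runif(120),
                     row.names = barcodes)

# Sanity checks (assumed requirement: same barcodes, same order).
stopifnot(identical(colnames(expr), rownames(meta)),
          identical(colnames(expr), rownames(coords)))

# Run only if the package is available; n_threads is lowered from its
# default of 32 to suit a small machine.
if (requireNamespace("Magellan", quietly = TRUE)) {
  res <- Magellan::navigate_space(expr, meta = meta, coords = coords,
                                  k = 50, n_threads = 2)
}
```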

Value

a list of class "Magellan", containing the following items:

  1. parameters: the parameters provided to this function as input

  2. results: the area under the curve for each barcode, in each fold, in each subsample, in the comparison of interest, as well as a series of other classification metrics

  3. AUC: a summary of the mean AUC for each barcode (for continuous experimental conditions, this is replaced by a CCC item that records the mean concordance correlation coefficient for each barcode)
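The relationship between the results and AUC items can be illustrated with a small mock-up: per-barcode AUCs from every fold and subsample are averaged into one value per barcode. The data frame and its column names below are hypothetical stand-ins and do not reflect the actual layout of a "Magellan" object.

```r
# Hypothetical stand-in for the `results` item: one AUC per barcode,
# subsample, and fold. Column names are assumptions, not Magellan's schema.
results <- expand.grid(barcode = c("barcode_1", "barcode_2"),
                       subsample = 1:2, fold = 1:3,
                       stringsAsFactors = FALSE)
results$auc <- ifelse(results$barcode == "barcode_1", 0.5, 0.75)

# Mean AUC per barcode, averaged over all folds and subsamples.
auc_summary <- aggregate(auc ~ barcode, data = results, FUN = mean)
```

In this mock-up, auc_summary has one row per barcode, mirroring the per-barcode summary described for the AUC item.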


neurorestore/Magellan documentation built on April 25, 2022, 5:46 p.m.