In neurorestore/Magellan: Spatial prioritization in spatial transcriptomics data

View source: R/navigate_space.R

navigate_space

R Documentation

Perform spatial prioritization of the response to a biological perturbation

Description

Prioritize spatial locations involved in a complex biological process by training a machine-learning model to predict sample labels (e.g., disease vs. control, treated vs. untreated, or time post-stimulus), and evaluate the performance of the model in cross-validation.

Usage

navigate_space(
  input,
  meta = NULL,
  coords = NULL,
  k = 50,
  label_col = "label",
  coord_cols = c("coord_x", "coord_y"),
  n_subsamples = 50,
  subsample_size = 20,
  folds = 3,
  var_quantile = 0.5,
  feature_perc = 0.5,
  n_threads = 32,
  show_progress = T,
  augur_mode = c("default", "velocity"),
  classifier = c("rf", "lr"),
  rf_params = list(trees = 100, mtry = 2, min_n = NULL, importance = "accuracy"),
  lr_params = list(mixture = 1, penalty = "auto")
)

Arguments

`input`	a matrix, data frame, or `Seurat` object containing gene expression values (genes in rows, spatial barcodes in columns) and, optionally, metadata about each spatial barcode
`meta`	optionally, a data frame containing metadata about the `input` gene-by-barcode matrix, at minimum containing the label associated with each barcode (e.g., group, disease, timepoint); can be left as `NULL` if `input` is a `Seurat` object
`coords`	optionally, a data frame containing the spatial coordinates for each barcode in the `input` gene-by-barcode matrix; can be left as `NULL` if `input` is a `Seurat` object
`k`	the number of spatial nearest-neighbors to use in the AUC calculation; defaults to `50`
`label_col`	the column of the `meta` data frame, or the metadata container in the `Seurat` object, that contains condition labels (e.g., disease, timepoint) for each barcode in the gene-by-barcode expression matrix; defaults to `"label"`
`coord_cols`	the names of the columns in the `coords` data frame, or the metadata container in the `Seurat` object, that contain the coordinates of each spatial barcode in the gene-by-barcode expression matrix; defaults to `c("coord_x", "coord_y")`
`n_subsamples`	the number of times to repeat the cross-validation procedure for each barcode; defaults to `50`. Set to `0` to omit subsampling altogether, calculating performance on the entire dataset, but note that this may introduce bias due to cell type or label class imbalance. Note that when setting `augur_mode = "permute"`, values less than `100` will be replaced with a default of `500`.
`subsample_size`	the number of barcodes to randomly sample from among the nearest neighbors in each iteration of the cross-validation procedure; cannot be greater than `k`; defaults to `20`
`folds`	the number of folds of cross-validation to run; defaults to `3`
`var_quantile`	the quantile of highly variable genes to retain using the variable gene filter (`select_variance`); defaults to `0.5`
`feature_perc`	the proportion of genes that are randomly selected as features for input to the classifier in each subsample using the random gene filter (`select_random`); defaults to `0.5`
`n_threads`	the number of threads to use for parallelization; defaults to `32`
`show_progress`	if `TRUE`, display a progress bar for the analysis with estimated time remaining; defaults to `TRUE`
`augur_mode`	one of `"default"` or `"velocity"`; defaults to `"default"`. Setting `augur_mode = "velocity"` disables feature selection, on the assumption that feature selection has already been performed by the RNA velocity procedure that produced the input matrix
`classifier`	the classifier to use in calculating area under the curve, one of `"rf"` (random forest) or `"lr"` (logistic regression); defaults to `"rf"`, which is the recommended setting
`rf_params`	for `classifier = "rf"`, a list of parameters for the random forest models, containing the following items (see `rand_forest` from the `parsnip` package):
	`"mtry"`: the number of features randomly sampled at each split in the random forest classifier; defaults to `2`
	`"trees"`: the number of trees in the random forest classifier; defaults to `100`
	`"min_n"`: the minimum number of observations to split a node in the random forest classifier; defaults to `NULL`
	`"importance"`: the method of calculating feature importances to use; defaults to `"accuracy"`; can also specify `"gini"`
`lr_params`	for `classifier = "lr"`, a list of parameters for the logistic regression models, containing the following items (see `logistic_reg` from the `parsnip` package):
	`"mixture"`: the proportion of L1 regularization in the model; defaults to `1`
	`"penalty"`: the total amount of regularization in the model; defaults to `"auto"`, which uses `cv.glmnet` to set the penalty
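
As a rough sketch of how these arguments fit together, the following builds a toy gene-by-barcode matrix with matching `meta` and `coords` data frames and passes them to `navigate_space()`. The toy data, the gene and barcode names, and the assumption that the rows of `meta` and `coords` follow the same order as the columns of `input` are illustrative and are not taken from the package documentation. The call itself is skipped when Magellan is not installed.

```r
set.seed(1)

n_genes <- 200
n_barcodes <- 120

# Toy expression matrix: genes in rows, spatial barcodes in columns
expr <- matrix(
  rpois(n_genes * n_barcodes, lambda = 5),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("bc", seq_len(n_barcodes))
  )
)

# One row per barcode, assumed to be in the same order as the columns of `expr`
meta <- data.frame(
  label = rep(c("control", "treated"), each = n_barcodes / 2),
  row.names = colnames(expr)
)
coords <- data.frame(
  coord_x = runif(n_barcodes, 0, 100),
  coord_y = runif(n_barcodes, 0, 100),
  row.names = colnames(expr)
)

# Parameter choices for the toy data; subsample_size cannot be greater than k
k <- 50
subsample_size <- 20
stopifnot(subsample_size <= k, k < n_barcodes)

# Only attempt the call when the package is available
if (requireNamespace("Magellan", quietly = TRUE)) {
  res <- Magellan::navigate_space(
    input = expr, meta = meta, coords = coords,
    k = k, label_col = "label", coord_cols = c("coord_x", "coord_y"),
    n_subsamples = 50, subsample_size = subsample_size, folds = 3,
    n_threads = 2, classifier = "rf"
  )
}
```

Passing `label_col` and `coord_cols` explicitly is redundant here because the toy column names match the defaults, but it makes clear which columns the function is expected to read.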

Details

If a `Seurat` object is provided as input, Magellan will use the default assay (i.e., whatever `GetAssayData` returns) as input. To use a different assay, provide the expression matrix and metadata as input separately, using the `input` and `meta` arguments. Additionally, Magellan will assume the coordinates of the spatial barcodes can be found in `input@images$slice1@coordinates`. To override this, specify the expression matrix, metadata, and coordinates separately, using the `input`, `meta`, and `coords` arguments.
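
One way to follow this advice is to pull the three pieces out of the Seurat object by hand. The helper below is a hypothetical sketch and not part of Magellan: `split_seurat_inputs`, its `assay` and `image` arguments, and the `"Spatial"` default are assumed names, and it presumes an image object that stores its coordinates in a `coordinates` slot, as described above.

```r
# Hypothetical helper (not part of Magellan): extract the expression matrix,
# metadata, and coordinates from a Seurat object so that a non-default assay
# or image can be passed to navigate_space() explicitly.
split_seurat_inputs <- function(sobj, assay = "Spatial", image = "slice1") {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("The Seurat package is required for this sketch")
  }
  list(
    # Expression values for the chosen assay (genes x barcodes)
    input = Seurat::GetAssayData(sobj, assay = assay),
    # Per-barcode metadata, including the label column
    meta = sobj@meta.data,
    # Spatial coordinates stored with the chosen image
    coords = sobj@images[[image]]@coordinates
  )
}

# Usage, assuming `sobj` is a Seurat object with a "label" metadata column.
# The coordinate column names depend on how the object was created, so the
# ones below are an assumption to check against colnames(parts$coords):
# parts <- split_seurat_inputs(sobj, assay = "SCT")
# res <- Magellan::navigate_space(
#   input = parts$input, meta = parts$meta, coords = parts$coords,
#   coord_cols = c("imagerow", "imagecol")
# )
```

Before calling `navigate_space()` on the extracted pieces, it is worth confirming that the barcodes in `meta` and `coords` line up with the columns of the expression matrix.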

Value

a list of class "Magellan", containing the following items:

parameters: the parameters provided to this function as input
results: the area under the curve for each barcode, in each fold, in each subsample, in the comparison of interest, as well as a series of other classification metrics
AUC: a summary of the mean AUC for each barcode (for continuous experimental conditions, this is replaced by a CCC item that records the mean concordance correlation coefficient for each barcode)
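
To illustrate the shape of the return value, the sketch below builds a mock object with the three documented items and picks out whichever per-barcode summary is present. The mock contents and the helper name `get_summary` are hypothetical. The column layout of the real `results` and `AUC` items is not specified here, so the mock leaves them as empty data frames.

```r
# Mock stand-in for a navigate_space() result; a real object holds data in
# these items rather than empty placeholders
mock_res <- structure(
  list(
    parameters = list(k = 50, folds = 3),
    results = data.frame(),
    AUC = data.frame()
  ),
  class = "Magellan"
)

# Hypothetical helper: return the per-barcode summary, which is stored under
# "AUC" for categorical labels and under "CCC" for continuous conditions
get_summary <- function(res) {
  stopifnot(inherits(res, "Magellan"))
  if (!is.null(res[["AUC"]])) res[["AUC"]] else res[["CCC"]]
}

summary_tbl <- get_summary(mock_res)
```

Using `[[` with the full item name avoids the partial matching that `$` performs on lists.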