calculate_auc: Prioritize cell types involved in a biological process
In neurorestore/Augur: Cell type prioritization in high-dimensional single-cell data

View source: R/calculate_auc.R

calculate_auc

R Documentation

Prioritize cell types involved in a biological process

Description

Prioritize cell types involved in a complex biological process by training a machine-learning model to predict sample labels (e.g., disease vs. control, treated vs. untreated, or time post-stimulus) and evaluating the model's performance in cross-validation.

Usage

calculate_auc(
  input,
  meta = NULL,
  label_col = "label",
  cell_type_col = "cell_type",
  n_subsamples = 50,
  subsample_size = 20,
  folds = 3,
  min_cells = NULL,
  var_quantile = 0.5,
  feature_perc = 0.5,
  n_threads = 4,
  show_progress = T,
  augur_mode = c("default", "velocity", "permute"),
  classifier = c("rf", "lr"),
  rf_params = list(trees = 100, mtry = 2, min_n = NULL, importance = "accuracy"),
  lr_params = list(mixture = 1, penalty = "auto")
)

Arguments

`input`	a matrix, data frame, or `Seurat`, `monocle`, or `SingleCellExperiment` object containing gene expression values (genes in rows, cells in columns) and, optionally, metadata about each cell
`meta`	a data frame containing metadata about the `input` gene-by-cell matrix, at minimum containing the cell type for each cell and the labels (e.g., group, disease, timepoint); can be left as `NULL` if `input` is a `Seurat` or `monocle` object
`label_col`	the column of the `meta` data frame, or the metadata container in the `Seurat` or `monocle` object, that contains condition labels (e.g., disease, timepoint) for each cell in the gene-by-cell expression matrix; defaults to `label`
`cell_type_col`	the column of the `meta` data frame, or the metadata container in the `Seurat`/`monocle` object, that contains cell type labels for each cell in the gene-by-cell expression matrix; defaults to `cell_type`
`n_subsamples`	the number of random subsamples of fixed size to draw from the complete dataset, for each cell type; defaults to `50`. Set to `0` to omit subsampling altogether, calculating performance on the entire dataset, but note that this may introduce bias due to cell type or label class imbalance. Note that when setting `augur_mode = "permute"`, values less than `100` will be replaced with a default of `500`.
`subsample_size`	the number of cells per type to subsample randomly from each experimental condition, if `n_subsamples` is greater than 1; defaults to `20`
`folds`	the number of folds of cross-validation to run; defaults to `3`. Be careful changing this parameter without also changing `subsample_size`, because the cells drawn from each condition in a subsample are divided among the folds
`min_cells`	the minimum number of cells for a particular cell type in each condition in order to retain that type for analysis; defaults to `NULL`, in which case the value of `subsample_size` is used
`var_quantile`	the quantile of highly variable genes to retain for each cell type using the variable gene filter (select_variance); defaults to `0.5`
`feature_perc`	the proportion of genes that are randomly selected as features for input to the classifier in each subsample using the random gene filter (select_random); defaults to `0.5`
`n_threads`	the number of threads to use for parallelization; defaults to `4`.
`show_progress`	if `TRUE`, display a progress bar for the analysis with estimated time remaining
`augur_mode`	one of `"default"`, `"velocity"`, or `"permute"`. Setting `augur_mode = "velocity"` disables feature selection, assuming feature selection has been performed by the RNA velocity procedure to produce the input matrix, while setting `augur_mode = "permute"` will generate a null distribution of AUCs for each cell type by permuting the labels
`classifier`	the classifier to use in calculating area under the curve, one of `"rf"` (random forest) or `"lr"` (logistic regression); defaults to `"rf"`, which is the recommended setting
`rf_params`	for `classifier` == `"rf"`, a list of parameters for the random forest models (see `rand_forest` from the `parsnip` package), containing the following items:
	`mtry`: the number of features randomly sampled at each split in the random forest classifier; defaults to `2`
	`trees`: the number of trees in the random forest classifier; defaults to `100`
	`min_n`: the minimum number of observations to split a node in the random forest classifier; defaults to `NULL`
	`importance`: the method of calculating feature importances to use; defaults to `"accuracy"`; can also specify `"gini"`
`lr_params`	for `classifier` == `"lr"`, a list of parameters for the logistic regression models (see `logistic_reg` from the `parsnip` package), containing the following items:
	`mixture`: the proportion of L1 regularization in the model; defaults to `1`
	`penalty`: the total amount of regularization in the model; defaults to `"auto"`, which uses `cv.glmnet` to set the penalty
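
The expected shape of `input` and `meta`, and the per-condition subsampling described under `n_subsamples` and `subsample_size`, can be sketched in base R. This is a minimal illustration on simulated data, not Augur's internal code; the object names (`expr`, `meta`, `draw_subsample`) and all toy values are assumptions made for this example.

```r
set.seed(1)

# Toy gene-by-cell matrix: 100 genes in rows, 120 cells in columns.
n_genes <- 100
n_cells <- 120
expr <- matrix(rpois(n_genes * n_cells, lambda = 2),
               nrow = n_genes, ncol = n_cells,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))

# Metadata: one row per cell, using the default column names
# expected by label_col ("label") and cell_type_col ("cell_type").
meta <- data.frame(
  cell_type = rep(c("typeA", "typeB"), each = 60),
  label = rep(rep(c("control", "treated"), each = 30), times = 2),
  row.names = colnames(expr),
  stringsAsFactors = FALSE
)
stopifnot(ncol(expr) == nrow(meta))

# Hypothetical helper: draw `subsample_size` cells of one cell type
# from each experimental condition and return their column indices.
draw_subsample <- function(meta, cell_type, subsample_size = 20) {
  in_type <- which(meta$cell_type == cell_type)
  by_label <- split(in_type, meta$label[in_type])
  unlist(
    lapply(by_label, function(idx) idx[sample.int(length(idx), subsample_size)]),
    use.names = FALSE
  )
}

idx <- draw_subsample(meta, "typeA", subsample_size = 20)
table(meta$label[idx])  # 20 control and 20 treated cells

# With Augur installed, the toy objects could be passed as:
# augur <- calculate_auc(expr, meta, n_threads = 1)
```

Sampling an equal number of cells from each condition is what keeps a subsample balanced; a cell type with fewer than `subsample_size` cells in some condition would make `draw_subsample` fail, which mirrors the role of `min_cells` in excluding such types.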

Details

If a `Seurat` object is provided as input, Augur will use the default assay (i.e., whatever `GetAssayData` returns) as input. To use a different assay, provide the expression matrix and metadata as input separately, using the `input` and `meta` arguments.
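
Extracting a non-default assay might look like the sketch below. It assumes the Seurat package is installed. The helper name `assay_inputs` and the assay name `"SCT"` in the usage comment are hypothetical, and the object's metadata is assumed to already contain `label` and `cell_type` columns.

```r
# Hypothetical helper: pull one assay's expression matrix and the
# per-cell metadata out of a Seurat object so that they can be
# passed to calculate_auc() separately.
assay_inputs <- function(seurat_obj, assay_name) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("This sketch requires the Seurat package.")
  }
  # Gene-by-cell matrix for the requested assay.
  expr <- Seurat::GetAssayData(seurat_obj, assay = assay_name)
  # Per-cell metadata data frame, reordered to match the matrix columns.
  meta <- seurat_obj@meta.data
  list(input = expr, meta = meta[colnames(expr), , drop = FALSE])
}

# Usage, assuming `obj` is a Seurat object that has an "SCT" assay:
# parts <- assay_inputs(obj, "SCT")
# augur <- calculate_auc(parts$input, meta = parts$meta)
```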

Value

a list of class "Augur", containing the following items:

X: the numeric matrix (or data frame or sparse matrix, depending on the input) containing gene expression values for each cell in the dataset
y: the vector of experimental condition labels being predicted
cell_types: the vector of cell type labels
parameters: the parameters provided to this function as input
results: the area under the curve for each cell type, in each fold, in each subsample, in the comparison of interest, as well as a series of other classification metrics
feature_importance: the importance of each feature in the classifiers used to calculate the AUC. For random forest classifiers, this is the mean decrease in accuracy or Gini index. For logistic regression classifiers, these are the standardized regression coefficients, computed using the Agresti method
AUC: a summary of the mean AUC for each cell type (for continuous experimental conditions, this is replaced by a CCC item that records the mean concordance correlation coefficient for each cell type)
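
How the per-fold `results` relate to the per-cell-type `AUC` summary can be illustrated with a mock table. The column names below (`cell_type`, `subsample_idx`, `fold`, `auc`) and all values are invented for this sketch and are not guaranteed to match the layout that Augur returns.

```r
set.seed(42)

# Mock per-fold results: 2 cell types x 5 subsamples x 3 folds.
mock_results <- expand.grid(
  cell_type = c("typeA", "typeB"),
  subsample_idx = 1:5,
  fold = 1:3,
  stringsAsFactors = FALSE
)

# typeA is simulated as more separable than typeB (invented values),
# with AUCs clamped to the valid range [0, 1].
center <- ifelse(mock_results$cell_type == "typeA", 0.85, 0.55)
mock_results$auc <- pmin(pmax(rnorm(nrow(mock_results), center, 0.03), 0), 1)

# Mean AUC per cell type, sorted so the top-priority type comes first.
mean_auc <- aggregate(auc ~ cell_type, data = mock_results, FUN = mean)
mean_auc <- mean_auc[order(-mean_auc$auc), ]
print(mean_auc)
```

Averaging over folds and subsamples gives one number per cell type, and ranking cell types by that number is the prioritization the function's title refers to.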

neurorestore/Augur documentation built on Oct. 28, 2024, 9:41 a.m.
