View source: R/calculate_auc.R
calculate_auc | R Documentation |
Prioritize cell types involved in a complex biological process by training a machine-learning model to predict sample labels (e.g., disease vs. control, treated vs. untreated, or time post-stimulus), and evaluate the performance of the model in cross-validation.
calculate_auc(
input,
meta = NULL,
label_col = "label",
cell_type_col = "cell_type",
n_subsamples = 50,
subsample_size = 20,
folds = 3,
min_cells = NULL,
var_quantile = 0.5,
feature_perc = 0.5,
n_threads = 4,
show_progress = T,
augur_mode = c("default", "velocity", "permute"),
classifier = c("rf", "lr"),
rf_params = list(trees = 100, mtry = 2, min_n = NULL, importance = "accuracy"),
lr_params = list(mixture = 1, penalty = "auto")
)
input |
a matrix, data frame, or |
meta |
a data frame containing metadata about the |
label_col |
the column of the |
cell_type_col |
the column of the |
n_subsamples |
the number of random subsamples of fixed size to
draw from the complete dataset, for each cell type; defaults to 50 |
subsample_size |
the number of cells per type to subsample randomly from
each experimental condition, if |
folds |
the number of folds of cross-validation to run; defaults to 3 |
min_cells |
the minimum number of cells for a particular cell type in
each condition in order to retain that type for analysis;
defaults to |
var_quantile |
the quantile of highly variable genes to retain for
each cell type using the variable gene filter (select_variance);
defaults to 0.5 |
feature_perc |
the proportion of genes that are randomly selected as
features for input to the classifier in each subsample using the
random gene filter (select_random); defaults to 0.5 |
n_threads |
the number of threads to use for parallelization;
defaults to 4 |
show_progress |
if |
augur_mode |
one of |
classifier |
the classifier to use in calculating area under the curve,
one of |
rf_params |
for
|
lr_params |
for
|
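The arguments above describe a subsample, filter, and cross-validate procedure. The sketch below illustrates that idea in base R for a single cell type, under loudly stated assumptions: the data are simulated, the variable gene filter is a plain variance quantile (not Augur's select_variance), and the classifier is a simple difference-of-means projection standing in for the random forest or logistic regression. The helper names auc_rank and one_subsample are hypothetical and not part of Augur.

```r
set.seed(1)

# Simulated data for ONE cell type: 200 genes x 80 cells (all values hypothetical)
n_genes <- 200
labels <- rep(c("control", "treated"), each = 40)
expr <- matrix(rnorm(n_genes * 80), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Shift the first 20 genes in treated cells so the two labels are separable
expr[1:20, labels == "treated"] <- expr[1:20, labels == "treated"] + 1

# Rank-based AUC (equivalent to the scaled Mann-Whitney U statistic)
auc_rank <- function(scores, is_positive) {
  r <- rank(scores)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# One subsample: returns one AUC per cross-validation fold
one_subsample <- function(expr, labels, subsample_size = 20, folds = 3,
                          var_quantile = 0.5, feature_perc = 0.5) {
  # 1. Draw subsample_size cells from each experimental condition
  idx <- unlist(lapply(split(seq_along(labels), labels),
                       function(i) sample(i, subsample_size)))
  x <- expr[, idx, drop = FALSE]
  y <- labels[idx] == "treated"

  # 2. Simplified variable gene filter: keep genes at or above the variance quantile
  v <- apply(x, 1, var)
  x <- x[v >= quantile(v, var_quantile), , drop = FALSE]

  # 3. Random gene filter: keep a random proportion of the remaining genes
  keep <- sample(nrow(x), ceiling(feature_perc * nrow(x)))
  x <- x[keep, , drop = FALSE]

  # 4. Stratified fold assignment, so every fold contains both conditions
  fold_id <- integer(length(y))
  fold_id[y] <- sample(rep(seq_len(folds), length.out = sum(y)))
  fold_id[!y] <- sample(rep(seq_len(folds), length.out = sum(!y)))

  # 5. Train on the other folds, score the held-out fold, compute its AUC
  sapply(seq_len(folds), function(k) {
    train <- fold_id != k
    direction <- rowMeans(x[, train & y, drop = FALSE]) -
      rowMeans(x[, train & !y, drop = FALSE])
    scores <- as.vector(crossprod(x[, !train, drop = FALSE], direction))
    auc_rank(scores, y[!train])
  })
}

# Repeat over several subsamples (a stand-in for n_subsamples) and average
aucs <- replicate(10, one_subsample(expr, labels))
mean_auc <- mean(aucs)
print(mean_auc)
```

Here aucs is a folds-by-subsamples matrix, and its mean plays the role of the per-cell-type summary; the real function repeats this for every cell type.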
If a Seurat object is provided as input, Augur will use the default
assay (i.e., whatever GetAssayData returns) as input. To use a different
assay, provide the expression matrix and metadata as input separately,
using the input and meta arguments.
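As a rough illustration of passing the expression matrix and metadata separately, the sketch below builds a small simulated matrix and a matching metadata data frame, then calls calculate_auc only if the Augur package is installed. The simulated counts, the cell type names "A" and "B", and the assumption that genes are in rows and cells in columns are illustrative choices, not taken from this page.

```r
set.seed(42)

# Simulated expression matrix (assumed orientation: genes in rows, cells in columns)
n_genes <- 100
n_cells <- 240
expr <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))

# One metadata row per cell, in the same order as the matrix columns.
# Column names match the defaults of label_col and cell_type_col.
meta <- data.frame(
  cell_type = rep(c("A", "B"), each = n_cells / 2),
  label = rep(c("control", "treated"), times = n_cells / 2),
  row.names = colnames(expr)
)

# Only attempt the call when Augur is available
if (requireNamespace("Augur", quietly = TRUE)) {
  augur <- Augur::calculate_auc(
    expr,
    meta = meta,
    label_col = "label",
    cell_type_col = "cell_type",
    n_subsamples = 5,      # reduced from the default of 50 to keep the sketch fast
    n_threads = 1,
    show_progress = FALSE
  )
  print(augur$AUC)
}
```

Each cell type has 60 cells per condition here, which is comfortably above the default subsample_size of 20. Because the labels are assigned at random, the AUCs from such a run would be expected to sit near chance.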
a list of class "Augur", containing the following items:
X: the numeric matrix (or data frame or sparse matrix,
depending on the input) containing gene expression values for each cell
in the dataset
y: the vector of experimental condition labels being predicted
cell_types: the vector of cell type labels
parameters: the parameters provided to this function as input
results: the area under the curve for each cell type, in each
fold, in each subsample, in the comparison of interest, as well as a
series of other classification metrics
feature_importance: the importance of each feature for
calculating the AUC, above. For random forest classifiers, this is the
mean decrease in accuracy or Gini index. For logistic regression
classifiers, these are the standardized regression coefficients, computed
using the Agresti method
AUC: a summary of the mean AUC for each cell type (for
continuous experimental conditions, this is replaced by a CCC
item that records the mean concordance correlation coefficient for each
cell type)
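To make the relationship between the per-fold results and the AUC summary concrete, the sketch below averages fold-level AUCs into one value per cell type using base R. The results table is simulated, and its column names (cell_type, subsample_idx, fold, metric, estimate) and the metric name "roc_auc" are assumptions made for illustration; they are not confirmed by this page.

```r
set.seed(7)

# Hypothetical long-format results: one row per cell type, subsample, and fold
results <- expand.grid(cell_type = c("A", "B", "C"),
                       subsample_idx = 1:5,
                       fold = 1:3,
                       stringsAsFactors = FALSE)
results$metric <- "roc_auc"

# Simulated AUCs centred on a different value for each cell type
centre <- c(A = 0.55, B = 0.70, C = 0.90)
noisy <- unname(centre[results$cell_type]) + rnorm(nrow(results), sd = 0.02)
results$estimate <- pmin(1, pmax(0, noisy))

# Average over folds and subsamples, then rank cell types by mean AUC
auc_rows <- results[results$metric == "roc_auc", ]
summary_auc <- aggregate(estimate ~ cell_type, data = auc_rows, FUN = mean)
names(summary_auc)[2] <- "auc"
summary_auc <- summary_auc[order(-summary_auc$auc), ]
print(summary_auc)
```

Sorting by mean AUC puts the cell types whose condition labels are easiest to predict at the top, which is the sense in which cell types are prioritized.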