View source: R/stability-based-parameter-assessment.R
automatic_stability_assessment | R Documentation |
Evaluates the stability of different graph clustering methods in the clustering pipeline. The method will iterate through different values of the resolution parameter and compare, using the EC Consistency score, the partitions obtained at different seeds.
automatic_stability_assessment(
expression_matrix,
n_repetitions,
n_neigh_sequence,
resolution_sequence,
features_sets,
steps,
seed_sequence = NULL,
graph_reduction_embedding = "PCA",
include_umap_nn_assessment = FALSE,
n_top_configs = 3,
ranking_criterion = "iqr",
overall_summary = "median",
ecs_threshold = 1,
matrix_processing = function(dt_mtx, actual_npcs = 30, ...) {
actual_npcs <-
min(actual_npcs, ncol(dt_mtx)%/%2)
RhpcBLASctl::blas_set_num_threads(foreach::getDoParWorkers())
embedding <-
stats::prcomp(x = dt_mtx, rank. = actual_npcs)$x
RhpcBLASctl::blas_set_num_threads(1)
rownames(embedding) <- rownames(dt_mtx)
colnames(embedding) <- paste0("PC_", seq_len(ncol(embedding)))
return(embedding)
},
umap_arguments = list(),
prune_value = -1,
algorithm_dim_reduction = 1,
algorithm_graph_construct = 1,
algorithms_clustering_assessment = 1:3,
clustering_arguments = list(),
verbose = TRUE,
temp_file = NULL,
save_temp = TRUE
)
expression_matrix |
An expression matrix having the features on the rows and the cells on the columns. |
n_repetitions |
The number of repetitions of applying the pipeline with
different seeds; ignored if seed_sequence is provided by the user. Defaults to |
n_neigh_sequence |
A sequence of the number of nearest neighbours. |
resolution_sequence |
A sequence of resolution values. The resolution parameter controls the coarseness of the clustering. The higher the resolution, the more clusters will be obtained. The resolution parameter is used in the community detection algorithms. |
features_sets |
A list of the feature sets. A feature set is a list of genes from the expression matrix that will be used in the dimensionality reduction. |
steps |
A list with the same names as |
seed_sequence |
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100. |
graph_reduction_embedding |
The type of dimensionality reduction used for
the graph construction. The options are "PCA" and "UMAP". Defaults to |
include_umap_nn_assessment |
A boolean value indicating if the UMAP embeddings
will be used for the nearest neighbours assessment. Defaults to |
n_top_configs |
The number of top configurations that will be used for the
downstream analysis in the dimensionality reduction step. Defaults to |
ranking_criterion |
The criterion used for ranking the configurations from
the dimensionality reduction step. The options are "iqr", "median", "max", "top_qt",
"top_qt_max", "iqr_median", "iqr_median_coeff" and "mean". Defaults to |
overall_summary |
A function used to summarize the stability of the configurations
from the dimensionality reduction step across the different resolution values.
The options are "median", "max", "top_qt", "top_qt_max", "iqr", "iqr_median",
"iqr_median_coeff" and "mean". Defaults to |
ecs_threshold |
The ECS threshold used for merging similar clusterings. |
matrix_processing |
A function that will be used to process the data matrix
by using a dimensionality reduction technique. The function should have
one parameter, the data matrix, and should return an embedding describing the
reduced space. By default, the function will use the precise PCA method with
|
umap_arguments |
A list containing the arguments that will be passed to the
UMAP function. Refer to the |
prune_value |
Argument indicating whether to prune the SNN graph. If the value is 0, the graph won't be pruned. If the value is between 0 and 1, the edges with weight under the pruning value will be removed. If the value is -1, the highest pruning value will be calculated automatically and used. |
algorithm_dim_reduction |
An index indicating the community detection algorithm that will be used in the Dimensionality reduction step. |
algorithm_graph_construct |
An index indicating the community detection algorithm that will be used in the Graph construction step. |
algorithms_clustering_assessment |
An index indicating which community
detection algorithm will be used for the clustering step: Louvain (1),
Louvain refined (2), SLM (3) or Leiden (4). More details can be found in
the Seurat's |
clustering_arguments |
A list containing the arguments that will be passed to the
community detection algorithm, such as the number of iterations and the number of starts.
Refer to the Seurat's |
verbose |
Boolean value used for displaying the progress of the assessment. |
temp_file |
The path to the file where the object will be saved. |
save_temp |
A boolean value indicating if the object will be saved to a file. |
A list having two fields:
all - a list that contains, for each clustering method and each resolution value, the EC consistency between the partitions obtained by changing the seed
filtered - similar to all
, but for each configuration, we determine the
number of clusters that appears the most and use only the partitions with this
size
## Not run:
set.seed(2024)
# create an already-transposed artificial expression matrix
expr_matrix <- matrix(
c(runif(20 * 10), runif(30 * 10, min = 3, max = 4)),
nrow = 10, byrow = FALSE
)
colnames(expr_matrix) <- as.character(seq_len(ncol(expr_matrix)))
rownames(expr_matrix) <- paste("feature", seq_len(nrow(expr_matrix)))
autom_object <- automatic_stability_assessment(
expression_matrix = expr_matrix,
n_repetitions = 3,
n_neigh_sequence = c(5),
resolution_sequence = c(0.1, 0.5),
features_sets = list(
"set1" = rownames(expr_matrix)
),
steps = list(
"set1" = c(5, 7)
),
umap_arguments = list(
# the following parameters have been modified
# from the default values to ensure that
# the function will run under 5 seconds
n_neighbors = 3,
approx_pow = TRUE,
n_epochs = 0,
init = "random",
min_dist = 0.3
),
n_top_configs = 1,
algorithms_clustering_assessment = 1,
save_temp = FALSE,
verbose = FALSE
)
# the object can be further used to plot the assessment results
plot_feature_overall_stability_boxplot(autom_object$feature_stability)
plot_n_neigh_ecs(autom_object$set1$"5"$nn_stability)
plot_k_n_partitions(autom_object$set1$"5"$clustering_stability)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.