evaluate_importances_ligand_prediction: Evaluation of ligand activity prediction based on ligand...

View source: R/evaluate_model_ligand_prediction.R

evaluate_importances_ligand_predictionR Documentation

Evaluation of ligand activity prediction based on ligand importance scores.

Description

evaluate_importances_ligand_prediction Evaluate how well a trained model of ligand importance scores is able to predict the true activity state of a ligand. For this it is assumed, that ligand importance measures for truely active ligands will be higher than for non-active ligands. A classificiation algorithm chosen by the user is trained to construct one model based on the ligand importance scores of all ligands of interest (ligands importance scores are considered as features). Several classification evaluation metrics for the prediction are calculated and variable importance scores can be extracted to rank the different importance measures in order of importance for ligand activity state prediction.

Usage

evaluate_importances_ligand_prediction(importances, normalization, algorithm, var_imps = TRUE, cv = TRUE, cv_number = 4, cv_repeats = 2, parallel = FALSE, n_cores = 4,ignore_errors = FALSE)

Arguments

importances

A data frame containing at least folowing variables: $setting, $test_ligand, $ligand and one or more feature importance scores. $test_ligand denotes the name of a possibly active ligand, $ligand the name of the truely active ligand.

normalization

Way of normalization of the importance measures: "mean" (classifcal z-score) or "median" (modified z-score)

algorithm

The name of the classification algorithm to be applied. Should be supported by the caret package. Examples of algorithms we recommend: with embedded feature selection: "rf","glm","fda","glmnet","sdwd","gam","glmboost"; without: "lda","naive_bayes","pls"(because bug in current version of pls package), "pcaNNet". Please notice that not all these algorithms work when the features (i.e. ligand vectors) are categorical (i.e. discrete class assignments).

var_imps

Indicate whether in addition to classification evaluation performances, variable importances should be calculated. Default: TRUE.

cv

Indicate whether model training and hyperparameter optimization should be done via cross-validation. Default: TRUE. FALSE might be useful for applications only requiring variable importance, or when final model is not expected to be extremely overfit.

cv_number

The number of folds for the cross-validation scheme: Default: 4; only relevant when cv == TRUE.

cv_repeats

The number of repeats during cross-validation. Default: 2; only relevant when cv == TRUE.

parallel

Indiciate whether the model training will occur parallelized. Default: FALSE. TRUE only possible for non-windows OS.

n_cores

The number of cores used for parallelized model training via cross-validation. Default: 4. Only relevant on non-windows OS.

ignore_errors

Indiciate whether errors during model training by caret should be ignored such that another model training try will be initiated until model is trained without raising errors. Default: FALSE.

Value

A list with the following elements. $performances: data frame containing classification evaluation measure for classification on the test folds during training via cross-validation; $performances_training: data frame containing classification evaluation measures for classification of the final model (discrete class assignments) on the complete data set (performance can be severly optimistic due to overfitting!); $performance_training_continuous: data frame containing classification evaluation measures for classification of the final model (class probability scores) on the complete data set (performance can be severly optimistic due to overfitting!) $var_imps: data frame containing the variable importances of the different ligands (embbed importance score for some classification algorithms, otherwise just the auroc); $prediction_response_df: data frame containing for each ligand-setting combination the ligand importance scores for the individual importance scores, the complete model of importance scores and the ligand activity as well (TRUE or FALSE); $model: the caret model object that can be used on new importance scores to predict the ligand activity state.

Examples

## Not run: 
settings = lapply(expression_settings_validation[1:5],convert_expression_settings_evaluation)
settings_ligand_pred = convert_settings_ligand_prediction(settings, all_ligands = unlist(extract_ligands_from_settings(settings,combination = FALSE)), validation = TRUE, single = TRUE)

weighted_networks = construct_weighted_networks(lr_network, sig_network, gr_network, source_weights_df)
ligands = extract_ligands_from_settings(settings_ligand_pred,combination = FALSE)
ligand_target_matrix = construct_ligand_target_matrix(weighted_networks, ligands)
ligand_importances = dplyr::bind_rows(lapply(settings_ligand_pred,get_single_ligand_importances,ligand_target_matrix))
evaluation = evaluate_importances_ligand_prediction(ligand_importances,"median","lda")
print(head(evaluation))

## End(Not run)

saeyslab/nichenetr documentation built on March 26, 2024, 9:22 a.m.