get_multi_ligand_rf_importances: Get ligand importances from a multi-ligand trained random...

View source: R/evaluate_model_ligand_prediction.R

get_multi_ligand_rf_importancesR Documentation

Get ligand importances from a multi-ligand trained random forest model.

Description

get_multi_ligand_rf_importances A random forest is trained to construct one model based on the target gene predictions of all ligands of interest (ligands are considered as features) in order to predict the observed response in a particular dataset. Variable importance scores that indicate for each ligand the importance for response prediction, are extracted. It can be assumed that ligands with higher variable importance scores are more likely to be a true active ligand.

Usage

get_multi_ligand_rf_importances(setting,ligand_target_matrix, ligands_position = "cols", ntrees = 1000, mtry = 2, continuous = TRUE, known = TRUE, filter_genes = FALSE)

Arguments

setting

A list containing the following elements: .$name: name of the setting; .$from: name(s) of the ligand(s) of which the predictve performance need to be assessed; .$response: the observed target response: indicate for a gene whether it was a target or not in the setting of interest. $ligand: NULL or the name of the ligand(s) that are known to be active in the setting of interest.

ligand_target_matrix

A matrix of ligand-target probabilty scores (recommended) or discrete target assignments (not-recommended).

ligands_position

Indicate whether the ligands in the ligand-target matrix are in the rows ("rows") or columns ("cols"). Default: "cols"

ntrees

Indicate the number of trees used in the random forest algorithm. The more trees, the longer model training takes, but the more robust the extraced importance scores will be. Default: 1000. Recommended for robustness to have till 10000 trees.

mtry

n**(1/mtry) features of the n features will be sampled at each split during the training of the random forest algorithm. Default: 2 (square root).

continuous

Indicate whether during training of the model, model training and evaluation should be done on class probabilities or discrete class labels. For huge class imbalance, we recommend setting this value to TRUE. Default: TRUE.

known

Indicate whether the true active ligand for a particular dataset is known or not. Default: TRUE. The true ligand will be extracted from the $ligand slot of the setting.

filter_genes

Indicate whether 50 per cent of the genes that are the least variable in ligand-target scores should be removed in order to reduce the training of the model. Default: FALSE.

Value

A data.frame with for each ligand - data set combination, feature importance scores indicating how important the query ligand is for the prediction of the response in the particular dataset, when prediction is done via a trained classification model with all possible ligands as input. In addition to the importance score(s), the name of the particular setting ($setting), the name of the query ligand($test_ligand), the name of the true active ligand (if known: $ligand).

Examples

## Not run: 
settings = lapply(expression_settings_validation[1:5],convert_expression_settings_evaluation)
settings_ligand_pred = convert_settings_ligand_prediction(settings, all_ligands = unlist(extract_ligands_from_settings(settings,combination = FALSE)), validation = TRUE, single = FALSE)

weighted_networks = construct_weighted_networks(lr_network, sig_network, gr_network, source_weights_df)
ligands = extract_ligands_from_settings(settings_ligand_pred,combination = FALSE)
ligand_target_matrix = construct_ligand_target_matrix(weighted_networks, ligands)
ligand_importances_rf = dplyr::bind_rows(lapply(settings_ligand_pred, get_multi_ligand_rf_importances,ligand_target_matrix, ntrees = 100, mtry = 2))
print(head(ligand_importances_rf))

## End(Not run)


saeyslab/nichenetr documentation built on March 26, 2024, 9:22 a.m.