View source: R/evaluate_model_ligand_prediction.R
get_multi_ligand_rf_importances | R Documentation |
get_multi_ligand_rf_importances
A random forest is trained to construct one model based on the target gene predictions of all ligands of interest (ligands are considered as features) in order to predict the observed response in a particular dataset. Variable importance scores that indicate for each ligand the importance for response prediction, are extracted. It can be assumed that ligands with higher variable importance scores are more likely to be a true active ligand.
get_multi_ligand_rf_importances(setting,ligand_target_matrix, ligands_position = "cols", ntrees = 1000, mtry = 2, continuous = TRUE, known = TRUE, filter_genes = FALSE)
setting |
A list containing the following elements: .$name: name of the setting; .$from: name(s) of the ligand(s) of which the predictve performance need to be assessed; .$response: the observed target response: indicate for a gene whether it was a target or not in the setting of interest. $ligand: NULL or the name of the ligand(s) that are known to be active in the setting of interest. |
ligand_target_matrix |
A matrix of ligand-target probabilty scores (recommended) or discrete target assignments (not-recommended). |
ligands_position |
Indicate whether the ligands in the ligand-target matrix are in the rows ("rows") or columns ("cols"). Default: "cols" |
ntrees |
Indicate the number of trees used in the random forest algorithm. The more trees, the longer model training takes, but the more robust the extraced importance scores will be. Default: 1000. Recommended for robustness to have till 10000 trees. |
mtry |
n**(1/mtry) features of the n features will be sampled at each split during the training of the random forest algorithm. Default: 2 (square root). |
continuous |
Indicate whether during training of the model, model training and evaluation should be done on class probabilities or discrete class labels. For huge class imbalance, we recommend setting this value to TRUE. Default: TRUE. |
known |
Indicate whether the true active ligand for a particular dataset is known or not. Default: TRUE. The true ligand will be extracted from the $ligand slot of the setting. |
filter_genes |
Indicate whether 50 per cent of the genes that are the least variable in ligand-target scores should be removed in order to reduce the training of the model. Default: FALSE. |
A data.frame with for each ligand - data set combination, feature importance scores indicating how important the query ligand is for the prediction of the response in the particular dataset, when prediction is done via a trained classification model with all possible ligands as input. In addition to the importance score(s), the name of the particular setting ($setting), the name of the query ligand($test_ligand), the name of the true active ligand (if known: $ligand).
## Not run:
settings = lapply(expression_settings_validation[1:5],convert_expression_settings_evaluation)
settings_ligand_pred = convert_settings_ligand_prediction(settings, all_ligands = unlist(extract_ligands_from_settings(settings,combination = FALSE)), validation = TRUE, single = FALSE)
weighted_networks = construct_weighted_networks(lr_network, sig_network, gr_network, source_weights_df)
ligands = extract_ligands_from_settings(settings_ligand_pred,combination = FALSE)
ligand_target_matrix = construct_ligand_target_matrix(weighted_networks, ligands)
ligand_importances_rf = dplyr::bind_rows(lapply(settings_ligand_pred, get_multi_ligand_rf_importances,ligand_target_matrix, ntrees = 100, mtry = 2))
print(head(ligand_importances_rf))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.