View source: R/modelTrainingTuningFittingTesting.R
tune_and_train_rf_model | R Documentation
This function uses scikit-learn's Python-based GridSearchCV to perform hyperparameter tuning and training of a RandomForestClassifier. It accepts a customizable parameter grid and applies one-hot encoding and scaling as preprocessing steps. The function selects the best hyperparameters according to the chosen scoring method. See the scikit-learn GridSearchCV documentation for the full description of options; the defaults provided here are intended to be comprehensive.
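For readers unfamiliar with the underlying scikit-learn machinery, the following is a minimal Python sketch of the kind of pipeline this function wraps: a preprocessing step followed by a RandomForestClassifier, tuned with GridSearchCV. The synthetic data, grid values, and step names here are illustrative only, not the package's defaults (the package also one-hot encodes categorical columns, which the all-numeric toy data below does not need).

```python
# Illustrative sketch of GridSearchCV tuning a RandomForestClassifier
# inside a preprocessing pipeline (hypothetical data and grid values).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy binary-classification data standing in for a feature matrix X and target y
X, y = make_classification(n_samples=120, n_features=8, random_state=4)

pipe = Pipeline([
    ("scale", StandardScaler()),                     # scaling preprocessing step
    ("rf", RandomForestClassifier(random_state=4)),  # the classifier being tuned
])

grid = GridSearchCV(
    pipe,
    # Pipeline parameters are addressed as "<step>__<param>"
    param_grid={"rf__max_depth": [5, 10], "rf__n_estimators": [10, 20]},
    cv=StratifiedKFold(n_splits=5),  # stratified folds, as this function uses
    scoring="roc_auc",               # matches the default scoring_method
)
grid.fit(X, y)
print(grid.best_params_)  # the best hyperparameter combination found
```

The R wrapper exposes the same moving parts (cv_folds, scoring_method, param_grid) and returns the fitted GridSearchCV object, so attributes such as best_score_ remain accessible from R.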
tune_and_train_rf_model(
X,
y,
cv_folds = 5,
scoring_method = "roc_auc",
seed = 4,
param_grid = NULL,
n_jobs = 1,
n_cores = -2
)
X: The features for the model (data frame or matrix). Usually obtained from the create_feature_matrix function.

y: The target variable for the model (vector). Usually obtained from the create_feature_matrix function.

cv_folds: The number of splits in StratifiedKFold cross-validation (default: 5).

scoring_method: The scoring method to use (default: "roc_auc"). Options include 'accuracy', 'precision', 'recall', 'roc_auc', and 'f1'; see the scikit-learn GridSearchCV documentation for more info.

seed: The random seed for reproducibility (default: 4).

param_grid: An optional list of parameters for tuning the model. If NULL, a default set of parameters is used. The list should follow the format expected by GridSearchCV, with integer-valued parameters suffixed with 'L' (e.g., 10L) to ensure compatibility when being passed from R to Python. The default param_grid is:

    param_grid <- list(
      bootstrap = list(TRUE),
      class_weight = list(NULL),
      max_depth = list(5L, 10L, 15L, 20L, NULL),
      n_estimators = as.integer(seq(10, 100, 10)),
      max_features = list("sqrt", "log2", 0.1, 0.2),
      criterion = list("gini"),
      warm_start = list(FALSE),
      min_samples_leaf = list(1L, 2L, 5L, 10L, 20L, 50L),
      min_samples_split = list(2L, 10L, 20L, 50L, 100L, 200L)
    )

n_jobs: An optional number of jobs to specify for parallel processing (default: 1).

n_cores: An optional number of cores to specify for parallel processing. Default is -2; following the joblib convention, negative values count back from the total number of cores, so -2 uses all but one available core.
A list containing the best hyperparameters for the model, cross-validation scores on training set, and the fitted GridSearchCV object.
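As a sketch of how a custom param_grid might be constructed (the parameter names mirror the defaults above; the values are illustrative only), note the 'L' suffix on integer-valued entries:

```r
# Hypothetical custom grid: integers carry the 'L' suffix so that
# reticulate passes them to Python as integers rather than doubles.
custom_grid <- list(
  n_estimators     = list(50L, 100L),   # integer-valued: note the 'L'
  max_depth        = list(10L, NULL),   # NULL lets trees grow unrestricted
  max_features     = list("sqrt", 0.2), # strings and doubles need no suffix
  min_samples_leaf = list(1L, 5L)
)
```

Any parameter omitted from a custom grid is not tuned; GridSearchCV falls back to the RandomForestClassifier default for it.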
library(Rf2pval)
library(reticulate)

# Load the conda environment, which ensures the correct version of Python
# and the necessary Python packages can be loaded. See the vignette for details.
use_condaenv("rf2pval-conda-arm64mac", required = TRUE)

# Load the demo data
data(demo_rnaseq_data)

# Prepare the sample data into a format ingestible by the ML algorithm
processed_training_data <- create_feature_matrix(demo_rnaseq_data$training_data, "training")

# Model training (Warning: may take a long time if the dataset is large
# and param_grid has many options)
tuning_results <- tune_and_train_rf_model(
  processed_training_data$X_training_mat,
  processed_training_data$y_training_vector,
  cv_folds = 5,
  seed = 123,
  param_grid = list(max_depth = list(10L, 20L))
)
print(tuning_results$best_params)
print(tuning_results$grid_search$best_score_)