View source: R/rf_domain_score.R
rf_domain_score | R Documentation |
This function fits a Random Forest model to the provided data and computes a domain applicability score based on PCA distances.
rf_domain_score(
featured_col,
train_data,
rf_hyperparameters,
test_data,
threshold_value
)
featured_col |
A character string specifying the name of the response variable to predict. |
train_data |
A data frame containing predictor variables and the response variable for training the model. |
rf_hyperparameters |
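A list of hyperparameters for the Random Forest model, including:
mtry: the number of predictors randomly sampled at each split.
min_n: the minimum number of observations in a node required for a further split.
trees: the number of trees in the ensemble.
|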
test_data |
A data frame for making predictions. |
threshold_value |
A numeric threshold value used for computing domain applicability scores. |
Random Forest builds a large number of decision trees, each grown independently of the others. The final prediction combines the predictions from all individual trees (for regression, their average). This function uses the ranger engine for fitting regression models.
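The hyperparameter names mtry, min_n, and trees match the arguments of parsnip::rand_forest(), so one plausible way to build such a model is a parsnip specification with the ranger engine. The sketch below rests on that assumption; this page does not confirm how rf_domain_score() fits the model internally. The mtcars data and the formula are stand-ins chosen only for illustration, and the parsnip and ranger packages must be installed.

```r
library(parsnip)

# Hypothetical hyperparameter list, mirroring the structure used on this page
rf_hyperparameters <- list(mtry = 1, min_n = 5, trees = 500)

# Assumed mapping of the list onto a parsnip random forest specification
rf_spec <- rand_forest(
  mtry = rf_hyperparameters$mtry,
  min_n = rf_hyperparameters$min_n,
  trees = rf_hyperparameters$trees
)
rf_spec <- set_engine(rf_spec, "ranger")
rf_spec <- set_mode(rf_spec, "regression")

# Fit on a built-in dataset purely for illustration
set.seed(123)
rf_fit <- fit(rf_spec, mpg ~ wt + hp, data = mtcars)

# Each prediction is the average over the individual trees
preds <- predict(rf_fit, new_data = mtcars[1:3, ])
```

Here preds is a tibble with one row per new observation and a single .pred column.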
A data frame containing the computed domain applicability scores for each observation in the test dataset.
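To make the idea of a PCA-distance applicability score concrete, here is a minimal base R sketch. It is an illustration under stated assumptions, not the package's implementation: pca_distance_score() is a hypothetical helper, the threshold is assumed to be the proportion of variance the retained components must explain, and the score is assumed to be the percentile of each test observation's distance relative to the training distances.

```r
# Hypothetical helper, not part of the package
pca_distance_score <- function(train, test, threshold = 0.99) {
  # PCA on centered and scaled training predictors
  pca <- prcomp(train, center = TRUE, scale. = TRUE)

  # Keep the fewest components whose cumulative variance reaches the threshold
  var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  k <- min(which(var_explained >= threshold), length(var_explained))

  # Project both datasets onto the retained components
  train_pcs <- pca$x[, seq_len(k), drop = FALSE]
  test_pcs <- predict(pca, newdata = test)[, seq_len(k), drop = FALSE]

  # Distance from the training centroid, which is the origin after centering
  train_dist <- sqrt(rowSums(train_pcs^2))
  test_dist <- sqrt(rowSums(test_pcs^2))

  # Percentile of each test distance within the training distances
  data.frame(
    distance = test_dist,
    distance_pctl = 100 * ecdf(train_dist)(test_dist)
  )
}

# Synthetic data: the second test row lies far outside the training cloud
set.seed(123)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
test <- data.frame(x1 = c(0, 10), x2 = c(0, 10))
scores <- pca_distance_score(train, test, threshold = 0.99)
```

A high percentile means the observation sits farther from the training data than most training points do, so predictions for it are extrapolations and deserve less trust.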
set.seed(123)
library(dplyr)
featured_col <- "cd_2022"
train_data <- viral %>%
dplyr::select(cd_2022, vl_2022)
test_data <- sero
rf_hyperparameters <- list(mtry = 2, min_n = 5, trees = 500)
threshold_value <- 0.99
rf_domain_score(featured_col, train_data, rf_hyperparameters, test_data, threshold_value)