auto_simon_ml (R Documentation)
This function automates building machine learning models with the caret package. It supports both binary and multi-class classification and lets users specify a list of machine learning algorithms to train on the dataset. The function splits the dataset into training and testing sets, applies preprocessing steps, and trains models using cross-validation. It then computes performance metrics such as the confusion matrix, AUROC (macro-averaged for multi-class problems), and prAUC (binary classification only).
auto_simon_ml(dataset_ml, settings)
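dataset_ml: A data frame containing the dataset for training. All columns except the outcome column should contain the features.
settings: A list of parameters controlling preprocessing, data partitioning, and model training. The example below sets excludedColumns, preProcessDataset, selectedPartitionSplit, selectedPackages, and trainingTimeout.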
The function preprocesses the dataset (e.g., centering, scaling, and imputation of missing values) according to the steps listed in settings$preProcessDataset. It splits the data into training and testing sets using the partition given by settings$selectedPartitionSplit, trains each model with cross-validation, and computes performance metrics on the test set.
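A stratified train/test split combined with preprocessing fitted on the training rows only (so test-set statistics do not leak into training) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation, which relies on caret; the helper names `stratified_split`, `fit_preprocess`, and `apply_preprocess`, the 0.75 proportion, and the toy data are all hypothetical.

```r
# Hypothetical base-R sketch; auto_simon_ml itself uses caret for these steps.

# Stratified split: sample a fraction p of the rows within each outcome class.
stratified_split <- function(y, p = 0.75, seed = 1337) {
  set.seed(seed)
  groups <- split(seq_along(y), as.character(y))
  idx <- lapply(groups, function(i) i[sample.int(length(i), round(p * length(i)))])
  sort(unlist(idx, use.names = FALSE))
}

# Replace missing values in each column with the supplied median.
impute_median <- function(x, med) {
  for (col in names(med)) x[[col]][is.na(x[[col]])] <- med[[col]]
  x
}

# Learn medians, means, and standard deviations from the training features only.
fit_preprocess <- function(train_x) {
  med <- vapply(train_x, median, numeric(1), na.rm = TRUE)
  imputed <- impute_median(train_x, med)
  list(median = med,
       mean = vapply(imputed, mean, numeric(1)),
       sd = vapply(imputed, sd, numeric(1)))
}

# Apply the stored training statistics: median-impute, then center and scale.
# Zero-variance columns would divide by zero here; caret's "zv" step removes them.
apply_preprocess <- function(x, pp) {
  x <- impute_median(x, pp$median)
  as.data.frame(scale(x, center = pp$mean, scale = pp$sd))
}

# Toy data: outcome in the first column, two numeric features, one missing value.
df <- data.frame(
  outcome = factor(rep(c("A", "B"), each = 10)),
  x1 = c(1:9, NA, 11:20),
  x2 = seq(0.5, 10, by = 0.5)
)
train_idx <- stratified_split(df$outcome, p = 0.75)
pp <- fit_preprocess(df[train_idx, -1])
train_x <- apply_preprocess(df[train_idx, -1], pp)
test_x <- apply_preprocess(df[-train_idx, -1], pp)
```

Fitting the statistics once and reusing them for the test rows is the design point: the test set is transformed exactly as the training set was, without influencing it.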
For binary classification problems, the function calculates AUROC and prAUC. For multi-class problems, it calculates a macro-averaged AUROC; prAUC is not computed.
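The metrics named above can be illustrated with a small base-R sketch: binary AUROC from the Mann-Whitney rank statistic, a macro-averaged one-vs-rest AUROC for multi-class problems, and average precision as a step-wise estimate of prAUC. These are common textbook definitions and not necessarily the exact estimators auto_simon_ml uses (multi-class AUROC, for instance, is sometimes defined via Hand and Till's method instead); the function names and toy data are hypothetical.

```r
# Hypothetical helpers; the package's own metric code may differ.

# Binary AUROC = probability that a random positive outranks a random negative.
# Average ranks give tied scores half credit.
auroc_binary <- function(scores, labels, positive) {
  pos <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Macro-averaged AUROC: one-vs-rest AUROC per class, then the unweighted mean.
# `prob` is a matrix of class probabilities whose column names are the classes.
auroc_macro <- function(prob, labels) {
  per_class <- vapply(colnames(prob),
                      function(k) auroc_binary(prob[, k], labels, k),
                      numeric(1))
  mean(per_class)
}

# Average precision: mean of precision@k taken at the rank k of each positive.
# Tied scores are ordered arbitrarily by order(), which is a simplification.
average_precision <- function(scores, labels, positive) {
  hit <- labels[order(scores, decreasing = TRUE)] == positive
  precision_at_k <- cumsum(hit) / seq_along(hit)
  sum(precision_at_k[hit]) / sum(hit)
}

# Binary toy example
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)
labels <- c("pos", "pos", "neg", "pos", "neg", "neg")
auroc_binary(scores, labels, "pos")      # 8/9, about 0.889
average_precision(scores, labels, "pos") # (1 + 1 + 0.75) / 3, about 0.917

# Three-class toy example with perfectly separated probabilities
labels3 <- c("a", "b", "c", "a", "b", "c")
prob3 <- rbind(c(0.8, 0.1, 0.1), c(0.1, 0.8, 0.1), c(0.1, 0.1, 0.8),
               c(0.7, 0.2, 0.1), c(0.2, 0.7, 0.1), c(0.1, 0.2, 0.7))
colnames(prob3) <- c("a", "b", "c")
auroc_macro(prob3, labels3)              # 1
```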
The function returns a list of trained models together with their variable importance and performance metrics, including the confusion matrix and post-resample statistics.
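To show what the confusion matrix and the Accuracy and Kappa values that caret::postResample reports for class predictions actually measure, they can be computed by hand. The sketch below is a base-R illustration; `confusion_stats` and the toy labels are hypothetical and not part of the package.

```r
# Hypothetical helper: confusion matrix plus accuracy and Cohen's kappa.
confusion_stats <- function(pred, obs) {
  lev <- union(levels(factor(obs)), levels(factor(pred)))
  cm <- table(Prediction = factor(pred, levels = lev),
              Reference = factor(obs, levels = lev))
  n <- sum(cm)
  p_obs <- sum(diag(cm)) / n                     # observed agreement (accuracy)
  p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  list(table = cm,
       Accuracy = p_obs,
       Kappa = (p_obs - p_exp) / (1 - p_exp))
}

obs  <- c("A", "A", "A", "B", "B", "B", "C", "C")
pred <- c("A", "A", "B", "B", "B", "C", "C", "C")
stats <- confusion_stats(pred, obs)
stats$table
stats$Accuracy  # 6 of 8 correct = 0.75
stats$Kappa     # (0.75 - 21/64) / (1 - 21/64) = 27/43, about 0.628
```

Kappa discounts the agreement a classifier would reach by chance given the class frequencies, which is why it is lower than the raw accuracy here.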
A list where each element corresponds to a trained model for one of the algorithms specified in settings$selectedPackages. Each element contains:
info: General information about the model, including resampling indices, problem type, and outcome mapping.
training: The trained model object and variable importance.
predictions: Predictions on the test set, including class probabilities, the confusion matrix, post-resample statistics, AUROC (macro-averaged for multi-class problems), and prAUC (binary classification only).
## Not run:
dataset <- read.csv("fc_wo_noise.csv", header = TRUE, row.names = 1)
# Generate a file header for the dataset to use in downstream analysis
file_header <- generate_file_header(dataset)
settings <- list(
  fileHeader = file_header,
  # Columns selected for analysis
  selectedColumns = c("ExampleColumn1", "ExampleColumn2"),
  clusterType = "Louvain",
  removeNA = TRUE,
  preProcessDataset = c("scale", "center", "medianImpute", "corr", "zv", "nzv"),
  target_clusters_range = c(3, 4),
  resolution_increments = c(0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5),
  min_modularities = c(0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9),
  pickBestClusterMethod = "Modularity",
  seed = 1337
)
result <- immunaut(dataset, settings)
dataset_ml <- result$dataset$original
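# `tsne_clust` and `i` are not defined in this snippet: assign the cluster labels
# produced by your clustering run (one label per row of dataset_ml)
dataset_ml$pandora_cluster <- tsne_clust[[i]]$info.norm$pandora_cluster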
dataset_ml <- dplyr::rename(dataset_ml, immunaut = pandora_cluster)
dataset_ml <- dataset_ml[, c("immunaut", setdiff(names(dataset_ml), "immunaut"))]
settings_ml <- list(
  excludedColumns = c("ExampleColumn0"),
  preProcessDataset = c("scale", "center", "medianImpute", "corr", "zv", "nzv"),
  # Use the current partition split (`split` must be defined beforehand)
  selectedPartitionSplit = split,
  # caret method names are case-sensitive, e.g. "C5.0" and "gaussprPoly"
  selectedPackages = c("rf", "RRF", "RRFglobal", "rpart2", "C5.0", "sparseLDA",
                       "gcvEarth", "cforest", "gaussprPoly", "monmlp", "slda", "spls"),
  trainingTimeout = 180 # Timeout in seconds (3 minutes)
)
ml_results <- auto_simon_ml(dataset_ml, settings_ml)
## End(Not run)