knitr::opts_chunk$set( warning = FALSE, collapse = TRUE, comment = "#>", fig.width = 7.5, fig.height = 5 )
This page was entirely written by our AMR for R Assistant, a ChatGPT manually-trained model able to answer any question about the AMR package.
Antimicrobial resistance (AMR) is a global health crisis, and understanding resistance patterns is crucial for managing effective treatments. The AMR
R package provides robust tools for analysing AMR data, including convenient antibiotic selector functions like aminoglycosides()
and betalactams()
. In this post, we will explore how to use the tidymodels
framework to predict resistance patterns in the example_isolates
dataset.
By leveraging the power of tidymodels
and the AMR
package, we’ll build a reproducible machine learning workflow to predict the Gramstain of the microorganism to two important antibiotic classes: aminoglycosides and beta-lactams.
Our goal is to build a predictive model using the tidymodels
framework to determine the Gramstain of the microorganism based on microbial data. We will:
aminoglycosides()
and betalactams()
.tidymodels
workflow to preprocess, train, and evaluate the model.We begin by loading the required libraries and preparing the example_isolates
dataset from the AMR
package.
# Load required libraries library(AMR) # For AMR data analysis library(tidymodels) # For machine learning workflows, and data manipulation (dplyr, tidyr, ...) # Select relevant columns for prediction data <- example_isolates %>% # select AB results dynamically select(mo, aminoglycosides(), betalactams()) %>% # replace NAs with NI (not-interpretable) mutate(across(where(is.sir), ~replace_na(.x, "NI")), # make factors of SIR columns across(where(is.sir), as.integer), # get Gramstain of microorganisms mo = as.factor(mo_gramstain(mo))) %>% # drop NAs - the ones without a Gramstain (fungi, etc.) drop_na()
Explanation:
aminoglycosides()
and betalactams()
dynamically select columns for antibiotics in these classes.drop_na()
ensures the model receives complete cases for training.We now define the tidymodels
workflow, which consists of three steps: preprocessing, model specification, and fitting.
We create a recipe to preprocess the data for modelling.
# Define the recipe for data preprocessing resistance_recipe <- recipe(mo ~ ., data = data) %>% step_corr(c(aminoglycosides(), betalactams()), threshold = 0.9) resistance_recipe
For a recipe that includes at least one preprocessing operation, like we have with step_corr()
, the necessary parameters can be estimated from a training set using prep()
:
prep(resistance_recipe)
Explanation:
recipe(mo ~ ., data = data)
will take the mo
column as outcome and all other columns as predictors.step_corr()
removes predictors (i.e., antibiotic columns) that have a higher correlation than 90%.Notice how the recipe contains just the antibiotic selector functions - no need to define the columns specifically. In the preparation (retrieved with prep()
) we can see that the columns or variables r paste0("'", suppressMessages(prep(resistance_recipe))$steps[[1]]$removals, "'", collapse = " and ")
were removed as they correlate too much with existing, other variables.
We define a logistic regression model since resistance prediction is a binary classification task.
# Specify a logistic regression model logistic_model <- logistic_reg() %>% set_engine("glm") # Use the Generalized Linear Model engine logistic_model
Explanation:
logistic_reg()
sets up a logistic regression model.set_engine("glm")
specifies the use of R's built-in GLM engine.We bundle the recipe and model together into a workflow
, which organizes the entire modeling process.
# Combine the recipe and model into a workflow resistance_workflow <- workflow() %>% add_recipe(resistance_recipe) %>% # Add the preprocessing recipe add_model(logistic_model) # Add the logistic regression model resistance_workflow
To train the model, we split the data into training and testing sets. Then, we fit the workflow on the training set and evaluate its performance.
# Split data into training and testing sets set.seed(123) # For reproducibility data_split <- initial_split(data, prop = 0.8) # 80% training, 20% testing training_data <- training(data_split) # Training set testing_data <- testing(data_split) # Testing set # Fit the workflow to the training data fitted_workflow <- resistance_workflow %>% fit(training_data) # Train the model
Explanation:
initial_split()
splits the data into training and testing sets.fit()
trains the workflow on the training set.Notice how in fit()
, the antibiotic selector functions are internally called again. For training, these functions are called since they are stored in the recipe.
Next, we evaluate the model on the testing data.
# Make predictions on the testing set predictions <- fitted_workflow %>% predict(testing_data) # Generate predictions probabilities <- fitted_workflow %>% predict(testing_data, type = "prob") # Generate probabilities predictions <- predictions %>% bind_cols(probabilities) %>% bind_cols(testing_data) # Combine with true labels predictions # Evaluate model performance metrics <- predictions %>% metrics(truth = mo, estimate = .pred_class) # Calculate performance metrics metrics
Explanation:
predict()
generates predictions on the testing set.metrics()
computes evaluation metrics like accuracy and kappa.It appears we can predict the Gram based on AMR results with a r round(metrics$.estimate[1], 3) * 100
% accuracy based on AMR results of aminoglycosides and beta-lactam antibiotics. The ROC curve looks like this:
predictions %>% roc_curve(mo, `.pred_Gram-negative`) %>% autoplot()
In this post, we demonstrated how to build a machine learning pipeline with the tidymodels
framework and the AMR
package. By combining selector functions like aminoglycosides()
and betalactams()
with tidymodels
, we efficiently prepared data, trained a model, and evaluated its performance.
This workflow is extensible to other antibiotic classes and resistance patterns, empowering users to analyse AMR data systematically and reproducibly.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.