mrIMLpredicts: Generates a multi-response predictive model
In nfj1380/mrIML: Multi-Response (Multivariate) Interpretable Machine Learning

Description

This function fits separate classification/regression models, specified in the tidymodels framework, for each response variable in a data set. This is the core function of mrIML.

Usage

mrIMLpredicts(
  X,
  X1 = NULL,
  Y,
  Model,
  balance_data = "no",
  dummy = FALSE,
  prop = 0.7,
  tune_grid_size = 10,
  k = 10,
  racing = TRUE
)

Arguments

`Y`, `X`, `X1`	Data frames containing the response variables, the predictors, and the joint response variables (i.e. the responses that are also used as predictors when fitting a graphical network (GN) model), respectively. If `X1` is not provided, a standard multi-response model is fit (i.e. the response models are independent of one another, conditional on the predictors supplied in `X`). See the Details section below.
`Model`	Any model specification from the tidymodels framework. See Examples.
`balance_data`	A character string. "up" up-samples the data to equal class sizes; "down" down-samples the data to equal class sizes; "no" (the default) leaves the data as is.
`dummy`	A logical value indicating if `recipes::step_dummy()` should be included in the data recipe.
`prop`	A numeric value between 0 and 1. Defines the training-testing data proportion to be used, which defaults to `prop = 0.7`.
`tune_grid_size`	A numeric value that sets the grid size for hyperparameter tuning. Larger grid sizes increase computational time. Ignored if `racing = TRUE`.
`k`	A numeric value. Sets the number of folds in the cross-validation. 10-fold CV is the default.
`racing`	A logical value. If `TRUE`, `mrIML` performs the grid search using the `finetune::tune_race_anova()` method; otherwise, `tune::tune_grid()` is used. `racing = TRUE` is now the default method of tuning.
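As a rough illustration of what `prop` and `k` control, the base R sketch below splits 100 row indices into training and testing sets and then assigns the training rows to folds. The toy row count and the assumption that cross-validation folds are drawn from the training portion only are illustrative; this is not the package's internal code, which presumably relies on tidymodels resampling tools.

```r
# Toy setting (hypothetical): 100 rows, default prop and k
set.seed(123)
n <- 100
prop <- 0.7
k <- 10

# Hold out a fraction (1 - prop) of the rows for testing
train_idx <- sample(seq_len(n), size = round(prop * n))
test_idx <- setdiff(seq_len(n), train_idx)

# Assumption: folds are formed from the training rows only.
# Each training row is assigned to one of k folds of near-equal size.
fold_id <- sample(rep(seq_len(k), length.out = length(train_idx)))

length(train_idx)
length(test_idx)
table(fold_id)
```

Smaller values of `k` (the Examples use `k = 2`) leave more rows in each fold but give fewer resamples for comparing hyperparameter settings.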

Details

mrIMLpredicts fits the supplied tidy model to each response variable in the data frame Y. If only X (a data frame of predictors) is supplied, then independent models are fit, i.e., the other response variables are not used as predictors. If X1 (a data frame of all or selected response variables) is supplied, then those response variables are also used as predictors in the response models. Supplying X1 therefore means that a co-occurrence model is fit.
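The difference between the two modes can be sketched in base R by building the predictor table for each response. The helper `build_predictors()` and the toy columns are hypothetical, and the sketch assumes that the focal response is dropped from `X1` so that a response never predicts itself; the package's own implementation may differ.

```r
# Toy inputs (hypothetical): two predictors and three binary responses
X <- data.frame(elev = c(0.1, 0.5, 0.9, 0.3), temp = c(12, 15, 9, 11))
Y <- data.frame(
  sp_a = c(0, 1, 1, 0),
  sp_b = c(1, 1, 0, 0),
  sp_c = c(0, 0, 1, 1)
)
X1 <- Y # all responses may also act as predictors

# Hypothetical helper: predictor table for a single response.
# Assumption: the focal response is removed from X1 before joining.
build_predictors <- function(response, X, X1 = NULL) {
  if (is.null(X1)) {
    return(X) # independent model: predictors only
  }
  others <- X1[, setdiff(names(X1), response), drop = FALSE]
  cbind(X, others) # joint model: predictors plus the other responses
}

independent <- lapply(names(Y), build_predictors, X = X)
joint <- lapply(names(Y), build_predictors, X = X, X1 = X1)
names(independent) <- names(Y)
names(joint) <- names(Y)

names(independent$sp_a)
names(joint$sp_a)
```

In this sketch every independent model sees the same columns, whereas each joint model sees a different set because its own response is excluded.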

If balance_data = "up", then themis::step_rose() is used to upsample the dataset. In most cases, however, we recommend leaving balance_data = "no".
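To make "equal class sizes" concrete, the base R sketch below randomly down-samples an imbalanced toy data set so that both classes have as many rows as the rarer class. It is a plain random down-sampling illustration with made-up data; it is not the ROSE procedure used for up-sampling, and it is not the package's own balancing code.

```r
# Toy data (hypothetical): 20 presences and 80 absences
set.seed(1)
df <- data.frame(
  class = factor(c(rep("present", 20), rep("absent", 80))),
  x = rnorm(100)
)

# Size of the rarest class sets the target size for every class
n_min <- min(table(df$class))

# Sample n_min row indices within each class, without replacement
rows_by_class <- split(seq_len(nrow(df)), df$class)
keep <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), n_min)]
}))

balanced <- df[keep, ]
table(balanced$class)
```

Down-sampling discards rows from the common class, which is one reason to compare results against the unbalanced fit before adopting it.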

Value

A list object with three slots:

$Model: The tidymodels object that was fit.
$Data: A list of the raw data.
$Fits: A list of the fitted models for each response variable.

Examples

library(mrIML)
library(tidymodels)

data <- MRFcov::Bird.parasites

# Define the response variables of interest
Y <- data %>%
  select(-scale.prop.zos) %>%
  select(order(everything()))

# Define the predictors
X <- data %>%
  select(scale.prop.zos)

# Specify a random forest tidy model
model_rf <- rand_forest(
  trees = 50, # 50 trees are set for brevity. Aim to start with 1000
  mode = "classification",
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("randomForest")

# Fitting independent multi-response model -----------------------------------
MR_model_rf <- mrIMLpredicts(
  X = X,
  Y = Y,
  Model = model_rf,
  prop = 0.7,
  k = 2,
  racing = FALSE
)

# Fitting a graphical network model -----------------------------------------
# Define the dependent response variables (all in this case)
X1 <- Y

GN_model <- mrIMLpredicts(
  X = X,
  Y = Y,
  X1 = X1,
  Model = model_rf,
  prop = 0.7,
  k = 2,
  racing = FALSE
)

nfj1380/mrIML documentation built on June 2, 2025, 1:03 a.m.