mrIMLpredicts: Wrapper to generate multi-response predictive models.

View source: R/mrIMLpredicts.R

mrIMLpredicts    R Documentation

Description

Wrapper to generate multi-response predictive models.

Usage

mrIMLpredicts(
  X,
  X1 = NULL,
  Y,
  Model,
  balance_data = "no",
  mode = "regression",
  dummy = FALSE,
  prop = 0.5,
  morans = F,
  tune_grid_size = 10,
  k = 10,
  racing = T,
  seed = sample.int(1e+08, 1)
)

Arguments

X

A data frame of predictor (feature) data.

X1

A data frame containing an extra predictor set used in each model. For the MrIML joint species distribution model (JSDM), this is just a copy of the response data.

Y

A data frame of response variable data (species, OTUs, SNPs, etc.).

Model

A list. Can be any model from the tidymodels package. See Examples.

balance_data

A character: 'up', 'down' or 'no'. Controls how class imbalance is handled for each response; see Details.

mode

A character: 'classification' or 'regression', i.e., whether the fitted model is a classification or a regression model.

dummy

A logical: TRUE or FALSE.

morans

A logical: TRUE or FALSE. If TRUE, global Moran's I is calculated for each response.

tune_grid_size

A numeric. Sets the grid size for hyperparameter tuning. Larger grid sizes increase computation time. Ignored if racing = TRUE.

k

A numeric. Sets the number of folds in the k-fold cross-validation. The default is 10.

racing

A logical: TRUE or FALSE. If TRUE, MrIML performs the grid search using the 'racing' ANOVA method. See https://finetune.tidymodels.org/reference/tune_race_anova.html

seed

A numeric. Because these models have a stochastic component, a seed is set to make the analysis reproducible. Defaults to a random integer between 1 and 100 million.
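Because the default seed is drawn at random, two calls that rely on the default will generally not be reproducible with each other. The base R sketch below illustrates the general idea of drawing a seed once, storing it, and reusing it. The name my_seed is hypothetical, and the sketch only demonstrates R's random number generator; it assumes, but does not verify, that mrIMLpredicts uses the supplied seed in the same way.

```r
# Draw a seed the same way the default argument does, but keep it.
my_seed <- sample.int(1e8, 1)

# Using the stored seed twice gives identical random draws.
set.seed(my_seed)
first_draw <- runif(3)

set.seed(my_seed)
second_draw <- runif(3)

identical(first_draw, second_draw)
```

The stored value could then be supplied as seed = my_seed and recorded alongside the results.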

Details

This function produces the yhats that are used in all subsequent functions. It fits a separate classification or regression model for each response variable in a data set. Rows in X (features) share the same ID (host/site/population) as the rows in Y. Class imbalance can be a real issue for classification analyses. It can be addressed for each response variable using 'up' (upsampling using ROSE bootstrapping), 'down' (downsampling) or 'no' (no balancing of classes).
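The idea of fitting one model per response, with optional downsampling of the majority class, can be sketched in base R. This is a conceptual illustration only and not the package's implementation: it swaps in a plain logistic regression (glm) for a tidymodels model, uses simulated data, and the helper downsample() and all variable names are hypothetical.

```r
set.seed(42)

# Simulated data: 200 sites, 2 features, 2 binary responses (all hypothetical).
n <- 200
X <- data.frame(temp = rnorm(n), rain = rnorm(n))
Y <- data.frame(
  sp1 = rbinom(n, 1, plogis(-2 + 1.5 * X$temp)),  # relatively rare response
  sp2 = rbinom(n, 1, plogis(0.5 * X$rain))
)

# Downsample so that every class has as many rows as the smallest class.
downsample <- function(dat, response) {
  idx <- split(seq_len(nrow(dat)), dat[[response]])
  n_min <- min(lengths(idx))
  keep <- unlist(lapply(idx, function(i) i[sample.int(length(i), n_min)]))
  dat[keep, , drop = FALSE]
}

# Fit one logistic regression per response column.
fits <- lapply(names(Y), function(resp) {
  dat <- cbind(X, y = Y[[resp]])
  dat <- downsample(dat, "y")
  glm(y ~ temp + rain, data = dat, family = binomial())
})
names(fits) <- names(Y)

# Predicted probabilities for every site, one column per response.
yhat <- sapply(fits, predict, newdata = X, type = "response")
dim(yhat)  # 200 rows, 2 columns
```

Each response gets its own balanced training set, so a rare response does not affect how a common one is sampled.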

Examples

library(mrIML)
library(tidymodels)   # rand_forest(), set_engine(), set_mode(), %>%
library(doParallel)   # registerDoParallel()

# Register a parallel backend on all physical cores
all_cores <- parallel::detectCores(logical = FALSE)
cl <- parallel::makePSOCKcluster(all_cores)
registerDoParallel(cl)

# The model is not tuned, to increase computational speed
model1 <-
  rand_forest(trees = 100, mode = "classification") %>% # this should cope with multinomial data already
  set_engine("ranger", importance = "impurity") %>%
  set_mode("classification")

# enviro_variables and response_data are assumed to be existing data frames
yhats <- mrIMLpredicts(
  X = enviro_variables, Y = response_data, Model = model1,
  balance_data = "no", mode = "classification",
  tune_grid_size = 5, k = 10, seed = sample.int(1e8, 1)
)

nfj1380/mrIML documentation built on May 17, 2024, 7:41 a.m.