knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Keras adjustments for regression

If you want to perform regression with keras, there is not much to change compared to the classification workflow.

Changes in data preparation

First of all, your response variable needs to be numeric. This means you don't need to categorize it: you can either skip the categorize_response_variable() function entirely or call it with ML_method = "regression". That's practically it.
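
If you skip the helper, a minimal sanity check before building the keras input could look like this; sample_metadata and response are hypothetical names for your metadata data frame and response column:

# the response stays numeric for regression, no categorization needed
stopifnot(is.numeric(sample_metadata$response))
summary(sample_metadata$response)  # quick look at the value range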

Changes in the analysis

Your call to keras including the data frame setup could look like this:

# "ready_keras_regression" is the prepared list object in keras formatting
parameter_keras_regression <- extract_parameters(ready_keras_regression)
hyper_keras_regression_training <- expand.grid(
  ML_object = names(ready_keras_regression),
  Epochs = 5, 
  Batch_size = 2, 
  k_fold = 4, 
  current_k_fold = 1:4,
  Early_callback = "mae",  # Mean absolute error as callback metric
  Layer1_units = 20,
  Layer2_units = 8,
  Dropout_layer1 = 0.2,
  Dropout_layer2 = 0.0,
  Dense_activation_function = "relu",
  Optimizer_function = "rmsprop",
  Loss_function = "mse",  # Mean squared error as loss function
  Metric = "mae", # Mean absolute error as training metric
  Cycle = 1:3,
  step = "training",
  Delay = 2)

You may have noticed that we don't specify an activation function for the output layer here. We also don't specify the number of output nodes: in classification it is usually determined by the number of classes, while for regression it is set to 1.
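
For orientation, this is roughly the network the grid above describes, written as a plain keras sketch (not the internal phyloseq2ML code; n_features is a placeholder for the number of input variables):

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 20, activation = "relu", input_shape = n_features) %>%  # Layer1_units
  layer_dropout(rate = 0.2) %>%                                               # Dropout_layer1
  layer_dense(units = 8, activation = "relu") %>%                             # Layer2_units
  layer_dense(units = 1)                                                      # one linear output node

model %>% compile(optimizer = "rmsprop", loss = "mse", metrics = "mae")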

You are already familiar with this part:

master_keras_regression_training <- merge(parameter_keras_regression, hyper_keras_regression_training, by = "ML_object")
master_keras_regression_training <- master_keras_regression_training[order(
  master_keras_regression_training$ML_object, 
  master_keras_regression_training$Cycle, 
  master_keras_regression_training$current_k_fold), ]
rownames(master_keras_regression_training) <- NULL

And this would be the function call using keras_regression and the subsequent generation of the results data frame:

master_keras_regression_training$results <- purrr::pmap(cbind(master_keras_regression_training, .row = rownames(master_keras_regression_training)), 
  keras_regression, the_list = ready_keras_regression, master_grid = master_keras_regression_training)
keras_df_regression_training <- as.data.frame(tidyr::unnest(master_keras_regression_training, results))

You will receive different output metrics than in classification, of course. All column names in the results data frame should be clearly identifiable, with metrics such as MSE, RMSE, MAE and R squared.

For prediction you can use the same arguments as you used for training, except that k_fold and current_k_fold change to 1 and step = "prediction".
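
As a sketch, the prediction grid could be derived from the training grid like this (the object name hyper_keras_regression_prediction is made up for this example):

hyper_keras_regression_prediction <- hyper_keras_regression_training
hyper_keras_regression_prediction$k_fold <- 1
hyper_keras_regression_prediction$current_k_fold <- 1
hyper_keras_regression_prediction$step <- "prediction"
# collapsing current_k_fold to 1 creates duplicate rows, drop them
hyper_keras_regression_prediction <- unique(hyper_keras_regression_prediction)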

Keras adjustments for multiclass classification

If you want to perform multiclass classification with keras, there is not much to change compared to the binary classification workflow.

Changes in data preparation

You need to specify more than two intervals and corresponding classes in the categorize_response_variable() function. You should also have a look at how many samples per class you have, as this quickly becomes a problem: if you only have 20 samples left for one class in the prediction step, each sample accounts for 5 % of the accuracy.
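
A quick way to check the class balance is a simple contingency table; response_classes is a hypothetical factor holding the categorized response variable:

table(response_classes)
round(prop.table(table(response_classes)), 2)  # class shares as proportions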

Changes in the analysis

Your call to keras including the data frame setup could look like this:

# "ready_keras_multi" is the prepared list object in keras formatting
parameter_keras_multi <- extract_parameters(ready_keras_multi)

hyper_keras_multi <- expand.grid(
  ML_object = names(ready_keras_multi),
  Epochs = 5, 
  Batch_size = 2, 
  k_fold = 4, 
  current_k_fold = 1:4,
  Early_callback = "val_loss",
  Layer1_units = 20,
  Layer2_units = 8,
  Dropout_layer1 = 0.2,
  Dropout_layer2 = 0.0,
  Dense_activation_function = "relu",
  Output_activation_function = "softmax", # sigmoid for binary
  Optimizer_function = "rmsprop",
  Loss_function = "categorical_crossentropy", # binary_crossentropy for binary
  Metric = "accuracy",
  Cycle = 1:3,
  step = "training",
  Classification = "multiclass",
  Delay = 2)

You may have noticed that we use categorical instead of binary crossentropy as the loss function, and softmax instead of sigmoid as the output activation function. The number of output nodes is determined automatically by the number of classes. The column Classification = "multiclass" has no meaning for the function itself, but allows us to subset a large combined results data frame for the multiclass analyses later on (an example follows below).
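
Again for orientation, here is roughly the network this grid describes as a plain keras sketch, not the internal phyloseq2ML code; n_features and n_classes are placeholders for your number of input variables and response classes:

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 20, activation = "relu", input_shape = n_features) %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 8, activation = "relu") %>%
  layer_dense(units = n_classes, activation = "softmax")  # one output node per class

model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)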

You are already familiar with this part:

master_keras_multi <- merge(parameter_keras_multi, hyper_keras_multi, by = "ML_object")
# order by ML_object, Cycle and current_k_fold
master_keras_multi <- master_keras_multi[order(
  master_keras_multi$ML_object, 
  master_keras_multi$Cycle, 
  master_keras_multi$current_k_fold), ]
rownames(master_keras_multi) <- NULL

And this is the function call with the results processing afterwards:

master_keras_multi$results <- purrr::pmap(cbind(master_keras_multi, .row = rownames(master_keras_multi)), 
  keras_classification, the_list = ready_keras_multi, master_grid = master_keras_multi)
keras_df_multi <- as.data.frame(tidyr::unnest(master_keras_multi, results))

You will get one row per run and class in the results data frame.
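
This is also where the Classification column from the grid pays off: once several analyses are combined into one data frame (combined_results is a hypothetical name), subsetting is a one-liner:

multiclass_results <- subset(combined_results, Classification == "multiclass")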

Prediction:

Please note that (as for binary classification) you have to change the value of the Early_callback parameter from "val_loss" to "accuracy" for prediction. Also, k_fold and current_k_fold have to be set to 1, and step changes to "prediction" as before.
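
As a sketch, derived from the training grid above (the object name is again made up for this example):

hyper_keras_multi_prediction <- hyper_keras_multi
hyper_keras_multi_prediction$Early_callback <- "accuracy"
hyper_keras_multi_prediction$k_fold <- 1
hyper_keras_multi_prediction$current_k_fold <- 1
hyper_keras_multi_prediction$step <- "prediction"
hyper_keras_multi_prediction <- unique(hyper_keras_multi_prediction)  # drop duplicated fold rows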


