support_vector_machine: Fit a Support Vector Machine (SVM)

View source: R/support_vector_machine.R

Fit a Support Vector Machine (SVM)

Description

support_vector_machine() is a wrapper of the e1071::svm() function that fits a support vector machine and makes it simple to tune the hyperparameters with grid search or Bayesian optimization. You can fit univariate models for numeric and categorical response variables.

All the parameters marked as (tunable) accept either a vector of values, from which the grid is generated for grid search tuning, or a list with the min and max values for Bayesian optimization tuning. The returned object contains a data.frame with the evaluated hyperparameter combinations. In the end, the best combination of hyperparameters is used to fit the final model, which is also returned and can be used to make new predictions.
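
For example, a minimal sketch of the two ways of specifying tunable parameters (mirroring the Examples section below):

# Grid search: vectors of candidate values generate the grid
model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  kernel = "polynomial",
  degree = c(2, 3),
  cost = c(0.5, 1, 2)
)

# Bayesian optimization: lists with min and max values define the search bounds
model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  kernel = "radial",
  gamma = list(min = 0.001, max = 1),
  tune_type = "bayesian_optimization"
)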

Usage

support_vector_machine(
  x,
  y,
  kernel = "linear",
  degree = 3,
  gamma = 1/NCOL(x),
  coef0 = 0,
  cost = 1,
  tune_type = "Grid_search",
  tune_cv_type = "K_fold",
  tune_folds_number = 5,
  tune_testing_proportion = 0.2,
  tune_folds = NULL,
  tune_loss_function = NULL,
  tune_grid_proportion = 1,
  tune_bayes_samples_number = 10,
  tune_bayes_iterations_number = 10,
  scale = TRUE,
  class_weights = NULL,
  cache_size = 40,
  tolerance = 0.001,
  epsilon = 0.1,
  shrinking = TRUE,
  fitted = TRUE,
  validate_params = TRUE,
  seed = NULL,
  verbose = TRUE
)

Arguments

x

(matrix) The predictor (independent) variable(s). It must be a numeric matrix. You can use the to_matrix() function to convert your data to a matrix.

y

(data.frame | vector | matrix) The response (dependent) variable. If it is a data.frame or a matrix it must have only one column. If y is character, logical or factor, then a classification machine is fitted, otherwise a regression machine.

kernel

(character(1)) (case insensitive) The kernel used in the support vector machine. The options are "linear", "polynomial", "radial" and "sigmoid" (for more information, see the Details section below). You can use the gamma, coef0 and degree parameters to adjust the kernel. "linear" by default.

degree

(numeric) (tunable) Parameter needed for "polynomial" kernel. 3 by default.

gamma

(numeric) (tunable) Parameter needed for all kernels except "linear". 1 / NCOL(x) by default.

coef0

(numeric) (tunable) Parameter needed for "polynomial" and "sigmoid" kernels. 0 by default.

cost

(numeric) (tunable) Cost of constraints violation. It is the 'C'-constant of the regularization term in the Lagrange formulation. 1 by default.

tune_type

(character(1)) (case insensitive) The type of tuning to perform. The options are "Grid_search" and "Bayesian_optimization". "Grid_search" by default.

tune_cv_type

(character(1)) (case insensitive) The type of cross validation used to tune the model. The options are "K_fold", "K_fold_strata" (only for univariate categorical response variables) and "Random". "K_fold" by default.

tune_folds_number

(numeric(1)) The number of folds to tune each hyperparameter combination (k in k-fold cross validation). 5 by default.

tune_testing_proportion

(numeric(1)) A number between (0, 1) to specify the proportion of records to use as validation set when tune_cv_type is "Random". 0.2 by default.

tune_folds

(list) Custom folds for tuning. It must be a list of lists where each entry represents a fold. Each inner list has to contain the fields "training" and "testing", with numeric vectors of the indices of the records to be used for training and testing in that fold. Note that when this parameter is set, tune_cv_type, tune_folds_number and tune_testing_proportion are ignored. NULL by default.
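
For example, a sketch of two hypothetical custom folds over the 150 records of iris:

# Each inner list defines one fold with its training and testing indices
custom_folds <- list(
  list(training = 1:100, testing = 101:150),
  list(training = 51:150, testing = 1:50)
)

model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  cost = c(0.5, 1, 2),
  tune_folds = custom_folds
)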

tune_loss_function

(character(1)) (case insensitive) The loss function to use in tuning. The options are "mse", "maape", "mae", "nrmse", "rmse" or "pearson" when y is a numeric response variable, "accuracy" or "kappa_coeff" when y is a categorical response variable (including binary) and "f1_score", "roc_auc" or "pr_auc" when y is a binary response variable. NULL by default, which uses "mse" for numeric variables and "accuracy" for categorical variables.

tune_grid_proportion

(numeric(1)) Only when tune_type is "Grid_search", a number between (0, 1] to specify the proportion of hyperparameter combinations to sample from the grid and evaluate in tuning (useful when the grid is big). 1 by default (full grid).

tune_bayes_samples_number

(numeric(1)) Only when tune_type is "Bayesian_optimization", the number of initial random hyperparameter combinations to evaluate before the Bayesian optimization process. 10 by default.

tune_bayes_iterations_number

(numeric(1)) Only when tune_type is "Bayesian_optimization", the number of optimization iterations to evaluate after the initial random samples specified in tune_bayes_samples_number. 10 by default.

scale

(logical) A logical vector indicating the variables in x to be scaled. If scale is of length 1, the value is recycled as many times as needed. TRUE by default.
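
For example, a sketch that leaves the third of the four iris predictor columns unscaled:

model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  scale = c(TRUE, TRUE, FALSE, TRUE)
)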

class_weights

(numeric | character) For categorical responses only. A named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1), but all supplied components have to be named. Specifying "inverse" (case insensitive) will choose the weights inversely proportional to the class distribution. NULL by default.
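
For example, a sketch of both ways of specifying the weights:

# Explicit weights for some classes (unnamed levels keep the default weight 1)
model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  class_weights = c(setosa = 1, versicolor = 2)
)

# Weights inversely proportional to the class distribution
model <- support_vector_machine(
  x = to_matrix(iris[, -5]),
  y = iris$Species,
  class_weights = "inverse"
)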

cache_size

(numeric(1)) Cache memory in MB. 40 by default.

tolerance

(numeric(1)) Tolerance of termination criterion. 0.001 by default.

epsilon

(numeric(1)) Epsilon in the insensitive-loss function. 0.1 by default.

shrinking

(logical(1)) Should the shrinking heuristics be used? TRUE by default.

fitted

(logical(1)) Should the fitted values be computed and included in the model? TRUE by default.

validate_params

(logical(1)) Should the parameters be validated? It is not recommended to set this parameter to FALSE because if something fails, a non-meaningful error will be thrown. TRUE by default.

seed

(numeric(1)) A value to be used as internal seed for reproducible results. NULL by default.

verbose

(logical(1)) Should the progress information be printed? TRUE by default.

Details

Note that before tuning, all columns without variance (where all the records have the same value) are removed. The positions of such columns are returned in the removed_x_cols field of the returned object.

All records with missing values (NA), either in x or in y, will be removed. The positions of the removed records are returned in the removed_rows field of the returned object.

Tuning

The general tuning algorithm works as follows:

[Figure: Tuning algorithm]

For grid search tuning, the hyperparameters grid is generated (step one in the algorithm) as the Cartesian product of all the provided values (all the possible combinations) of all the tunable parameters. If only one value of each tunable parameter is provided, no tuning is done. tune_grid_proportion allows you to specify the proportion of combinations you want to sample from the full grid and tune; by default, all combinations are evaluated.
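
As an illustration only (the package builds the grid internally), the grid for degree = c(1, 2), gamma = c(0.1, 0.3) and coef0 = c(0, 1) contains 2 * 2 * 2 = 8 combinations:

# Cartesian product of the candidate values: 8 rows, one per combination
expand.grid(
  degree = c(1, 2),
  gamma = c(0.1, 0.3),
  coef0 = c(0, 1)
)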

For Bayesian optimization tuning, step one in the algorithm works a little differently. At the start, tune_bayes_samples_number different hyperparameter combinations are generated and evaluated; then, tune_bayes_iterations_number new hyperparameter combinations are generated and evaluated iteratively based on the Bayesian optimization algorithm, but this process is equivalent to the one described in the general tuning algorithm. Note that only the hyperparameters for which a list of min and max values was provided are tuned, and their values fall within the specified boundaries.

In univariate models with a numeric response variable, Mean Squared Error (mse()) is used by default as the loss function. In univariate models with a categorical response variable, either binary or with more than two categories, accuracy (accuracy()) is used by default. You can change the loss function used in tuning with the tune_loss_function parameter.

Kernels

The 4 different kernel transformations are described in the following mathematical expressions:

  • linear:

    K(X, Y) = X^T %*% Y

  • polynomial:

    K(X, Y) = (gamma * X^T %*% Y + coef0)^degree

  • radial:

K(X, Y) = exp(-gamma * |X - Y|^2)

  • sigmoid:

    K(X, Y) = tanh(gamma * X^T %*% Y + coef0)

When you provide several values of degree, gamma and/or coef0 for tuning with kernels that do not support them, they are not taken into account. For example, with the "linear" kernel all values in these parameters are ignored (no grid is generated with them) since the "linear" kernel does not use any of them.
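
The following base R sketch evaluates the four formulas for two numeric vectors, only to make the notation above concrete (it is not how the kernels are computed internally):

X <- c(1, 2, 3)
Y <- c(4, 5, 6)
gamma <- 0.5
coef0 <- 1
degree <- 2

linear_kernel <- drop(t(X) %*% Y)
polynomial_kernel <- (gamma * drop(t(X) %*% Y) + coef0)^degree
# |X - Y|^2 is the squared Euclidean distance between X and Y
radial_kernel <- exp(-gamma * sum((X - Y)^2))
sigmoid_kernel <- tanh(gamma * drop(t(X) %*% Y) + coef0)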

Value

An object of class "SupportVectorMachineModel" that inherits from classes "Model" and "R6" with the fields:

  • fitted_model: The fitted model, an object returned by e1071::svm().

  • x: The final matrix used to fit the model.

  • y: The final vector used to fit the model.

  • hyperparams_grid: A data.frame with all the evaluated combinations of hyperparameters and one additional column called "loss" with the value of the loss function for each combination. The data is ordered with the best combinations first, sometimes with the lowest values first and other times with the greatest values first, depending on the loss function.

  • best_hyperparams: A list with the combination of hyperparameters with the best loss value (the first row in hyperparams_grid).

  • execution_time: A difftime object with the total time taken to tune and fit the model.

  • removed_rows: A numeric vector with the records' indices (referring to positions in the provided data) that were deleted and not taken into account in tuning or training.

  • removed_x_cols: A numeric vector with the columns' indices (referring to positions in the provided data) that were deleted and not taken into account in tuning or training.

  • ...: Some other parameters for internal use.

See Also

predict.Model()

Other models: bayesian_model(), deep_learning(), generalized_boosted_machine(), generalized_linear_model(), mixed_model(), partial_least_squares(), random_forest()

Examples

library(SKM)

# Use all default hyperparameters (no tuning) ------------------------------
x <- to_matrix(iris[, -5])
y <- iris$Species
model <- support_vector_machine(x, y)

# Predict using the fitted model
predictions <- predict(model, x)
# Obtain the predicted values
predictions$predicted
# Obtain the predicted probabilities
predictions$probabilities

# Tune with grid search -----------------------------------------------------
x <- to_matrix(iris[, -1])
y <- iris$Sepal.Length
model <- support_vector_machine(
  x,
  y,
  kernel = "polynomial",
  degree = c(1, 2),
  gamma = c(0.1, 0.3),
  coef0 = c(0, 1),
  tune_type = "grid_search",
  tune_cv_type = "k_fold",
  tune_folds_number = 5
)

# Obtain the whole grid with the loss values
model$hyperparams_grid
# Obtain the hyperparameters combination with the best loss value
model$best_hyperparams

# Predict using the fitted model
predictions <- predict(model, x)
# Obtain the predicted values
predictions$predicted

# Tune with Bayesian optimization -------------------------------------------
x <- to_matrix(iris[, -1])
y <- iris$Sepal.Length
model <- support_vector_machine(
  x,
  y,
  kernel = "sigmoid",
  gamma = list(min = 0.01, max = 0.9),
  coef0 = list(min = 0, max = 5),
  tune_type = "bayesian_optimization",
  tune_bayes_samples_number = 5,
  tune_bayes_iterations_number = 5,
  tune_cv_type = "random",
  tune_folds_number = 4
)

# Obtain the whole grid with the loss values
model$hyperparams_grid
# Obtain the hyperparameters combination with the best loss value
model$best_hyperparams

# Predict using the fitted model
predictions <- predict(model, x)
# Obtain the predicted values
predictions$predicted

# Obtain the execution time taken to tune and fit the model
model$execution_time

# Genomic selection ------------------------------------------------------------
data(Wheat)

# dplyr provides the %>% operator and mutate() used below
library(dplyr)

# Data preparation of G
Line <- model.matrix(~ 0 + Line, data = Wheat$Pheno)
# Compute the Cholesky decomposition
Geno <- cholesky(Wheat$Geno)
# G matrix
X <- Line %*% Geno
y <- Wheat$Pheno$Y

# Set seed for reproducible results
set.seed(2022)
folds <- cv_random(
  records_number = nrow(X),
  folds_number = 5,
  testing_proportion = 0.3
)

Predictions <- data.frame()
Hyperparams <- data.frame()

# Model training and predictions
for (i in seq_along(folds)) {
  cat("*** Fold:", i, "***\n")
  fold <- folds[[i]]

  # Identify the training and testing sets
  X_training <- X[fold$training, ]
  X_testing <- X[fold$testing, ]
  y_training <- y[fold$training]
  y_testing <- y[fold$testing]

  # Model training
  model <- support_vector_machine(
    x = X_training,
    y = y_training,

    # Specify the hyperparameters for tuning
    coef0 = list(min = 0.01, max = 2),
    gamma = list(min = 0.0001, max = 1),
    kernel = "polynomial",

    tune_type = "bayesian_optimization",
    tune_bayes_iterations_number = 5,
    tune_bayes_samples_number = 3
  )

  # Prediction of testing set
  predictions <- predict(model, X_testing)

  # Predictions for the i-th fold
  FoldPredictions <- data.frame(
    Fold = i,
    Line = Wheat$Pheno$Line[fold$testing],
    Env = Wheat$Pheno$Env[fold$testing],
    Observed = y_testing,
    Predicted = predictions$predicted
  )
  Predictions <- rbind(Predictions, FoldPredictions)

  # Hyperparams
  HyperparamsFold <- model$hyperparams_grid %>%
    mutate(Fold = i)
  Hyperparams <- rbind(Hyperparams, HyperparamsFold)

  # Best hyperparams of the model
  cat("*** Optimal hyperparameters: ***\n")
  print(model$best_hyperparams)
}

head(Predictions)
# Compute the summary of all predictions
summaries <- gs_summaries(Predictions)

# Summaries by Line
head(summaries$line)

# Summaries by Environment
summaries$env

# Summaries by Fold
summaries$fold

# First rows of Hyperparams
head(Hyperparams)
# Last rows of Hyperparams
tail(Hyperparams)
