check_all: Check all params that don't return a value
In mikropml: User-Friendly R Package for Supervised Machine Learning Pipelines

Description

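Run all parameter checks that don't return a value, i.e. checks that are called only to stop with an error (or emit a warning) when an argument is invalid.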
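The "checks that don't return a value" pattern can be illustrated with a minimal sketch. It assumes that each check is a function called only for its side effect. All function names below (`check_method_sketch`, `check_kfold_sketch`, `check_corr_thresh_sketch`, `check_all_sketch`) are hypothetical, cover only a few of the parameters, and are not mikropml's actual source. The exact rules, such as the allowed range for `kfold`, are assumptions.

```r
# Hypothetical validators, not mikropml's source. Each is called only for its
# side effect: it stops with an error on invalid input and otherwise returns
# invisible(NULL).
check_method_sketch <- function(method) {
  supported <- c("glmnet", "rf", "rpart2", "svmRadial", "xgbTree")
  if (!is.character(method) || length(method) != 1 || !(method %in% supported)) {
    stop("`method` must be one of: ", paste(supported, collapse = ", "))
  }
  invisible(NULL)
}

# Assumed rule: a whole number from 2 up to the number of rows.
check_kfold_sketch <- function(kfold, dataset) {
  ok <- is.numeric(kfold) && length(kfold) == 1 && !is.na(kfold) &&
    kfold == round(kfold) && kfold >= 2 && kfold <= nrow(dataset)
  if (!ok) {
    stop("`kfold` must be a whole number between 2 and the number of rows.")
  }
  invisible(NULL)
}

check_corr_thresh_sketch <- function(corr_thresh) {
  ok <- is.numeric(corr_thresh) && length(corr_thresh) == 1 &&
    !is.na(corr_thresh) && corr_thresh >= 0 && corr_thresh <= 1
  if (!ok) stop("`corr_thresh` must be a single number between 0 and 1.")
  invisible(NULL)
}

# Run every check in turn; the first failing check stops execution.
check_all_sketch <- function(dataset, method, kfold, corr_thresh) {
  check_method_sketch(method)
  check_kfold_sketch(kfold, dataset)
  check_corr_thresh_sketch(corr_thresh)
  invisible(NULL)
}

df <- data.frame(outcome = rep(c("a", "b"), 5), x = 1:10)
check_all_sketch(df, "glmnet", kfold = 5, corr_thresh = 1)  # passes silently
```

Calling `check_all_sketch(df, "lm", kfold = 5, corr_thresh = 1)` would instead stop with the error from the method check. Because nothing useful is returned, a caller simply invokes the checks before doing any real work.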

Usage

check_all(
  dataset,
  method,
  permute,
  kfold,
  training_frac,
  perf_metric_function,
  perf_metric_name,
  groups,
  group_partitions,
  corr_thresh,
  seed,
  hyperparameters
)

Arguments

`dataset`	Data frame with an outcome variable and other columns as features.
`method`	ML method. Options: `c("glmnet", "rf", "rpart2", "svmRadial", "xgbTree")`. `glmnet`: linear, logistic, or multiclass regression; `rf`: random forest; `rpart2`: decision tree; `svmRadial`: support vector machine; `xgbTree`: xgboost.
`kfold`	Number of folds for k-fold cross-validation (default: `5`).
`training_frac`	Fraction of data for the training set (default: `0.8`). Rows from the dataset will be randomly selected for the training set, and all remaining rows will be used in the testing set. Alternatively, if you provide a vector of integers, these will be used as the row indices for the training set, and all remaining rows will be used in the testing set.
`perf_metric_function`	Function to calculate the performance metric to be used for cross-validation and test performance. Some functions are provided by caret (see `caret::defaultSummary()`). Defaults: binary classification = `twoClassSummary`, multi-class classification = `multiClassSummary`, regression = `defaultSummary`.
`perf_metric_name`	The column name, from the output of the function provided to `perf_metric_function`, to use as the performance metric. Defaults: binary classification = `"ROC"`, multi-class classification = `"logLoss"`, regression = `"RMSE"`.
`groups`	Vector of group labels, one per row of the dataset, identifying samples to keep together when splitting the data into training and testing sets (default: `NULL`). If the number of groups in the training set is larger than `kfold`, the groups will also be kept together for cross-validation.
`group_partitions`	Specify how to assign `groups` to the training and testing partitions (default: `NULL`). If `groups` specifies that some samples belong to group `"A"` and some belong to group `"B"`, then setting `group_partitions = list(train = c("A", "B"), test = c("B"))` will result in all samples from group `"A"` being placed in the training set, some samples from `"B"` also in the training set, and the remaining samples from `"B"` in the testing set. The partition sizes will be as close to `training_frac` as possible. If the number of groups in the training set is larger than `kfold`, the groups will also be kept together for cross-validation.
`corr_thresh`	For feature importance, group together features whose correlation is greater than or equal to `corr_thresh` (range `0` to `1`; default: `1`).
`seed`	Random seed (default: `NA`). Your results will only be reproducible if you set a seed.
`hyperparameters`	Data frame of hyperparameters (default: `NULL`; sensible defaults will be chosen automatically).
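The two ways of specifying `training_frac` can be sketched as follows. `split_rows()` is a hypothetical helper, not part of mikropml. It uses simple random sampling, whereas mikropml's own partitioning may differ, for example by stratifying on the outcome.

```r
# Hypothetical helper (not part of mikropml) showing the two ways
# `training_frac` can define the training set.
split_rows <- function(n_rows, training_frac, seed = NA) {
  if (!is.na(seed)) set.seed(seed)  # results are only reproducible with a seed
  if (length(training_frac) == 1 && training_frac > 0 && training_frac < 1) {
    # A fraction: randomly sample that share of the rows for training.
    train <- sort(sample(n_rows, size = round(training_frac * n_rows)))
  } else {
    # A vector of integers: use them directly as training-set row indices.
    train <- sort(as.integer(training_frac))
  }
  # Every row not in the training set goes to the testing set.
  test <- setdiff(seq_len(n_rows), train)
  list(train = train, test = test)
}

split_rows(10, 0.8, seed = 2019)       # 8 random training rows, 2 testing rows
split_rows(10, c(1L, 2L, 3L, 4L, 5L))  # rows 1-5 train, rows 6-10 test
```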
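The relationship between `perf_metric_function` and `perf_metric_name` can be sketched with a custom metric. This assumes the function follows caret's summary-function convention of `function(data, lev = NULL, model = NULL)`, where `data` has `obs` (observed) and `pred` (predicted) columns. `rmse_summary()` is a hypothetical example, not a function provided by caret or mikropml.

```r
# Hypothetical custom metric following caret's summary-function signature.
# It returns a named numeric vector; the names are the "columns" that
# `perf_metric_name` can refer to.
rmse_summary <- function(data, lev = NULL, model = NULL) {
  c(RMSE = sqrt(mean((data$obs - data$pred)^2)))
}

perf_metric_name <- "RMSE"
preds <- data.frame(obs = c(1, 2, 3), pred = c(1, 2, 5))

# The chosen name must appear in the function's output.
if (!perf_metric_name %in% names(rmse_summary(preds))) {
  stop("`perf_metric_name` is not returned by the metric function.")
}
metric_value <- rmse_summary(preds)[[perf_metric_name]]  # sqrt(4/3), about 1.1547
```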
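The constraints on `groups` and `group_partitions` can be expressed as a validation sketch in the same style as the other checks. It assumes that `groups` must have one label per row and that `group_partitions` is a list with `train` and `test` elements naming existing groups. `check_groups_sketch()` is hypothetical and not mikropml's actual check.

```r
# Hypothetical validator (not mikropml's code) for `groups` and
# `group_partitions`. Stops on invalid input, otherwise returns invisible(NULL).
check_groups_sketch <- function(dataset, groups, group_partitions = NULL) {
  if (is.null(groups)) return(invisible(NULL))
  # One group label per row of the dataset.
  if (length(groups) != nrow(dataset)) {
    stop("`groups` must have one element per row of `dataset`.")
  }
  if (!is.null(group_partitions)) {
    if (!setequal(names(group_partitions), c("train", "test"))) {
      stop("`group_partitions` must be a list with elements `train` and `test`.")
    }
    # Every group named in the partitions must occur in `groups`.
    unknown <- setdiff(unlist(group_partitions), groups)
    if (length(unknown) > 0) {
      stop("Unknown group(s) in `group_partitions`: ",
           paste(unknown, collapse = ", "))
    }
  }
  invisible(NULL)
}

df <- data.frame(x = 1:6)
grp <- c("A", "A", "B", "B", "B", "B")
check_groups_sketch(df, grp, list(train = c("A", "B"), test = c("B")))  # passes
```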
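The idea behind `corr_thresh` can be illustrated by grouping features whose correlation meets the threshold. `group_correlated_sketch()` is a hypothetical sketch. It uses signed Pearson correlation and links features transitively by a breadth-first search. mikropml's own grouping may differ, for example in whether it uses absolute correlations. With the default `corr_thresh = 1`, only perfectly correlated features would be grouped, and floating-point rounding can make such correlations fall just below `1`.

```r
# Hypothetical sketch: features whose pairwise correlation is >= corr_thresh
# are linked, and linked features (directly or transitively) form one group.
group_correlated_sketch <- function(features, corr_thresh = 1) {
  corr <- cor(features)
  linked_pairs <- corr >= corr_thresh
  n <- ncol(features)
  group_id <- rep(NA_integer_, n)
  next_id <- 0L
  for (i in seq_len(n)) {
    if (!is.na(group_id[i])) next  # already assigned to a group
    next_id <- next_id + 1L
    group_id[i] <- next_id
    queue <- i
    # Breadth-first search over features linked by high correlation.
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      new_members <- which(linked_pairs[j, ] & is.na(group_id))
      group_id[new_members] <- next_id
      queue <- c(queue, new_members)
    }
  }
  split(colnames(features), group_id)
}

features <- data.frame(a = c(1, 2, 3, 4, 5),
                       b = c(2, 4, 6, 8, 11),
                       c = c(5, 3, 4, 1, 2))
group_correlated_sketch(features, corr_thresh = 0.9)  # groups c("a", "b") and "c"
```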