custom_preprocessing(), and preprocessing_feature_selection() to BORUTA, as it is the most effective one,
preprocessing() inside train() to knn, as it is the most effective one,
.Rbuildignore, DESCRIPTION, and NAMESPACE.
cran-comments.md file for future CRAN submission.
ranger, and xgboost only, due to the limitations of other packages. This results in the addition of the parallel parameter to the train(), random_search(), and train_models_bayesopt() functions.
bayes_info parameter in the train() and train_models_bayesopt() functions. The user can now track the quality of the hyperparameters found.
select_models() function, which lets the user shrink the output of the train() function to the selected models only. It is useful when the user wants to focus on the best models only and limit the size of the output, while keeping the ability to generate reports and use other functionalities.
train() function, regarding the predictions of all three sets.
select_models() function.
check_correlation parameter for both the train() and check_data() functions, which lets the user decide whether to check the correlation between the features. In some corner cases with high-dimensional data, this check can be time-consuming and unnecessary.
.Rbuildignore and .gitignore.
DESCRIPTION, and NAMESPACE.
choose_best_models.R, forester_palette.R, format_model_details.R.
check_data():
check_y_balance() sub-function for multiclass classification.
custom_preprocessing():
multiclass classification task, with all extended functionalities included (report, explainability, plots, etc.), which led to changes in the majority of functions.
train_test_balance().
misc/old_tests_03_02_2023 folder.
verbose_cat(), and guess_type(), from their own files to check_data().
choose_best_models() function, and implemented its logic directly in train().
create_ranked_list(), and format_models_details() functions.
plot_classification() function, associated with the 'cannot xtfrm data frames' error.
prepare_data(), and preprocessing_removal() functions.
misc/manual_tests with supplementary manual tests of the package, focusing on report generation.
test-choose_best_model test, and fixed two other tests.
patchwork package, and moved arules from Imports to Suggests,
report() function:
plot_classifcation file, which overrides the plot() function for the binary classification object returned by train(). The function lets us create plots for binary classification tasks, such as a metrics comparison line plot, ROC curve, confusion matrix, and train vs test plot.
plot_regression file, which overrides the plot() function for the regression object returned by train(). The function lets us create plots for regression tasks, such as a residuals box plot, observed vs predicted plot, and train vs test plot.
report.Rmd file, and created two separate files for the different tasks, called report_binary.Rmd and report_regression.Rmd.
train() function: added different classes for the outcomes, depending on the task type.
plot_metrics(), changed the color palette to colors_discrete_forester.
check_data(), added a small fix for the report.
SurvMetrics, randomForestSRC, and survival packages.
train() function:
time, and status describing the survival analysis task,
time, and status to the function's output,
check_data() function:
time, and status describing the survival analysis task,
basic_info(), check_missing(), check_cor(), and check_y_balance(), so they also work for survival analysis.
preprocessing() function, added the binarization of the survival task target (status).
prepare_data() function, added a method for the survival analysis task.
train_models() function, added a method for training a survival analysis model from the randomForestSRC package.
random_search() function, added a method for tuning a survival analysis model.
train_models_bayesopt() function, added a method for tuning a survival analysis model.
predict_models_all(), predict_models(), and predict_new() functions, added a method for predicting with a survival analysis model.
score_models(), added a method for evaluating survival analysis models with the Brier Score and the Concordance Index (CIN).
guess_type(), added a method for detecting the survival analysis task.
tests of the package, to cover the changes made in the package.
VIM package, and in the Suggests sivs, parallel, rmcfs, and varrank packages.
check_data() function:
custom_preprocessing() function, which is a more advanced and customizable approach to the preprocessing pipeline. It executes other new functions implementing the three major pillars of preprocessing. The functions are preprocessing_removal(), preprocessing_imputation(), and preprocessing_feature_selection():
preprocessing_removal() - This function includes 6 modules for the removal of unwanted features / observations. We can remove duplicate columns, ID-like columns, static columns (with a specified staticity threshold), sparse columns (with a specified sparsity threshold), and highly correlated columns (with a specified high correlation threshold). Additionally, we can remove observations that are too sparse (sparsity threshold) or that have a missing target value. Each module can be turned on or off by setting the corresponding logical value.
preprocessing_imputation() - Imputes missing values according to one of four prepared methods:
median-other - The numeric features are imputed with the median value, whereas the categorical ones are imputed with the 'other' string,
median-frequency - The numeric features are imputed with the median value, whereas the categorical ones are imputed with the most frequent value,
knn - All features are imputed with the KNN algorithm,
mice - All features are imputed with the MICE algorithm.
preprocessing_feature_selection() - Conducts a feature selection process with one of four proposed methods:
VI - The variable importance method based on random forest,
MCFS - The Monte Carlo Feature Selection,
MI - The Varrank method based on mutual information scores,
BORUTA - The BORUTA algorithm, with a short running time.
custom_preprocessing(), preprocessing_removal(), preprocessing_imputation(), and preprocessing_feature_selection() functions.
train() function:
advanced_preprocessing parameter, as custom_preprocessing() is a more advanced version of it,
custom_preprocessing parameter, which takes the output of the custom_preprocessing() function,
deleted_rows value to the output,
columns to deleted_columns.
preprocessing() function:
advanced parameter, as custom_preprocessing() is a more advanced version of it,
columns to rm_columns.
save() function:
save_forest(),
name and path into file, as it is the standard approach.
prepare_data() function: added encoding for the ctree model so that it can use columns with more than 30 levels.
random_search() function: added the verbose parameter. When set to TRUE, the function provides information about the training.
train() function:
split_seed parameter, which enables the user to set the seed for the train-test split method,
train_inds, test_inds, and valid_inds vectors to the train output, which enable the user to recover which observations went to the train, test, and validation sets.
train_models_bayesopt() function:
check_data() function:
choose_best_model(),
create_ranked_list(),
format_models_details(),
draw_scatter_plot(),
predict_models(),
predict_models_all(),
save(),
train_models(),
train_test_balance().
explain() function:
draw_radar_plot() function:
prepare_data() function:
preprocessing() function:
random_search() function:
score_models() function:
train() function:
train_models_bayesopt() function:
tests so they match the current version.
README.
report() function:
draw_radar_plot() function:
check_data() function:
train() function:
train() function:
check_data() function:
preprocessing() function:
binarize_target() sub-function.
report() function:
predict_models_all().
predcit_models_all().
train_models() function no longer returns NULL objects if an engine is not selected.
train_models_bayesopt() function no longer returns NULL objects if an engine is not selected.
format_models_details() function: fixed a bug with the method not working for a classification task with an xgboost model.
DESCRIPTION: catboost, ggradar, and tinytex dependencies moved from Suggests to Imports, and crayon added to Imports.
README.md, which solves installation issues on macOS.
train() function.
preprocessing().
train() function output.
predict_new() function.
save(): the file name now has the correct month.
train() function returns a table with metrics on the validation subset.
score() function returns tables with additional columns: engine and tuning.
bayes_iter = 0 means that Bayestian_opt() is no longer run.
random_iter to random_evals.
verbose = FALSE disables the check_data entirely.
train() function by adding parameters to:
check_data() added detection of ID columns and reformatted the outputs.
create_ranked_list() able to work with missing values.
explain() function redesigned to work on single and multiple models.
plot_metrics() improved all plots and changed them into ggplot visualizations, added a feature importance plot.
predcit_models_all() into predict_models_all() and enabled prediction on a non-fixed, larger number of models.
predict_new() function for new observations.
preprocessing() function by optional, advanced preprocessing consisting of deleting correlated values, deleting ID columns, and selecting only the most important features via the BORUTA algorithm.
save() function which saves the output of the train() function.
score_models() function, so that the user can add their own metric function for scoring models.
train() function by:
prepare_data()).
verbose_cat() function for optional messages.