mlr3resampling: Resampling Algorithms for 'mlr3' Framework

A supervised learning algorithm takes a train set as input and outputs a prediction function, which can then be applied to a test set. If each data point belongs to a group (such as a geographic region or a year), how do we know whether it is possible to train on one group and predict accurately on another? Cross-validation can measure the extent to which this is possible. First, fold IDs from 1 to K are assigned to all data (possibly using stratification, usually by group and label). Then we loop over test sets (group/fold combinations) and train sets (same group, other groups, all groups), and compute test accuracy for each combination. Comparing test accuracy between same and other shows how well predictions transfer between groups: transfer is perfect if same and other have similar test accuracy for each group; other is usually somewhat less accurate than same; and other can be just as bad as a featureless baseline when the groups have different patterns. The method is described in depth at <https://tdhock.github.io/blog/2023/R-gen-new-subsets/>.

A second question is how many train samples are required to get accurate predictions on a test set. Cross-validation can answer this question too, by using train sets of variable size.
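
The same/other/all scheme described above can be sketched in base R, without calling any mlr3resampling functions. This is a minimal illustration, not the package's implementation: the toy data, the column names, and the helpers `train_rows` and `test_rows` are all hypothetical. It also assumes that the test fold is excluded from every train set, including the "other" and "all" train sets.

```r
# Hypothetical toy data: 12 rows, each belonging to group "A" or "B".
set.seed(1)
n_folds <- 3
toy <- data.frame(
  group = rep(c("A", "B"), each = 6),
  y = rnorm(12),
  stringsAsFactors = FALSE)

# Assign fold IDs 1..K separately within each group, so that every
# fold contains rows from every group (stratification by group).
toy$fold <- ave(
  seq_len(nrow(toy)), toy$group,
  FUN = function(i) sample(rep(seq_len(n_folds), length.out = length(i))))

# Enumerate every test set (group/fold) and train set (same/other/all).
splits <- expand.grid(
  test_group = unique(toy$group),
  test_fold = seq_len(n_folds),
  train_groups = c("same", "other", "all"),
  stringsAsFactors = FALSE)

# Train rows: rows in the chosen groups, excluding the test fold
# (an assumption of this sketch).
train_rows <- function(test_group, test_fold, train_groups) {
  in_group <- switch(
    train_groups,
    same = toy$group == test_group,
    other = toy$group != test_group,
    all = rep(TRUE, nrow(toy)))
  which(in_group & toy$fold != test_fold)
}

# Test rows: rows of the test group that fall in the test fold.
test_rows <- function(test_group, test_fold) {
  which(toy$group == test_group & toy$fold == test_fold)
}

# Record the size of each train set and test set.
splits$n_train <- mapply(
  function(g, f, tg) length(train_rows(g, f, tg)),
  splits$test_group, splits$test_fold, splits$train_groups,
  USE.NAMES = FALSE)
splits$n_test <- mapply(
  function(g, f) length(test_rows(g, f)),
  splits$test_group, splits$test_fold,
  USE.NAMES = FALSE)

# Variable-size train sets: nested subsamples of one train set.
full_train <- train_rows("A", 1, "same")
shuffled <- full_train[sample.int(length(full_train))]
sized_train <- lapply(c(2, 4), function(n) shuffled[seq_len(n)])
```

In a real analysis, each row of `splits` would correspond to fitting a learner on the train rows and computing test accuracy on the test rows. The same and other results for each test group could then be compared, and the nested subsamples in `sized_train` show one way to vary the train set size while holding the test set fixed.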

Package details

Author: Toby Hocking [aut, cre] (<https://orcid.org/0000-0002-3146-0865>), Michel Lang [ctb] (<https://orcid.org/0000-0001-9754-0393>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Bernd Bischl [ctb] (<https://orcid.org/0000-0001-6002-6980>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Jakob Richter [ctb] (<https://orcid.org/0000-0003-4481-5554>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Patrick Schratz [ctb] (<https://orcid.org/0000-0003-0748-6624>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Giuseppe Casalicchio [ctb] (<https://orcid.org/0000-0001-5324-5966>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Stefan Coors [ctb] (<https://orcid.org/0000-0002-7465-2146>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Quay Au [ctb] (<https://orcid.org/0000-0002-5252-8902>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Martin Binder [ctb], Florian Pfisterer [ctb] (<https://orcid.org/0000-0001-8867-762X>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Raphael Sonabend [ctb] (<https://orcid.org/0000-0001-9225-4654>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Lennart Schneider [ctb] (<https://orcid.org/0000-0003-4152-5308>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Marc Becker [ctb] (<https://orcid.org/0000-0002-8115-0400>, Author of mlr3 when Resampling/ResamplingCV was copied/modified), Sebastian Fischer [ctb] (<https://orcid.org/0000-0002-9609-3197>, Author of mlr3 when Resampling/ResamplingCV was copied/modified)
Maintainer: Toby Hocking <toby.hocking@r-project.org>
License: GPL-3
Version: 2024.9.6
URL: https://github.com/tdhock/mlr3resampling
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("mlr3resampling")

mlr3resampling documentation built on Sept. 12, 2024, 6:23 a.m.