Package website: release | dev
Extends the mlr3 package with DataBackends to transparently work with databases. Two additional backends are currently implemented:

- DataBackendDplyr: Relies internally on the abstraction of dplyr and dbplyr. This allows working on a broad range of DBMS, such as SQLite, MySQL, MariaDB, or PostgreSQL.
- DataBackendDuckDB: Connector to duckdb. This includes support for Parquet files (see example below).

To construct the backends, you have to establish a connection to the DBMS yourself with the DBI package. For the serverless SQLite and DuckDB, we provide the converters as_sqlite_backend() and as_duckdb_backend().
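As an illustration of building a backend from your own DBI connection, the following minimal sketch opens an in-memory SQLite database, writes `iris` into it with an added integer key column, and wraps the lazy dplyr table in a DataBackendDplyr. It assumes the RSQLite and dbplyr packages are installed; the table name `"iris"` and the key column `row_id` are arbitrary choices for this example.

```r
library("mlr3db")

# Open the connection yourself via DBI; here an in-memory SQLite database
# (assumes the RSQLite and dbplyr packages are installed)
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Write example data into the database, adding an integer primary key column
data = iris
data$row_id = seq_len(nrow(data))
DBI::dbWriteTable(con, "iris", data)

# Reference the table lazily via dplyr/dbplyr; no rows are pulled into R here
tbl = dplyr::tbl(con, "iris")

# Wrap the remote table in a DataBackendDplyr and build a task on top of it
backend = DataBackendDplyr$new(tbl, primary_key = "row_id")
task = as_task_classif(backend, target = "Species")
```

The connection has to stay open as long as the task is used; close it afterwards with `DBI::dbDisconnect(con)`.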
You can install the released version of mlr3db from CRAN with:
install.packages("mlr3db")
And the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("mlr-org/mlr3db")
library("mlr3db")
#> Loading required package: mlr3
# Create a classification task:
task = tsk("spam")
# Convert the task backend from an in-memory backend (DataBackendDataTable)
# to an out-of-memory SQLite backend via DataBackendDplyr.
# A temporary directory is used here to store the database files.
task$backend = as_sqlite_backend(task$backend, path = tempfile())
# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: spam
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
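To make the "queried and fetched in the background" step more concrete, the sketch below converts the spam backend to SQLite again and then talks to the backend directly through the generic mlr3 DataBackend interface (`$nrow`, `$ncol`, `$rownames`, `$data()`). It only illustrates that a subset of rows and columns is fetched on request; the selected column `charExclamation` is just one feature of the spam data chosen for the example.

```r
library("mlr3db")

# Convert the in-memory spam backend to an SQLite-backed DataBackendDplyr
task = tsk("spam")
backend = as_sqlite_backend(task$backend, path = tempfile())

# Metadata such as the number of rows and columns is obtained via queries
n = backend$nrow
p = backend$ncol

# Only the requested rows and columns are fetched into R (as a data.table)
rows = backend$rownames[1:5]
res = backend$data(rows = rows, cols = c("type", "charExclamation"))
print(res)
```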
library("mlr3db")
# Get an example parquet file from the package install directory:
# the spam data set (tsk("spam")) stored as a Parquet file
file = system.file(file.path("extdata", "spam.parquet"), package = "mlr3db")
# Create a DuckDB backend on the file
backend = as_duckdb_backend(file)
# Construct a classification task on the backend
task = as_task_classif(backend, target = "type")
# Resample a classification tree using a 3-fold CV.
# The requested data will be queried and fetched from the database in the background.
resample(task, lrn("classif.rpart"), rsmp("cv", folds = 3))
#> <ResampleResult> of 3 iterations
#> * Task: backend
#> * Learner: classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations