DataBackendDplyr: DataBackend for dplyr/dbplyr


Description

An mlr3::DataBackend that uses dplyr::tbl() from the dplyr/dbplyr packages. It allows connecting an mlr3::Task to an out-of-memory database.

Returns an object of class mlr3::DataBackend.

Usage

# Construction
b = DataBackendDplyr$new(data, primary_key)
b = as_data_backend(data, primary_key)

The interface is described in mlr3::DataBackend.
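
As a minimal sketch of the second construction form above, the following builds a backend from a lazy database table via as_data_backend(). It assumes the packages mlr3, mlr3db, dplyr, dbplyr, DBI and RSQLite are installed; the table name "iris_tbl" and the key column "row_id" are illustrative choices, not part of the package.

```r
library(mlr3db)

# Illustrative data: iris plus an integer key column (the name "row_id" is an assumption)
data = iris
data$row_id = seq_len(nrow(data))

# Copy the data into a temporary in-memory SQLite database
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(con, data, name = "iris_tbl")
tbl = dplyr::tbl(con, "iris_tbl")

# Convert the lazy table into a backend; primary_key names the row identifier column
b = mlr3::as_data_backend(tbl, primary_key = "row_id")

# Query the backend while the connection is still open
n = b$nrow
cols = b$colnames
print(n)
print(cols)

# Cleanup
DBI::dbDisconnect(con)
```

The values are read from the backend before disconnecting, because every accessor issues a query against the live connection.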

Arguments

Examples

# Backend using an in-memory tibble
data = tibble::as_tibble(iris)
data$Sepal.Length[1:30] = NA
data$row_id = 1:150
b = DataBackendDplyr$new(data, primary_key = "row_id")

# Object supports all accessors of DataBackend
print(b)
b$nrow
b$ncol
b$colnames
b$data(rows = 100:101, cols = "Species")
b$distinct("Species")

# Classification task using this backend
task = mlr3::TaskClassif$new(id = "iris_tibble", backend = b, target = "Species")
print(task)
task$head()

# Create a temporary SQLite database
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dplyr::copy_to(con, data)
tbl = dplyr::tbl(con, "data")

# Define a backend on a subset of the database
tbl = dplyr::select_at(tbl, setdiff(colnames(tbl), "Sepal.Width")) # Do not use column "Sepal.Width"
tbl = dplyr::filter(tbl, row_id %in% 1:120) # Use only the first 120 rows
b = DataBackendDplyr$new(tbl, primary_key = "row_id")
print(b)

# Query distinct values
b$distinct("Species")

# Query number of missing values
b$missing(b$rownames, b$colnames)

# Note that SQLite does not support factors; the column Species has been converted to character
lapply(b$head(), class)

# Cleanup
rm(tbl)
DBI::dbDisconnect(con)
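
The note above on factors can be reproduced without mlr3db. The sketch below assumes only DBI and RSQLite are installed and uses a hypothetical table name "iris_tbl". It writes iris to SQLite, reads it back, and restores the factor manually in R; this is one possible workaround, not necessarily how the package handles it.

```r
# Write a data frame containing a factor column to an in-memory SQLite database
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris_tbl", iris)

# Read the table back: SQLite has no factor type, so Species is stored as text
back = DBI::dbReadTable(con, "iris_tbl")
species_class = class(back$Species)
print(species_class)

# Restore the factor in R, reusing the original level order
back$Species = factor(back$Species, levels = levels(iris$Species))
print(levels(back$Species))

# Cleanup
DBI::dbDisconnect(con)
```

Passing the levels explicitly keeps their order stable even if some levels are absent from the retrieved rows.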

mlr-org/mlr3db documentation built on Jan. 15, 2019, 2:29 a.m.