create_statement: Generate a CREATE TABLE statement from a model

Description Usage Arguments Value Supported packages Warning Examples

View source: R/sqlscore.R

Description

Generate a CREATE TABLE statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor.

Usage

1
2
3
4
create_statement(mod, dest_table, src_table, dest_schema = NULL,
  dest_catalog = NULL, src_schema = NULL, src_catalog = NULL,
  drop = FALSE, temporary = FALSE, pk = c("id"), response = NULL,
  con = NULL)

Arguments

mod

A supported model object.

dest_table

The unqualified DB name of the destination table.

src_table

The unqualified DB name of the source table.

dest_schema

The DB schema of the destination table.

dest_catalog

The DB catalog of the destination table.

src_schema

The DB schema of the source table.

src_catalog

The DB catalog of the source table.

drop

Whether to generate a DROP TABLE IF EXISTS before the CREATE TABLE.

temporary

Whether the destination table should be a temporary table.

pk

A vector of primary key column names.

response

The name of a custom response function to apply to the linear predictor.

con

An optional DBI connection to control the details of SQL generation.

Value

A dbplyr SQL object representing the SELECT statement.

Supported packages

Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm.

Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.

Warning

Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# Basic create statements
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 src_catalog="catalog_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 src_catalog="catalog_name", dest_table="target_tbl",
                 dest_schema="target_schema", dest_catalog="target_catalog",
                 pk=c("lab", "specimen_id"))

#With a custom response function
create_statement(mod, src_table="tbl_name", src_schema="schema_name",
                 dest_table="target_tbl", response="probit")

# With a model-derived non-identity response function
mod <- glm(Sepal.Length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species,
           data=datasets::iris, family=binomial("logit"))
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")

#With formula operators
x <- matrix(rnorm(100*20),100,20)
colnames(x) <- sapply(1:20, function(x) paste0("X", as.character(x)))
x <- as.data.frame(x)
mod <- glm(X2 ~ X3 + X5 + X15*X8, data=x)
create_statement(mod, src_table="tbl_name", dest_table="target_tbl")
create_statement(mod, src_table="tbl_name", dest_table="target_tbl",
                 response="cauchit")

sqlscore documentation built on April 29, 2018, 9:03 a.m.