Description Usage Arguments Details Value Supported packages Warning Examples
Generate a SELECT statement to score the passed model on a preexisting database table. The statement will generate predictions entirely in the database, with no need to fetch data into R. Models need not be GLMs, but their prediction steps must consist of applying a response function to a linear predictor.
1 2 | select_statement(mod, src_table, src_schema = NULL, src_catalog = NULL,
pk = c("id"), response = NULL, con = dbplyr::simulate_dbi())
|
mod |
A supported model object. |
src_table |
The unqualified DB name of the source table. |
src_schema |
The DB schema of the source table. |
src_catalog |
The DB catalog of the source table. |
pk |
A vector of primary key column names. |
response |
The name of a custom response function to apply to the linear predictor. |
con |
A DBI connection to control the details of SQL generation; defaults to dbplyr::simulate_dbi() for the best guess at portable SQL. |
An open database connection can be passed as the 'con' argument, or the ‘dbplyr::simulate_*' functions can be used in applications which don’t have a DB connection when they need to generate SQL.
A dbplyr SQL object representing the SELECT statement.
Specific packages and models that are known to work include: glm and lm from package:stats, cv.glmnet from package:glmnet, glmboost from package:mboost, and bayesglm from package:arm.
Default S3 methods are for objects structured like those of class "glm", so models not listed here may work if they resemble those objects, but are not guaranteed to.
Note that if the model object transformed its training data before fitting (e.g., centering and scaling predictors), the generated SQL statement will not include those transformations. A future release may include that functionality, but centering and scaling in particular are difficult to do efficiently and portably in SQL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | # Basic select statements
mod <- glm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris)
select_statement(mod, src_table="tbl_name")
select_statement(mod, src_table="tbl_name", src_schema="schema_name",
src_catalog="catalog_name")
select_statement(mod, src_table="tbl_name", src_schema="schema_name",
src_catalog="catalog_name", pk=c("lab", "specimen_id"))
#With a custom response function
select_statement(mod, src_table="tbl_name", src_schema="schema_name",
response="probit")
# With a model-derived non-identity response function
mod <- glm(Sepal.Length > 5.0 ~ Sepal.Width + Petal.Length + Petal.Width + Species,
data=datasets::iris, family=binomial("logit"))
select_statement(mod, src_table="tbl_name")
#With formula operators
x <- matrix(rnorm(100*20),100,20)
colnames(x) <- sapply(1:20, function(x) paste0("X", as.character(x)))
x <- as.data.frame(x)
mod <- glm(X2 ~ X3 + X5 + X15*X8, data=x)
select_statement(mod, src_table="tbl_name")
select_statement(mod, src_table="tbl_name", response="cauchit")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.