In rstudio-conf-2020/big-data: Content and setup files for the Big Data with R class

eval_model <- FALSE
if(Sys.getenv("GLOBAL_EVAL") != "") eval_model <- Sys.getenv("GLOBAL_EVAL")

library(tidymodels)
library(yaml)
library(modeldb)
library(dbplot)
library(pins)
library(dbplyr)
library(RPostgres)
library(connections)
library(config)

Modeling with databases

Single step sampling

Use PostgreSQL TABLESAMPLE clause

Use connection_open() to open a Database connection r con <- connection_open( RPostgres::Postgres(), host = "localhost", user = get("user"), password = get("pwd"), port = 5432, dbname = "postgres", bigint = "integer" )
Set the orders variable to point to the orders table r orders <- tbl(con, in_schema("retail", "orders"))
Set the orders_view variable to point to the v_orders table ```r

```
Pipe orders into the function show_query() ```r

```
Pipe the previous command into the class() function to see the kind of output show_query() returns ```r

```
Replace show_query() with remote_query() to compare the output types ```r

```
Replace class() with build_sql(). Use con as the value for the con argument ```r

```
Add " TABLESAMPLE BERNOULLI (0.1)" to build_sql() as another ... argument ```r

```
Pipe the code into tbl(). Use con for the con argument, and . for the rest ```r

```
Use inner_join() to add the information from the orders_view pointer, use order_id as the matching field ```r

```
Assign the resulting code to a variable orders_sample_db r orders_sample_db <-
Use collect() to load the results of orders_sample_db to a new variable called orders_sample r orders_sample <- collect(orders_sample_db)
Load the dbplot library r library(dbplot)
Use dbplot_histogram() to visualize the distribution of order_total from orders_sample r orders_sample %>% dbplot_histogram(order_total, binwidth = 5)
Use dbplot_histogram() to visualize the distribution of order_total from orders_view r orders_view %>% dbplot_histogram(order_total, binwidth = 5)

Using `tidymodels` for modeling

Fit and measure the model's performance using functions from parsnip and yardstick

Load the tidymodels library r library(tidymodels)
Start with the linear_reg() command, pipe into set_engine(), and use "lm" as its sole argument ```r

```
Pipe into the fit() command. Use the formula: order_total ~ order_qty, and orders_sample as the data argument r
Assign the previous code to a variable called parsnip_model r parsnip_model <-
Use bind_cols() to add the predictions to order_sample. Calculate the prediction with predict() r orders_sample %>% bind_cols(predict(parsnip_model, orders_sample))
Pipe the code into the metrics() function. Use order_total as the truth argument, and .pred as the estimate argument ```r

```

Score with `tidypredict`

Load the tidypredict library r library(tidypredict)
Use the parse_model() function to parse parsnip_model, and assign it to a variable called parsed_parsnip ```r

```
Use str() to see the parsed_parsnip object's structure ```r

```
Use tidypredict_fit() to view the dplyr formula that calculates the prediction ```r

```
Use head() to get the first 10 records from orders_view r orders_view %>% head(10)
Pipe the code into mutate(). Assign to a new my_pred variable the results of tidypredict_fit(). Make sure to prefix tidypredict_fit() with the bang-bang operator so that the formula is evaluated. ```r orders_view %>% head(10) %>%

```
Replace the mutate() command with tidypredict_to_column() r orders_view %>% head(10) %>% tidypredict_to_column(parsnip_model)
Load the yaml library r library(yaml)
Use write_yaml() to save the contents of parsed_parsnip into a file called model.yaml ```r

```
Using read_yaml(), read the contents of the model.yaml file into the a new variable called loaded_model ```r

```
Use as_parsed_model() to convert the loaded_model variable into a tidypredict parsed model object, assign the results to loaded_model_2 r loaded_model_2 <- as_parsed_model(loaded_model)

Run predictions in DB

Load the modeldb library r library(modeldb)
Use select() to pick the order_total and order_qty fields from the orders_sample_db table pointer r orders_sample_db %>% select(order_total, order_qty)
Pipe the code into the linear_regression_db() function, pass order_total as the only argument ```r orders_sample_db %>% select(order_total, order_qty) %>%

```
Assign the model results to a new variable called db_model r db_model <-
Use as_parsed_model() to convert db_model to a parsed model object. Assign to new a variable called pm ```r

```
Use head() to get the top 10 records, and then pipe into tidypredict_to_column() to add the results from pm ```r

```

rstudio-conf-2020/big-data documentation built on Feb. 4, 2020, 5:24 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

rstudio-conf-2020/big-data
Content and setup files for the Big Data with R class

In rstudio-conf-2020/big-data: Content and setup files for the Big Data with R class

Modeling with databases

Single step sampling

Using `tidymodels` for modeling

Score with `tidypredict`

Run predictions in DB

R Package Documentation

Browse R Packages

We want your feedback!

rstudio-conf-2020/big-data Content and setup files for the Big Data with R class

In rstudio-conf-2020/big-data: Content and setup files for the Big Data with R class

Modeling with databases

Single step sampling

Using tidymodels for modeling

Score with tidypredict

Run predictions in DB

R Package Documentation

Browse R Packages

We want your feedback!

rstudio-conf-2020/big-data
Content and setup files for the Big Data with R class

Using `tidymodels` for modeling

Score with `tidypredict`