Scheduled R scripts on Master and Slave Google Compute Engine VMs



$4.09 a month for the master + $1.52 a month per slave (for a daily 30-minute cron job on a 7.5GB RAM instance).

See the Google Cloud pricing calculator for up-to-date figures.

Create the master and slave templates

The master only needs enough installed to start up other VMs and download files from Google Cloud Storage:

* googleComputeEngineR
* googleCloudStorageR
* a recent version of googleAuthR for easy auth on GCE instances via gar_gce_auth() - currently on GitHub

The master is built from a custom Dockerfile; an example is found in system.file("dockerfiles", "gceScheduler", "master", "Dockerfile", package = "googleComputeEngineR")


master_image <- system.file("dockerfiles", "gceScheduler", "master","Dockerfile", 
                          package = "googleComputeEngineR")

b1 <- gce_docker_build(name = "build1", master_image, build_name = "cron-master")

The slave needs to have enough R libraries to run your scheduled script, plus googleCloudStorageR to download/upload the auth files and script. An example that includes googleAnalyticsR, searchConsoleR and others is found at system.file("dockerfiles", "gceScheduler", "slave", "Dockerfile", package = "googleComputeEngineR")

slave_image <- system.file("dockerfiles", "gceScheduler", "slave","Dockerfile", 
                          package = "googleComputeEngineR")

b2 <- gce_docker_build(name = "build2", slave_image, build_name = "cron-slave")
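For reference, a slave Dockerfile along those lines might look something like the sketch below. This is a hypothetical illustration, not the file shipped with the package; it assumes the rocker/tidyverse base image and littler's install2.r helper:

```dockerfile
FROM rocker/tidyverse

## R libraries needed by the scheduled script,
## plus googleCloudStorageR for moving files in and out
RUN install2.r --error \
    googleAnalyticsR \
    searchConsoleR \
    googleCloudStorageR \
    googleComputeEngineR
```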

Wait a bit whilst the images build and upload to your own Google Container Registry

## should see your custom image once done
#> [1] "NAME                                       DESCRIPTION   STARS     OFFICIAL   AUTOMATED"
#> [2] "your-project/cron-master                                    0                    "         
#> [3] "your-project/cron-slave                                     0                    "

## delete the build instances once the images are safely in the registry
gce_vm_delete(b1)
gce_vm_delete(b2)

Setup the worker VM

Now that the templates are saved to the Container Registry, create a small worker VM that stays on 24/7 to run cron. This costs ~$4.09 a month.


username <- "mark"

## make the cron-master
master <- gce_vm("cron-master", 
                 predefined_type = "f1-micro",
                 template = "rstudio", 
                 dynamic_image = gce_tag_container("cron-master"),
                 username = username, 
                 password = "mark1234")

## set up SSH from master to slaves with username 'master'
gce_ssh(master, "ssh-keygen -t rsa -f ~/.ssh/google_compute_engine -C master -N ''")

## copy the SSH keys into the running Docker container
## (more secure than baking the keys into the Docker image itself)
docker_cmd(master, cmd = "cp", args = sprintf("~/.ssh/ rstudio:/home/%s/.ssh/", username))
docker_cmd(master, cmd = "exec", args = sprintf("rstudio chown -R %s /home/%s/.ssh/", username, username))

Setup slave instance

Create the larger slave instance, which can then be stopped ready for the cron job. These will cost $1.52 a month in total if they run every day for 30 minutes. Here it's called slave-1, but a more descriptive name helps, such as a client name.

slave <- gce_vm("slave-1", 
                 predefined_type = "n1-standard-2",
                 template = "rstudio", 
                 dynamic_image = gce_tag_container("cron-slave"),
                 username = "mark", 
                 password = "mark1234")

## wait for it to all install (e.g. RStudio login screen available)

## stop it, ready for being started by the master VM
gce_vm_stop(slave)

Create scheduled script

Create the R script you want to schedule. Make sure it is self sufficient in that it can authenticate, do stuff and upload to a safe repository, such as Google Cloud Storage.

This script is itself uploaded to Google Cloud Storage, so the slave instance can call it via:

googleCloudStorageR::gcs_source('download.R', bucket = 'mark-cron')
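For that to work, download.R (and the ga.httr-oauth file it uses) must be uploaded to the bucket beforehand. A sketch, assuming the mark-cron bucket and that googleCloudStorageR is already authenticated on your local machine:

```r
library(googleCloudStorageR)

## assumes googleCloudStorageR auth has already been done locally
gcs_upload("download.R", bucket = "mark-cron")
gcs_upload("ga.httr-oauth", bucket = "mark-cron")
```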

An example below authenticates with Google Cloud Storage, then downloads a ga.httr-oauth file that carries the Google Analytics authentication, runs the download then reauthenticates with Google Cloud Storage to upload the results.

## download.R - called from slave VM
library(googleCloudStorageR)
library(googleAnalyticsR)

## set defaults
gcs_global_bucket("mark-cron")

## gcs can authenticate via GCE auth keys
googleAuthR::gar_gce_auth()

## use GCS to download auth key (that you have previously uploaded)
gcs_get_object("ga.httr-oauth", saveToDisk = "ga.httr-oauth")

auth_token <- readRDS("ga.httr-oauth")
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/analytics.readonly",
        googleAuthR.httr_oauth_cache = "ga.httr-oauth")
googleAuthR::gar_auth(token = auth_token)

## fetch data

gadata <- google_analytics_4(81416156,
                             date_range = c(Sys.Date() - 8, Sys.Date() - 1),
                             dimensions = c("medium", "source", "landingPagePath"),
                             metrics = "sessions",
                             max = -1)

## back to Cloud Storage
gcs_upload(gadata, name = "uploads/gadata_81416156.csv")

message("Upload complete", Sys.time())

Create master script

Create the script that will run on master VM. This will start the slave instance, run your scheduled script and stop the slave instance again.

## intended to be run on a small instance via cron
## use this script to launch other VMs with more expensive tasks
library(googleComputeEngineR)

## auth to same project we're on
googleAuthR::gar_gce_auth()

## launch the premade VM
vm <- gce_vm("slave-1")

## set SSH to use 'master' username
vm <- gce_ssh_setup(vm, username = "master", ssh_overwrite = TRUE)

## run the script on the VM
runme <- "Rscript -e \"googleAuthR::gar_gce_auth();googleCloudStorageR::gcs_source('download.R', bucket = 'mark-cron')\""
out <- docker_cmd(vm, 
                  cmd = "exec", 
                  args = c("rstudio", runme), 
                  wait = TRUE)

## once finished, stop the VM
gce_vm_stop(vm)

Add worker script to cron

Log in to the master VM and save the script, then schedule it via the cronR RStudio addin.
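The addin generates a call to the cronR package, and the equivalent can be scripted directly. A sketch, assuming the master script above was saved to /home/mark/master.R (a hypothetical path) and should run daily at 03:00:

```r
library(cronR)

## build the Rscript command and register it with cron
cmd <- cron_rscript("/home/mark/master.R")
cron_add(cmd, frequency = "daily", at = "03:00", id = "cron-slave-runner")
```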


googleComputeEngineR documentation built on May 6, 2019, 1:01 a.m.