Being able to trigger R code via public HTTP requests opens up lots of possibilities using webhooks, API calls and so on. You can also make use of private HTTP requests so that services running multiple languages can also easily interface with one another using a concept called micro-services.
The use case described below uses R code deployed to a private as a strategy for running lots of parallel computation, taking advantage of Cloud Run's autoscaling features to turn up and down compute resources as needed.
Hosting the computation behind a plumber API and hosted on Cloud Run, you can control what data is worked on via URL parameters, and loop calling the API.
The R computation behind plumber could call other services or compute directly, although keep in mind the 60min timeout for Cloud Run API responses.
The demo API queries a BigQuery database and computes a forecast for the time-series it gets back. The data is from the Covid public datasets hosted on BigQuery.
The plumber script below has two endpoints:
/
to test its running/covid_traffic
which is fetched with parameters region and industry to specify what times-series to fetch a forecast for. We will want these to be private (e.g. only authenticated calls) since they contain potentially expensive BigQuery queries.
if(Sys.getenv("PORT") == "") Sys.setenv(PORT = 8000) #auth via BQ_AUTH_FILE environment argument library(bigQueryR) library(xts) library(forecast) #' @get / #' @html function(){ "<html><h1>It works!</h1></html>" } #' @get /covid_traffic #' @param industry the industry to filter results down to e.g "Software" #' @param region the region to filter results down to e.g "Europe" function(region=NULL, industry=NULL){ if(any(is.null(region), is.null(industry))){ stop("Must supply region and industry parameters") } # to handle spaces region <- URLdecode(region) industry <- URLdecode(industry) sql <- sprintf("SELECT date, industry, percent_of_baseline FROM `bigquery-public-data.covid19_geotab_mobility_impact.commercial_traffic_by_industry` WHERE region = '%s' order by date LIMIT 1000", region) message("Query: ", sql) traffic <- bqr_query( query = sql, datasetId = "covid19_geotab_mobility_impact", useLegacySql = FALSE, maxResults = 10000 ) # filter to industry in R this time test_data <- traffic[traffic$industry == industry, c("date","percent_of_baseline")] tts <- xts(test_data$percent_of_baseline, order.by = test_data$date, frequency = 7) # replace with long running sophisticated analysis model <- forecast(auto.arima(tts)) # output a list that can be turned into JSON via jsonlite::toJSON o <- list( params = c(region, industry), x = model$x, mean = model$mean, lower = model$lower, upper = model$upper ) message("Return: ", jsonlite::toJSON(o)) o }
When you deploy to Cloud Run, you can choose which service key the Cloud Run service will run under, the default being the GCE default service key. This means you can authenticate using this key without needing to upload your own service JSON file. An example is available in the example Cloud Run app included with the package, deployable via cr_deploy_plumber(system.file("example", package = "googleCloudRunner"))
The relevant R code is shown below, which lets you list a Google Cloud Storage bucket within the same GCP project, reusing the default authentication.
#' List a Google Cloud Storage bucket as an auth example #' @get /gcs_list #' @param bucket the bucket to list. Must be authenticated for this Cloud Run service account function(bucket=NULL){ if(is.null(bucket)){ return("No bucket specified in URL parameter e.g ?bucket=my-bucket") } library(googleCloudStorageR) auth <- gargle::credentials_gce() if(is.null(auth)){ return("Could not authenticate") } message("Authenticated with service token") # put it into googleCloudStorageR auth gcs_auth(token = auth) gcs_list_objects(bucket) }
Once deployed, you can see it listing objects via the endpoint browseURL("https://{your-app-endpoint}/gcs_list?bucket={your-bucket}")
The first step is to host the plumber API above.
The steps below:
library(googleCloudRunner) bs <- c( cr_buildstep_secret("my-auth", decrypted = "inst/docker/parallel_cloudrun/plumber/auth.json"), cr_buildstep_docker("cloudrun_parallel", dir = "inst/docker/parallel_cloudrun/plumber", kaniko_cache = TRUE), cr_buildstep_run("parallel-cloudrun", image = "gcr.io/$PROJECT_ID/cloudrun_parallel:$BUILD_ID", allowUnauthenticated = FALSE, env_vars = "BQ_AUTH_FILE=auth.json,BQ_DEFAULT_PROJECT_ID=$PROJECT_ID") ) # make a buildtrigger pointing at above steps cloudbuild_file <- "inst/docker/parallel_cloudrun/cloudbuild.yml" by <- cr_build_yaml(bs) cr_build_write(by, file = cloudbuild_file)
Now the cloudbuild.yml is available, a build trigger is created so that it will deploy upon each commit to this GitHub:
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner") cr_buildtrigger(cloudbuild_file, "parallel-cloudrun", trigger = repo, includedFiles = "inst/docker/parallel_cloudrun/**")
The files are then committed to GitHub and the build reviewed in the GCP Console.
Once the builds are done, you should have a plumber API deployed with a URL similar to
https://parallel-cloudrun-ewjogewawq-ew.a.run.app/
Now the plumber API is up, but to call it with authentication will need a JSON Web Token, or JWT. These are supported via the cr_jwt_create()
functions:
cr <- cr_run_get("parallel-cloudrun") # Interact with the authenticated Cloud Run service the_url <- cr$status$url jwt <- cr_jwt_create(the_url) # needs to be recreated every 60mins token <- cr_jwt_token(jwt, the_url) # call Cloud Run with token using httr call library(httr) res <- cr_jwt_with_httr( GET("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/"), token) content(res)
The JWT token needs renewing each hour.
The above shows you can call the API's test endpoint, but now we wish to call the API many times with the parameters to filter to the data we want.
This is facilitated by a wrapper function to call our API:
# interact with the API we made call_api <- function(region, industry, token){ api <- sprintf( "https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s", URLencode(region), URLencode(industry)) message("Request: ", api) res <- cr_jwt_with_httr(httr::GET(api),token) httr::content(res, as = "text", encoding = "UTF-8") } # test call result <- call_api(region = "Europe", industry = "Software", token = token)
Once the test call is complete, you can loop over your data - the parameters for this particular dataset are shown below:
# the variables to loop over regions <- c("North America", "Europe","South America","Australia") industry <- c("Transportation (non-freight)", "Software", "Telecommunications", "Manufacturing", "Real Estate", "Energy & Utilities", "Education", "Insurance", "Media & Internet", "Minerals & Mining", "Healthcare Services & Hospitals", "Organizations", "Finance")
For parallel processing, ideally you don't want R to block waiting for the HTTP response for each URL you are sending. This is easiest achieved through using curl
and its curl_fetch_multi()
function.
A helper function that takes care of adding multiple URLs to the same curl pool is cr_jwt_with_curl()
- if you pass it a vector of URL strings to call and the token, it will attempt to call all of them at the same time.
An example calling each possible combination of the industries/regions above is shown below, which utilises expand.grid()
to create the URL variations.
## curl multi asynch # function to make the URLs make_urls <- function(regions, industry){ combos <- expand.grid(regions, industry, stringsAsFactors = FALSE) unlist(mapply(function(x,y){sprintf("https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=%s&industry=%s",URLencode(x), URLencode(y))}, SIMPLIFY = FALSE, USE.NAMES = FALSE, combos$Var1 ,combos$Var2)) } # make all the URLs all_urls <- make_urls(regions = regions, industry = industry) # calling them cr_jwt_async(all_urls, token = token)
Some example output is shown below - note that some combinations didn't have data so the API returned a 500 error, but later valid combinations completed and returned the JSON response for the forecast:
> # calling them > cr_jwt_async(all_urls, token = token) ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Transportation%20(non-freight) ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Europe&industry=Transportation%20(non-freight) ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=South%20America&industry=Transportation%20(non-freight) ℹ 2020-09-20 21:58:13 > 500 failure for request https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=Australia&industry=Transportation%20(non-freight) ℹ 2020-09-20 21:58:06 > Calling asynch: https://parallel-cloudrun-ewjogewawq-ew.a.run.app/covid_traffic?region=North%20America&industry=Software [[1]] [1] "{\"params\":[\"North America\",\"Transportation (non-freight)\"],\"x\":[100,99,108,104,105,104,105,102,102,108,104,105,104,105,99,98,59,78,78,78,78,100,100,108,106,105,105,105,100,103,109,104,104,105,106,103,102,108,104,104,103,100,96,97,67,64,62,60,59],\"mean\":[64.031,68.3173,71.9691,75.0803,77.731,79.9894,81.9135,83.5527,84.9493,86.1392],\"lower\":[[52.8354,46.9089],[53.6094,45.8236],[55.1655,46.2703],[56.9063,47.2856],[58.6237,48.5089],[60.2322,49.7734],[61.6977,50.9961],[63.0104,52.136],[64.1733,53.1751],[65.1951,54.108]],\"upper\":[[75.2265,81.1531],[83.0251,90.8109],[88.7726,97.6679],[93.2543,102.8751],[96.8384,106.9531],[99.7466,110.2054],[102.1292,112.8308],[104.095,114.9694],[105.7254,116.7235],[107.0833,118.1704]]}" [[2]] [1] "{\"params\":[\"South America\",\"Transportation (non-freight)\"],\"x\":[14,13,12,9,13,17,19,15,11,14,15,16,16,14,11,15,19,18,16,16,12,18,21,18,18,12,18,19,18,19,16,15,17,20,21,20,18,24,26,25,18,22,19,17,21,24,20,19,19,23,27,27,24,22,17,24,26,24,20,18,21,24,23,20,26,19,20,17,21,26,23,19,17,27,23,20,21,22,17,26,23,20,18,21,19,18,28,22,18,24,15,21,27,21,21],\"mean\":[19.7145,21.4333,23.6604,24.1475,22.5433,20.6944,20.6384,22.468,24.3021,24.3207],\"lower\":[[15.9943,14.0249],[17.6591,15.6611],[19.8826,17.8828],[20.3669,18.3656],[18.7179,16.6929],[16.7392,14.6454],[16.566,14.4103],[18.3612,16.1872],[20.1922,18.0166],[20.2036,18.0241]],\"upper\":[[23.4346,25.404],[25.2075,27.2054],[27.4381,29.438],[27.928,29.9293],[26.3686,28.3936],[24.6495,26.7433],[24.7107,26.8665],[26.5747,28.7487],[28.412,30.5876],[28.4378,30.6173]]}" [[3]] [1] "{\"params\":[\"Europe\",\"Transportation (non-freight)\"],\"x\":[40,52,55,50,52,44,59,55,54,60,63,43,54,49,46,38,64,55,53,51,58,61,42,43,40,54,53,60,63,54,44,43,52,38,53,51,47,53,58,44,54,62,56,59,63,48,62,66,60,68,73,69,71,77,77,63,66],\"mean\":[68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688,68.5688],\"lower\":[[58.5753,53.2851],[58.1036,52.5637],[57.6523,51.8735],[57.2189,51.2107],[56.8015,50.5723],[56.3984,49.9558],[56.0082,49.359],[55.6297,48.7802],[55.2621,48.2179],[54.9043,47.6707]],\"upper\":[[78.5622,83.8525],[79.0339,84.5738],[79.4852,85.2641],[79.9186,85.9269],[80.3361,86.5653],[80.7392,87.1818],[81.1294,87.7786],[81.5078,88.3573],[81.8755,88.9196],[82.2333,89.4668]]}"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.