A very common use case in my line of work is making scheduled calls to the Google Analytics API. This example shows how to use `googleAnalyticsR` within your build steps, then `gcloud` to interact with other Google Cloud Platform services, such as BigQuery or Cloud Storage.
This example supposes you have a lot of GDPR requests to delete data in your Google Analytics set-up via the User Deletion API. This API only allows 500 requests per day, so if you have more than that you need to schedule the batches.
To perform the deletions, we suppose we have a CSV file on Google Cloud Storage that has two columns: `cid` (the client ID) and `ua` (the UA property ID). The scheduled build downloads this file, performs user deletions on up to 500 of the IDs via `googleAnalyticsR::ga_clientid_deletion()` and then uploads a record of the deleted rows to cross-reference the next day.
A user can update `delete-all.csv` on Google Cloud Storage with the requested deletions for GDPR compliance, and check `deletes.csv` to see which IDs have already been deleted.
An example script is shown below:
```r
library(googleAnalyticsR)

# enable scopes for user deletions and listing GA accounts
options(googleAuthR.scopes.selected = c(
  "https://www.googleapis.com/auth/analytics.user.deletion",
  "https://www.googleapis.com/auth/analytics.edit"))

# auth with a service auth key - make sure its email is added as a GA user
ga_auth(json_file = "auth.json")

# what we want to delete
todo <- read.csv("delete-all.csv",
                 stringsAsFactors = FALSE, colClasses = "character")

# what has been deleted already
old_deletes <- read.csv("deletes.csv",
                        stringsAsFactors = FALSE, colClasses = "character")

# all rows without a delete flag
todo_filtered <- dplyr::anti_join(todo, old_deletes, by = c(cid = "userId"))

# we can only do 500 per day
todo_filtered <- head(todo_filtered, n = 500)

# we only do one UA code per run
splits <- split(todo_filtered, todo_filtered$ua)
do_these_cids <- splits[[1]]$cid
do_this_ua <- names(splits)[[1]]

message("Deleting IDs - should take around 6 mins to do 500")

# to have more logs in the build
options(googleAuthR.verbose = 2)

# up to 500 API calls
deleted <- ga_clientid_deletion(do_these_cids, propertyId = do_this_ua)

# append the new deletions to the existing record
upload_deletes <- rbind(old_deletes, deleted)

if(nrow(deleted) > 0){
  # write the combined record; a later build step uploads it to
  # Google Cloud Storage so these IDs aren't deleted again
  write.csv(upload_deletes, file = "deletes.csv", row.names = FALSE)
} else {
  warning("No deletions were made", call. = FALSE)
}
```
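One consequence of processing a single UA property per run is that a run can delete fewer than 500 IDs when the pending rows mix several properties, so clearing a backlog may take more days than the row count divided by 500. The sketch below simulates that selection rule offline in base R, with no API calls. `simulate_runs()` and the toy backlog are hypothetical, made up for illustration, and the sketch assumes every `cid` appears only once.

```r
# Hypothetical helper: count the scheduled runs needed to clear a backlog,
# mimicking the script's rule (first 500 pending rows, then only the first
# UA property among them). No API calls are made.
simulate_runs <- function(todo, daily_limit = 500) {
  runs <- 0
  while (nrow(todo) > 0) {
    batch <- head(todo, n = daily_limit)
    # same choice as splits[[1]] in the script: first property after sorting
    first_ua <- names(split(batch, batch$ua))[[1]]
    done <- batch$cid[batch$ua == first_ua]
    # drop the processed IDs, as the anti-join against deletes.csv would
    todo <- todo[!todo$cid %in% done, , drop = FALSE]
    runs <- runs + 1
  }
  runs
}

# Toy backlog (made-up data): 600 requests alternating between two properties
backlog <- data.frame(
  cid = paste0(seq_len(600), ".", seq_len(600)),
  ua  = rep(c("UA-123456-2", "UA-123456-3"), length.out = 600),
  stringsAsFactors = FALSE
)

runs_needed <- simulate_runs(backlog)
naive_estimate <- ceiling(nrow(backlog) / 500)
```

With an interleaved backlog like this one, `runs_needed` comes out higher than `naive_estimate`. Sorting `delete-all.csv` by `ua` before uploading it is one way to keep each run closer to the full daily allowance.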
The script assumes three files exist in the working folder: the authentication file `auth.json`, which is a service account key whose email has been added to the GA accounts, and the two files tracking ID progress, `delete-all.csv` and `deletes.csv`, which you will need to create.
`delete-all.csv` should be a CSV file with `cid` and `ua` columns:
| cid | ua |
|-----|-----|
| 123.321 | UA-123456-3 |
| 432.342 | UA-123456-3 |
| 545.343 | UA-123456-2 |
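If the requests start out in an R session rather than a spreadsheet, a file of this shape could be written as below. This is a minimal sketch in base R: the rows are the illustrative values from the table, and copying the file to the bucket is left out.

```r
# Illustrative rows only: replace with the real client IDs and properties
requests <- data.frame(
  cid = c("123.321", "432.342", "545.343"),
  ua  = c("UA-123456-3", "UA-123456-3", "UA-123456-2"),
  stringsAsFactors = FALSE
)

# row.names = FALSE keeps the file to just the cid and ua columns
write.csv(requests, "delete-all.csv", row.names = FALSE)

# Read it back the same way the deletion script does
check <- read.csv("delete-all.csv",
                  stringsAsFactors = FALSE, colClasses = "character")
```

Reading with `colClasses = "character"` matters here: a client ID such as `123.320` would otherwise be parsed as a number and lose its trailing zero.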
There won't be any deleted IDs the first time it runs, so you can generate an empty `deletes.csv` file via:
```r
# a single placeholder row of NAs gives the file its header
deleted <- data.frame(userId = NA,
                      id_type = NA,
                      property = NA,
                      deletionRequestTime = NA,
                      stringsAsFactors = FALSE)
write.csv(deleted, "deletes.csv", row.names = FALSE)
```
Once the first run is made, the entries it has deleted (up to 500) are appended to this file, which then looks like this:
| userId | id_type | property | deletionRequestTime |
|-----|-----|-----|-----|
| 123.321 | CLIENT_ID | UA-123456-3 | 2021-09-07T11:48:15.285Z |
| 432.342 | CLIENT_ID | UA-123456-3 | 2021-09-07T11:48:17.390Z |
| 545.343 | CLIENT_ID | UA-123456-2 | 2021-09-07T11:48:19.420Z |
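To check progress, the two files can be compared on `cid` and `userId`. The following is a rough sketch in base R that mirrors the anti-join used in the deletion script. The data frames are toy stand-ins for the two files, and here it is assumed that only the first two requests have been logged so far.

```r
# Toy stand-ins for the two files; in practice read them with
# read.csv(..., colClasses = "character") after copying them from the bucket
todo <- data.frame(
  cid = c("123.321", "432.342", "545.343"),
  ua  = c("UA-123456-3", "UA-123456-3", "UA-123456-2"),
  stringsAsFactors = FALSE
)
deletes <- data.frame(
  userId = c("123.321", "432.342"),
  id_type = "CLIENT_ID",
  property = "UA-123456-3",
  deletionRequestTime = c("2021-09-07T11:48:15.285Z",
                          "2021-09-07T11:48:17.390Z"),
  stringsAsFactors = FALSE
)

# Requests with a matching userId in the log have been processed
done <- todo[todo$cid %in% deletes$userId, , drop = FALSE]

# Requests with no match are still waiting for a later run
pending <- todo[!todo$cid %in% deletes$userId, , drop = FALSE]
```

`pending` then holds the rows that a later scheduled run should pick up, and `done` the ones already covered by the log.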
We now create the build around the R script that will download the necessary files and upload the results. We save the above script locally as `delete.R`; then when we call `cr_buildstep_r("delete.R")` it pulls that script in and creates a build step with the R code embedded within it:
```r
library(googleCloudRunner)

bs <- c(
  # fetch the service account key from Secret Manager
  cr_buildstep_secret("user-deletion-key", "auth.json"),
  # download the requested and already-completed deletions
  cr_buildstep_gcloud("gsutil",
                      args = c("cp",
                               "gs://your-bucket/delete-all.csv",
                               "delete-all.csv")),
  cr_buildstep_gcloud("gsutil",
                      args = c("cp",
                               "gs://your-bucket/deletes.csv",
                               "deletes.csv")),
  # run the deletion script
  cr_buildstep_r(
    "delete.R",
    name = "gcr.io/gcer-public/googleanalyticsr:master"
  ),
  # upload the updated record of deletions
  cr_buildstep_gcloud("gsutil",
                      args = c("cp",
                               "deletes.csv",
                               "gs://your-bucket/deletes.csv"))
)

yaml <- cr_build_yaml(bs)
build <- cr_build_make(yaml, timeout = 1200)

# can test a build first via:
#built <- cr_build(build)

schedule_me <- cr_schedule_http(build)
cr_schedule("delete-cids", schedule = "5 1 * * *", httpTarget = schedule_me)
```
The build assumes you have uploaded the authentication service key to Secret Manager for use within `cr_buildstep_secret()` and then `ga_auth()`, and that the required files are uploaded to a Cloud Storage bucket for use within `cr_buildstep_gcloud()`.
It turns out 500 API deletions take just over the default build timeout of 10 minutes, so the timeout is increased to 20 minutes via `cr_build_make(yaml, timeout = 1200)`.
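A rough way to sanity-check a timeout is to look at the gaps between `deletionRequestTime` values in `deletes.csv` and scale them up to a full batch. The sketch below does this for the three illustrative timestamps shown earlier. These are sample values rather than a benchmark, so treat the result as an order-of-magnitude guide only.

```r
# Timestamps copied from the illustrative deletes.csv rows
stamps <- c("2021-09-07T11:48:15.285Z",
            "2021-09-07T11:48:17.390Z",
            "2021-09-07T11:48:19.420Z")
times <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

# Average gap between consecutive deletion requests, in seconds
secs_per_call <- mean(as.numeric(diff(times), units = "secs"))

# Projected duration of a full 500-call batch
batch_secs <- 500 * secs_per_call

default_timeout <- 600   # the 10 minute default mentioned above
chosen_timeout  <- 1200  # the value passed to cr_build_make()

exceeds_default <- batch_secs > default_timeout
fits_chosen     <- batch_secs < chosen_timeout
```

Running the same calculation on a real `deletes.csv` after the first scheduled build would give a better-grounded figure for your own account.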
The `googleAnalyticsR` API calls make use of the public Docker image that is built upon each commit to GitHub of the package, available at `gcr.io/gcer-public/googleanalyticsr:master`.