knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE )
library(gargle)
This article has two purposes:
credentials_gce().
I use the googleComputeEngineR package to work with GCE VMs.
I have done the required setup for this package, which includes these entries in .Renviron:
GCE_AUTH_FILE="/path/to/that/json/mentioned/above.json"
GCE_DEFAULT_PROJECT_ID="gargle-gce"
GCE_DEFAULT_ZONE="us-west1-a"
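Before attaching the package, it can help to confirm that the R session actually sees these variables. The following is a minimal sketch using only base R; the helper name check_gce_env() is hypothetical and is not part of googleComputeEngineR.

```r
# Hypothetical helper (not part of googleComputeEngineR): report which of the
# environment variables expected from .Renviron are unset or empty.
check_gce_env <- function(vars = c("GCE_AUTH_FILE",
                                   "GCE_DEFAULT_PROJECT_ID",
                                   "GCE_DEFAULT_ZONE")) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    message("Not set: ", paste(missing, collapse = ", "))
  }
  # Return the names of the missing variables invisibly
  invisible(missing)
}

check_gce_env()
```

If this reports anything as not set, restarting R after editing .Renviron is usually the first thing to try, since that file is read at startup.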
Having done this setup, this is how attaching the package looks:
library(googleComputeEngineR)
#> ✔ Setting scopes to https://www.googleapis.com/auth/cloud-platform
#> ✔ Successfully auto-authenticated via /path/to/that/json/mentioned/above.json
#> Set default project ID to 'gargle-gce'
#> Set default zone to 'us-west1-a'
Note that (I think) the scopes mentioned above are about googleComputeEngineR's activities.
I don't think this has any direct connection to instance scopes for VMs created by googleComputeEngineR.
(Although, of course, "https://www.googleapis.com/auth/cloud-platform" is the default, recommended scope for both contexts.)
You can see your current instances with gce_list_instances():
gce_list_instances()
#> ==Google Compute Engine Instance List==
#>                        name   machineType     status       zone     externalIP   creationTimestamp
#> 1 gargle-gce-rstudio-server e2-standard-4 TERMINATED us-west1-a No external IP 2022-10-21 14:41:46
#> 2           majestic-cuckoo e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-11 18:56:14
#> 3            piggish-salmon e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-12 07:22:26
#> 4                tricky-fox e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-12 12:00:52
The above reflects how things look after I've been mucking around a bit and have several VMs that are currently stopped.
Here's my basic way of creating a VM:
vm <- gce_vm(
  template = "rstudio",
  name = "cerebral-lion",
  username = "jenny",
  password = "jenny1234",
  predefined_type = "e2-standard-4"
)
I can no longer remember why I settled on predefined_type = "e2-standard-4".
Here's what you'll see:
#> ── ## VM Template: ' rstudio' running at http://{IP_ADDRESS} ─────────────────────────────────────────────────────
#> ℹ 2023-04-13 12:03:05 > On first boot, wait a few minutes for docker container to install before logging in.
#> ==Google Compute Engine Instance==
#>
#> Name: cerebral-lion
#> Created: 2023-04-13 12:02:44
#> Machine Type: e2-standard-4
#> Status: RUNNING
#> Zone: us-west1-a
#> External IP: {IP_ADDRESS}
#> Disks:
#>                deviceName       type       mode boot autoDelete
#> 1 cerebral-lion-boot-disk PERSISTENT READ_WRITE TRUE       TRUE
#>
#> Metadata:
#>                      key            value
#> 2               template          rstudio
#> 3 google-logging-enabled             true
#> 4           rstudio_user            jenny
#> 5             rstudio_pw        jenny1234
#> 6      gcer_docker_image rocker/tidyverse
You can then log in to RStudio Server at the given {IP_ADDRESS}.
Helpful snippets for getting that on the clipboard:
# if you, e.g., just created `vm`
paste0("http://", gce_get_external_ip(vm)) |> clipr::write_clip()

# if you want to refer to the instance by name
paste0("http://", gce_get_external_ip("cerebral-lion")) |> clipr::write_clip()
Under the hood, googleComputeEngineR is inserting its own default choices for the associated service account and scopes. It's actually as if you had done:
gce_vm(
  ...,
  serviceAccounts = list(
    email = "{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}",
    scopes = "https://www.googleapis.com/auth/cloud-platform"
  )
)
Below we will create another VM and pass serviceAccounts explicitly, so we can also specify the scopes.
To learn more: https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances.
Log in to RStudio Server in the VM (see above for getting the IP address). First, for my current exploration, I want to install gargle from a specific branch:
install.packages("pak")
pak::pak("r-lib/gargle@gce-improvements")
Now attach gargle and set verbosity level to "debug".
library(gargle)
local_gargle_verbosity("debug")
Let's look at the service accounts available to this running instance:
gce_instance_service_accounts()
#>                                     name                                  email aliases
#> 1 {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} default
#> 2                                default {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} default
#>                                           scopes
#> 1 https://www.googleapis.com/auth/cloud-platform
#> 2 https://www.googleapis.com/auth/cloud-platform
You can enable multiple virtual machine instances to use the same service account, but a virtual machine instance can only have one service account identity.
So there will only ever be 1 actual service account identity, but you might see two rows here, as we do above, because the default service account can be referred to by 2 names: its email and as default.
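To make the "two rows, one identity" point concrete, here is a small sketch on a hand-built data frame. It assumes the result has name, email, and aliases columns as printed above; the data frame itself is a stand-in, not real output.

```r
# Hypothetical data frame shaped like the printed output of
# gce_instance_service_accounts(); the email is a placeholder.
accounts <- data.frame(
  name = c("{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}", "default"),
  email = c("{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}",
            "{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}"),
  aliases = c("default", "default"),
  stringsAsFactors = FALSE
)

# Two rows (two names), but only one distinct service account identity.
n_rows <- nrow(accounts)
n_identities <- length(unique(accounts$email))
```

Counting distinct emails, rather than rows, is the safer way to ask how many service accounts an instance has.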
Let's get a token with token_fetch() and inspect it.
t <- token_fetch()
#> trying `token_fetch()`
#> ...
#> Trying `credentials_gce()` ...
#> GceToken initialize
#> GceToken init_credentials
#> GCE service account email: '{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}'
#> GCE service account name: "default"
#> GCE access token scopes: "...cloud-platform"
t
#>
#> ── <GceToken (via gargle)> ──────────────────────────────────────────────────────────────────────────────────────────────────────
#> scopes: ...cloud-platform
#> credentials: access_token, expires_in, token_type
By default, credentials_gce() uses the default service account and the "cloud-platform" scope.
What if we want to do something with the Google Drive API and we request that scope?
t <- token_fetch(c(
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/drive"
))
#> trying `token_fetch()`
#> ...
#> Trying `credentials_gce()` ...
#> ! This requested scope is not among the scopes for the "default" service account:
#> ✖ https://www.googleapis.com/auth/drive
#> ℹ If there are problems downstream, this might be the root cause.
#> GceToken initialize
#> GceToken init_credentials
#> ! This requested scope is not among the scopes for the access token returned by the metadata server:
#> ✖ https://www.googleapis.com/auth/drive
#> ℹ If there are problems downstream, this might be the root cause.
#> ! Updating token scopes to reflect its actual scopes:
#> • https://www.googleapis.com/auth/cloud-platform
#> GCE service account email: '{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}'
#> GCE service account name: "default"
#> GCE access token scopes: "...cloud-platform"
t
#>
#> ── <GceToken (via gargle)> ──────────────────────────────────────────────────────────────────────────────────────────────────────
#> scopes: ...cloud-platform
#> credentials: access_token, expires_in, token_type
We get a token, but still with only the "cloud-platform" scope, because the Drive scope was not specified when this VM was created:
You can use the access token only for scopes that you specified when you created the instance. For example, if the instance has been granted only the https://www.googleapis.com/auth/storage-full scope for Cloud Storage, then it can't use the access token to make a request to BigQuery.
And, indeed, this lack of an explicit Drive scope means that, e.g., the googledrive package can't do operations that require auth:
library(googledrive)
drive_find()
#> attempt to access internal gargle data from: googledrive
#> Error in `gargle::response_process()`:
#> ! Client error: (403) Forbidden
#> Request had insufficient authentication scopes.
#> PERMISSION_DENIED
#> • message: Insufficient Permission
#> • domain: global
#> • reason: insufficientPermissions
#> Backtrace:
#>     ▆
#>  1. └─googledrive::drive_find()
#>  2.   └─googledrive::do_paginated_request(request, n_max = n_max, n = function(x) length(x$files))
#>  3.     └─gargle::response_process(page)
#>  4.       └─gargle:::gargle_abort_request_failed(error_message(resp), resp)
#>  5.         └─gargle:::gargle_abort(...)
#>  6.           └─cli::cli_abort(...)
#>  7.             └─rlang::abort(...)
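The scope mismatch behind this 403 can be anticipated with a simple set difference between what you plan to request and what the instance was granted. This is only a sketch in base R: missing_scopes() is a hypothetical helper, and it assumes both inputs are plain character vectors of full scope URLs.

```r
# Hypothetical helper: which requested scopes are NOT among the scopes
# granted to the instance's service account?
missing_scopes <- function(requested, granted) {
  setdiff(requested, granted)
}

# Scopes granted to this VM at creation time (from the output above)
granted <- "https://www.googleapis.com/auth/cloud-platform"

# Scopes we would like the token to carry
requested <- c(
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/drive"
)

missing_scopes(requested, granted)
#> [1] "https://www.googleapis.com/auth/drive"
```

A non-empty result means that requests needing those scopes will probably fail on this instance, no matter what is passed to token_fetch().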
If you're not actively working on the VM, you should at least suspend it. Then you could resume it to pick up where you left off. To ensure that you aren't incurring any charges, you should stop the machine, but then you'll have to start over if you've, e.g., installed dev packages or downloaded/created any files.
gce_vm_suspend("cerebral-lion")
gce_vm_resume("cerebral-lion")
gce_vm_stop("cerebral-lion")
It's a good idea to check that you've done whatever you intended with the instance. Check its status here:
gce_list_instances()
#> ==Google Compute Engine Instance List==
#>                        name   machineType     status       zone     externalIP   creationTimestamp
#> 1             cerebral-lion e2-standard-4  SUSPENDED us-west1-a No external IP 2023-04-13 12:02:44
#> 2 gargle-gce-rstudio-server e2-standard-4 TERMINATED us-west1-a No external IP 2022-10-21 14:41:46
#> 3           majestic-cuckoo e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-11 18:56:14
#> 4            piggish-salmon e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-12 07:22:26
#> 5                tricky-fox e2-standard-4 TERMINATED us-west1-a No external IP 2023-04-12 12:00:52
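If the list gets long, it may be easier to filter for anything still running than to scan the status column by eye. The sketch below works on a hand-built data frame. It assumes the instance list can be treated as a data frame with name and status columns, as the printed output suggests; still_running() is a hypothetical helper, and one status has been changed to RUNNING purely for illustration.

```r
# Hypothetical data frame shaped like a subset of the instance list above.
# Note: "tricky-fox" is marked RUNNING here only to give the filter a hit.
instances <- data.frame(
  name = c("cerebral-lion", "majestic-cuckoo", "tricky-fox"),
  status = c("SUSPENDED", "TERMINATED", "RUNNING"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of instances whose status is still RUNNING.
still_running <- function(instances) {
  instances$name[instances$status == "RUNNING"]
}

still_running(instances)
#> [1] "tricky-fox"
```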
Now we're going to specifically request Drive scope for a VM. AFAICT googleComputeEngineR only helps you set scopes at the time of VM creation, so I'm going to create a new instance. It seems possible to change scopes for a pre-existing instance as long as it is stopped, so maybe that could be a feature request for googleComputeEngineR (or maybe I'm overlooking that there's already a way to do this). Further reading: https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances#changeserviceaccountandscopes.
vm <- gce_vm(
  template = "rstudio",
  name = "trustful-bull",
  username = "jenny",
  password = "jenny1234",
  predefined_type = "e2-standard-4",
  serviceAccounts = list(
    list(
      email = "{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}",
      scopes = c(
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive"
      )
    )
  )
)
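The nested list passed as serviceAccounts in the gce_vm() call above is easy to get subtly wrong. A small constructor can make the shape explicit. This is a sketch in base R only: make_service_accounts() is a hypothetical helper, not something googleComputeEngineR provides.

```r
# Hypothetical helper (not part of googleComputeEngineR): build the
# list-of-lists shape used for `serviceAccounts` in the gce_vm() call above.
make_service_accounts <- function(email, scopes) {
  stopifnot(
    is.character(email), length(email) == 1L,
    is.character(scopes), length(scopes) >= 1L
  )
  # One inner list per service account; duplicate scopes are dropped
  list(list(email = email, scopes = unique(scopes)))
}

sa <- make_service_accounts(
  email = "{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}",
  scopes = c(
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive"
  )
)
str(sa)
```

The result could then be passed as serviceAccounts = sa, which keeps the scope list in one place if you create several VMs.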
I get the IP address, log in to RStudio Server, and install the desired version of gargle (not shown).
gce_instance_service_accounts() shows that we have, in fact, managed to change the scopes available to the default service account:
gce_instance_service_accounts()
#>                                     name                                  email aliases
#> 1 {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} default
#> 2                                default {EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT} default
#>                                                                                 scopes
#> 1 https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
#> 2 https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive
We see this in actual tokens as well.
Note that we get "drive" scope even if we don't ask for it.
t <- token_fetch()
#> trying `token_fetch()`
#> ...
#> Trying `credentials_gce()` ...
#> GceToken initialize
#> GceToken init_credentials
#> ! Updating token scopes to reflect its actual scopes:
#> • https://www.googleapis.com/auth/cloud-platform
#> • https://www.googleapis.com/auth/drive
#> GCE service account email: '{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}'
#> GCE service account name: "default"
#> GCE access token scopes: "...cloud-platform, ...drive"
t
#>
#> ── <GceToken (via gargle)> ──────────────────────────────────────────────────────────────────────────────────────────────────────
#> scopes: ...cloud-platform, ...drive
#> credentials: access_token, expires_in, token_type
And, as one would expect, it's now possible to work with the googledrive package.
library(googledrive)
drive_find()
#> # A dribble: 0 × 3
#> # ℹ 3 variables: name <chr>, id <drv_id>, drive_resource <list>
drive_user()
#> Logged in as:
#> • displayName: '{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}'
#> • emailAddress: '{EMAIL_OF_THE_DEFAULT_SERVICE_ACCOUNT}'