This workflow takes advantage of the future integration to run your local R functions on a cluster of GCE machines.
You can use it to offload expensive computations by spinning up a cluster and tearing it down again once you are done.
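In summary, this workflow creates a cluster of GCE VMs, sets up SSH access to them, registers them as a future cluster, runs your computations on them, and stops the VMs when you are finished.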
The example below uses a default r-base template, but you can use the steps above to create a dynamic_template pulled from the Container Registry if required.
Instead of the more generic gce_vm(), which is intended for interactive use, we create the instances directly with gce_vm_container(). This call does not wait for one job to complete before starting the next; waiting would be slow if you have a lot of VMs. You can then use gce_get_zone_op() to get the job status.
    library(future)
    library(googleComputeEngineR)

    ## names for your cluster
    vm_names <- c("vm1","vm2","vm3")

    ## create the cluster using default template for r-base
    ## creates jobs that are creating VMs in background
    jobs <- lapply(vm_names, function(x) {
      gce_vm_container(file = get_template_file("r-base"),
                       predefined_type = "n1-highmem-2",
                       name = x)
    })

    jobs
    # [[1]]
    # ==Operation insert : PENDING
    # Started: 2016-11-16 06:52:58
    # [[2]]
    # ==Operation insert : PENDING
    # Started: 2016-11-16 06:53:04
    # [[3]]
    # ==Operation insert : PENDING
    # Started: 2016-11-16 06:53:09

    ## check status of jobs
    lapply(jobs, gce_get_zone_op)
    # [[1]]
    # ==Operation insert : DONE
    # Started: 2016-11-16 06:52:58
    # Ended: 2016-11-16 06:53:14
    # Operation complete in 16 secs
    # [[2]]
    # ==Operation insert : DONE
    # Started: 2016-11-16 06:53:04
    # Ended: 2016-11-16 06:53:20
    # Operation complete in 16 secs
    # [[3]]
    # ==Operation insert : DONE
    # Started: 2016-11-16 06:53:09
    # Ended: 2016-11-16 06:53:30
    # Operation complete in 21 secs

    ## get the VM objects
    vms <- lapply(vm_names, gce_vm)
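If you would rather not re-run the status check by hand, the polling can be wrapped in a small helper. The following is only a sketch: wait_for_jobs() and its arguments are hypothetical names, not part of googleComputeEngineR, and a fake status function stands in for the real check so the code runs without a GCE project.

```r
## Sketch only: wait_for_jobs() and its arguments are hypothetical names,
## not part of googleComputeEngineR.
wait_for_jobs <- function(jobs, get_status, interval = 5, max_tries = 60) {
  for (i in seq_len(max_tries)) {
    ## ask every job for its current status string
    statuses <- vapply(jobs, get_status, character(1))
    if (all(statuses == "DONE")) {
      return(invisible(TRUE))
    }
    Sys.sleep(interval)
  }
  stop("Timed out waiting for jobs; last statuses: ",
       paste(statuses, collapse = ", "))
}

## Stand-in for a real status check (an assumption for illustration):
## each fake job reports PENDING for its first two polls, then DONE.
polls <- new.env()
fake_status <- function(job) {
  n <- if (is.null(polls[[job]])) 0L else polls[[job]]
  polls[[job]] <- n + 1L
  if (n >= 2L) "DONE" else "PENDING"
}

done <- wait_for_jobs(c("vm1", "vm2", "vm3"), fake_status, interval = 0.1)
```

With real jobs, get_status could be something like function(job) gce_get_zone_op(job)$status, assuming the returned operation object exposes a status field; check the object you get back before relying on that.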
It is safest to set up the SSH keys separately for multiple instances, using gce_ssh_setup(). This is normally called for you when you first connect to a VM.
    ## set up SSH for the VMs
    vms <- lapply(vms, gce_ssh_setup)
We now make the VM cluster as per the details given in the future README:
    ## make a future cluster
    plan(cluster, workers = vms)
The cluster is now ready to receive jobs. You can send them by simply using %<-% instead of <-.
    ## use %<-% to send functions to work on cluster
    ## See future README for details: https://github.com/HenrikBengtsson/future
    a %<-% Sys.getpid()

    ## make a big function to run asynchronously
    f <- function(my_data, args){
      ## ....expensive...computations
      result
    }

    ## send to cluster
    result %<-% f(my_data)
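When there are several independent pieces of work, one option is to create one explicit future per piece and collect the values afterwards. The sketch below assumes a sequential plan as a local stand-in for the VM cluster so that it runs on a single machine; slow_sum() and the chunking are hypothetical and only illustrate the pattern.

```r
library(future)

## Assumption: a sequential plan stands in for the VM cluster so the sketch
## runs on one machine; on the cluster you would keep plan(cluster, ...).
plan(sequential)

## Hypothetical stand-in for an expensive computation
slow_sum <- function(x) {
  Sys.sleep(0.1)
  sum(x)
}

## Split the work into three chunks of ten values each
chunks <- split(1:30, rep(1:3, each = 10))

## Create one future per chunk, then collect each value
futures <- lapply(chunks, function(chunk) future(slow_sum(chunk)))
totals <- vapply(futures, value, numeric(1))
```

Under a cluster plan the futures would be dispatched to the workers as they are created, and value() would block only until the corresponding worker has finished.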
For long-running jobs you can use future::resolved() to check on progress.
    ## check if resolved
    resolved(result)
    # [1] TRUE
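One caveat worth illustrating: a variable created with %<-% is a promise, so passing it straight to a function may force it to evaluate (and block) first. As far as the future documentation describes it, futureOf() returns the underlying future without forcing the value, which makes a non-blocking check possible. The sketch below assumes a sequential plan as a local stand-in for the cluster, so the job is already finished by the time it is checked.

```r
library(future)

## Assumption: local stand-in for the VM cluster
plan(sequential)

## An implicit future created with %<-%
result %<-% {
  Sys.sleep(0.2)
  42
}

## Look up the future behind the promise without forcing 'result'
job <- futureOf(result)

## Non-blocking check on the future object itself
done <- resolved(job)

## Reading the variable blocks until the value is available
answer <- result
```

On a real cluster, done may be FALSE while the worker is still busy, and you can call resolved(job) again later.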
Remember to shut down your cluster. You are charged per minute, per instance of uptime.
    ## shutdown instances when finished
    lapply(vms, gce_vm_stop)
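Because a forgotten cluster keeps costing money, it can help to tie the shutdown to the function that does the work, so the VMs are stopped even if the computation fails. This is a minimal sketch using base R's on.exit(); run_on_cluster() and its arguments are hypothetical names, and fake start and stop functions are used so it runs without a GCE project.

```r
## Sketch only: run_on_cluster(), start_fn, stop_fn and work_fn are
## hypothetical names used for illustration.
run_on_cluster <- function(vm_names, start_fn, stop_fn, work_fn) {
  vms <- lapply(vm_names, start_fn)
  ## Registered before any work runs, so it fires on success and on error
  on.exit(lapply(vms, stop_fn), add = TRUE)
  work_fn(vms)
}

## Stand-ins so the sketch runs without a GCE project
stopped <- character(0)
fake_start <- function(name) name
fake_stop <- function(vm) stopped <<- c(stopped, vm)

## The work fails, but the stop function is still called for every VM
outcome <- tryCatch(
  run_on_cluster(c("vm1", "vm2"), fake_start, fake_stop,
                 function(vms) stop("computation failed")),
  error = function(e) conditionMessage(e)
)
```

In real use, start_fn and stop_fn would wrap gce_vm and gce_vm_stop as in the steps above.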