gsutil: gsutil command line utility interface

gsutil R Documentation

gsutil command line utility interface


These functions invoke the gsutil command line utility. See the "Details:" section if you have gsutil installed but the package cannot find it.

gsutil_requesterpays(): does the google bucket require that the requester pay for access?

gsutil_ls(): List contents of a google cloud bucket or, if source is missing, all Cloud Storage buckets under your default project ID

gsutil_exists(): check if the bucket or object exists.

gsutil_stat(): print, as a side effect, the status of a bucket, directory, or file.

gsutil_cp(): copy contents of source to destination. At least one of source or destination must be a Google cloud bucket; source can be a character vector with length greater than 1. Use gsutil_help("cp") for gsutil help.

gsutil_rm(): remove contents of a google cloud bucket.

gsutil_rsync(): synchronize a source and a destination. If the destination is on the local file system, it must be a directory or not yet exist (in which case a directory will be created).

gsutil_cat(): concatenate bucket objects to standard output

gsutil_help(): print 'man' page for the gsutil command or subcommand. Note that only commands documented on this R help page are supported.

gsutil_pipe(): create a pipe to read from or write to a Google bucket object.



gsutil_ls(source = character(), ..., recursive = FALSE)



gsutil_cp(source, destination, ..., recursive = FALSE, parallel = TRUE)

gsutil_rm(source, ..., force = FALSE, recursive = FALSE, parallel = TRUE)

gsutil_rsync(
  source,
  destination,
  ...,
  exclude = NULL,
  dry = TRUE,
  delete = FALSE,
  recursive = FALSE,
  parallel = TRUE
)

gsutil_cat(source, ..., header = FALSE, range = integer())

gsutil_help(cmd = character(0))

gsutil_pipe(source, open = "r", ...)



source: character(1) (character() for gsutil_requesterpays(), gsutil_ls(), gsutil_exists(), and gsutil_cp()); paths to a Google storage bucket, possibly with wild-cards for file-level pattern matching.


...: additional arguments passed as-is to the gsutil subcommand.


recursive: logical(1); perform the operation recursively from source? Default: FALSE.


destination: character(1); Google cloud bucket or local file system destination path.


parallel: logical(1); perform parallel multi-threaded / multi-processing operation? Default: TRUE.


force: logical(1); continue silently despite errors when removing multiple objects? Default: FALSE.


exclude: character(1); a Python regular expression of bucket paths to exclude from synchronization. E.g., '.*(\\.png|\\.txt)$' excludes '.png' and '.txt' files.


dry: logical(1); when TRUE (default), return the consequences of the operation without actually performing the operation.


delete: logical(1); when TRUE, remove files in destination that are not in source. Exercise caution when you use this option: it is possible to delete large amounts of data accidentally if, for example, you erroneously reverse source and destination.


header: logical(1); when TRUE annotate each


range: (optional) integer(2) vector used to form a from-to range of bytes to concatenate. NA values signify concatenation from the start (first position) or to the end (second position) of the file.


cmd: character() (optional); command name, e.g., "ls", for help.


open: character(1); either "r" (read from) or "w" (write to) the bucket.
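
The exclude pattern is interpreted by gsutil as a Python regular expression, not by R. For a simple pattern such as the one in the exclude description, R's Perl-compatible engine should agree with Python, so the pattern can be tried against a few candidate paths before running a real synchronization. This is a minimal sketch under that assumption; the paths are invented for illustration.

```r
## Hypothetical bucket-relative paths, invented for this illustration
paths <- c("data/a.png", "data/b.txt", "data/c.csv", "notes.txt.bak")

## Same pattern as in the 'exclude' description; in R source each
## backslash is doubled, so the regex engine sees '.*(\.png|\.txt)$'
exclude <- ".*(\\.png|\\.txt)$"

## perl = TRUE selects PCRE, assumed here to agree with Python's 're'
## module for a pattern this simple
excluded <- grepl(exclude, paths, perl = TRUE)

## Paths that would still take part in the synchronization
kept <- paths[!excluded]
kept
```

Because the pattern is anchored with `$`, a name such as "notes.txt.bak" is not excluded; only paths that end in '.png' or '.txt' are.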


The gsutil system command is required. The search for gsutil starts with the variable GCLOUD_SDK_PATH, which provides the path to a directory containing a bin directory, which in turn contains gsutil, gcloud, etc. GCLOUD_SDK_PATH is looked up first as an R option() and then as a system environment variable. If neither is set, Sys.which() is tried. If that fails, gsutil is searched for on predefined paths. On Windows, the search tries to find Google\\Cloud SDK\\google-cloud-sdk\\bin\\gsutil.cmd in the LOCAL APP DATA, Program Files, and Program Files (x86) directories. On Linux / macOS, the search continues with ~/google-cloud-sdk.
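
The first steps of that search can be sketched in base R. The helper `find_sdk_path()` below is hypothetical and is not the package's internal code. It also assumes that the R option has the same name as the environment variable, GCLOUD_SDK_PATH. The installation path is illustrative only.

```r
## Hypothetical helper mirroring the documented lookup order; it is NOT
## the package's own implementation.
find_sdk_path <- function() {
    ## 1. R option, 2. environment variable ("" when unset)
    path <- getOption("GCLOUD_SDK_PATH", Sys.getenv("GCLOUD_SDK_PATH"))
    if (nzchar(path))
        return(file.path(path, "bin", "gsutil"))
    ## 3. fall back to the system search path ("" when not found)
    unname(Sys.which("gsutil"))
}

## Point the search at a non-standard installation (illustrative path)
options(GCLOUD_SDK_PATH = "/opt/google-cloud-sdk")
find_sdk_path()
## "/opt/google-cloud-sdk/bin/gsutil"
```

In practice, setting the option (or the environment variable, e.g., in .Renviron) before calling any gsutil_*() function is likely the simplest fix when gsutil is installed in a location the package does not find by itself.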

gsutil_rsync(): To make "gs://mybucket/data" match the contents of the local directory "data" you could do:

gsutil_rsync("data", "gs://mybucket/data", delete = TRUE)

To make the local directory "data" the same as the contents of gs://mybucket/data:

gsutil_rsync("gs://mybucket/data", "data", delete = TRUE)

If destination is a local path and does not exist, it will be created.
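
To see why reversing source and destination is dangerous with delete = TRUE, the effect can be approximated locally with two temporary directories. This is only a sketch of the semantics: it compares file names, whereas gsutil also considers whether file content has changed, and it does not call gsutil at all. The directory and file names are made up.

```r
## Two throwaway directories standing in for 'source' and 'destination'
src_dir <- tempfile("source_")
dst_dir <- tempfile("destination_")
dir.create(src_dir)
dir.create(dst_dir)
invisible(file.create(file.path(src_dir, c("a.txt", "b.txt"))))
invisible(file.create(file.path(dst_dir, c("b.txt", "stale.txt"))))

src_files <- list.files(src_dir)
dst_files <- list.files(dst_dir)

## Files that would be copied: present in source, absent from destination
to_copy <- setdiff(src_files, dst_files)

## Files that delete = TRUE would remove: present only in destination
to_delete <- setdiff(dst_files, src_files)

to_copy
to_delete
```

Swapping the two directories swaps the roles: "a.txt" would then be the file scheduled for deletion. Running with the default dry = TRUE first and inspecting the reported consequences is a cheap safeguard.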


gsutil_requesterpays(): named logical() vector TRUE when requester-pays is enabled.

gsutil_ls(): character() listing of source content.

gsutil_exists(): logical(1) TRUE if bucket or object exists.

gsutil_stat(): tibble() summarizing status of each bucket member.

gsutil_cp(): exit status of gsutil_cp(), invisibly.

gsutil_rm(): exit status of gsutil_rm(), invisibly.

gsutil_rsync(): exit status of gsutil_rsync(), invisibly.

gsutil_cat(): the content as a character vector.

gsutil_help(): character() help text for subcommand cmd.

gsutil_pipe(): an unopened R pipe(); the mode is not specified, and the pipe must be used in the appropriate context (e.g., a pipe created with open = "r" as input to read.csv()).
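
The behavior of an unopened pipe can be illustrated with base R alone. In the sketch below a shell command stands in for gsutil; it assumes a POSIX shell that provides `printf`, and the CSV content is invented. A connection returned by gsutil_pipe() would be handed to read.csv() in the same way.

```r
## Stand-in command: prints a small CSV to standard output, much as a
## bucket object's content would arrive through the pipe.
## Assumes a POSIX shell providing 'printf'.
con <- pipe("printf 'id,value\\n1,10\\n2,20\\n'")

## The connection is not yet open; read.csv() opens it for reading,
## consumes the output, and closes it again.
df <- read.csv(con)
dim(df)
```

Because the reading function opens and closes the connection itself, there is no need to call open() or close() explicitly in this pattern.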


library(AnVIL)

## public, non-requester-pays object used throughout the examples
src <- "gs://genomics-public-data/1000-genomes/other/sample_info/sample_info.csv"

if (gcloud_exists())
    gsutil_requesterpays(src) # FALSE -- no cost download

if (gcloud_exists()) {
    gsutil_cp(src, tempdir())
    ## gsutil_*() commands work with spaces in the source or destination
    destination <- file.path(tempdir(), "foo bar")
    gsutil_cp(src, destination)
}

if (gcloud_exists()) {
    ## read the object directly from the bucket through a pipe
    df <- read.csv(gsutil_pipe(src), 5L)
}

Bioconductor/AnVIL documentation built on May 18, 2024, 5:23 a.m.