gsutil: Interact with the gsutil command line utility


Description

These functions invoke the 'gsutil' command line utility. See the "Details:" section if you have gsutil installed but the package cannot find it.

'gsutil_requesterpays()': does the Google bucket require that the requester pay for access?

'gsutil_ls()': list the contents of a Google Cloud bucket or, if 'source' is missing, all Cloud Storage buckets under your default project ID.

'gsutil_exists()': check if the bucket or object exists.

'gsutil_stat()': print, as a side effect, the status of a bucket, directory, or file.

'gsutil_cp()': copy contents of 'source' to 'destination'. At least one of 'source' or 'destination' must be a Google Cloud bucket; 'source' can be a character vector with length greater than 1. Use 'gsutil_help("cp")' for 'gsutil' help.

'gsutil_rm()': remove contents of a Google Cloud bucket.

'gsutil_rsync()': synchronize a source and a destination.

'gsutil_cat()': concatenate bucket objects to standard output.

'gsutil_help()': print the 'man' page for the 'gsutil' command or subcommand. Note that only commands documented on this R help page are supported.

'gsutil_pipe()': create a pipe to read from or write to a Google bucket object.

Usage

gsutil_requesterpays(source)

gsutil_ls(source = character(), ..., recursive = FALSE)

gsutil_exists(source)

gsutil_stat(source)

gsutil_cp(source, destination, ..., recursive = FALSE, parallel = TRUE)

gsutil_rm(source, ..., force = FALSE, recursive = FALSE, parallel = TRUE)

gsutil_rsync(
  source,
  destination,
  ...,
  dry = TRUE,
  delete = FALSE,
  recursive = FALSE,
  parallel = TRUE
)

gsutil_cat(source, ..., header = FALSE, range = integer())

gsutil_help(cmd = character(0))

gsutil_pipe(source, open = "r", ...)

Arguments

source

'character(1)' ('character()' for 'gsutil_requesterpays()', 'gsutil_ls()', 'gsutil_exists()', 'gsutil_cp()'), paths to a Google Cloud Storage bucket, possibly with wild-cards for file-level pattern matching.

...

additional arguments passed as-is to the 'gsutil' subcommand.

recursive

'logical(1)'; perform operation recursively from 'source'? Default: 'FALSE'.

destination

'character(1)', Google Cloud bucket or local file system destination path.

parallel

'logical(1)', perform parallel multi-threaded / multi-processing (default is 'TRUE').

force

'logical(1)': continue silently despite errors when removing multiple objects. Default: 'FALSE'.

dry

'logical(1)', when 'TRUE' (default), return the consequences of the operation without actually performing the operation.

delete

'logical(1)', when 'TRUE', remove files in 'destination' that are not in 'source'. Exercise caution when you use this option: it's possible to delete large amounts of data accidentally if, for example, you erroneously reverse source and destination.

header

'logical(1)' when 'TRUE' annotate each

range

(optional) 'integer(2)' vector used to form a range from-to of bytes to concatenate. 'NA' values signify concatenation from the start (first position) or to the end (second position) of the file.

cmd

'character()' (optional) command name, e.g., '"ls"' for help.

open

'character(1)', either '"r"' (read from the bucket) or '"w"' (write to the bucket).

Details

The 'gsutil' system command is required. The search for 'gsutil' starts with 'GCLOUD_SDK_PATH', which provides the path to a directory containing a 'bin' directory that in turn contains 'gsutil', 'gcloud', etc. 'GCLOUD_SDK_PATH' is looked up first as an R 'option()' and then as a system environment variable. If neither is set, 'Sys.which()' is tried. If that fails, 'gsutil' is searched for in predefined locations. On Windows, the search tries to find 'Google\Cloud SDK\google-cloud-sdk\bin\gsutil.cmd' in the 'LOCAL APP DATA', 'Program Files', and 'Program Files (x86)' directories. On Linux / macOS, the search continues with '~/google-cloud-sdk'.
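The first three steps of that search order can be sketched in base R. This is an illustration only: 'find_sdk_sketch()' is a hypothetical helper, not a function in the package, the install path is a placeholder, and the platform-specific fallback locations are left out.

```r
# Hypothetical helper (not part of AnVIL): mimics the documented lookup order.
# The real search also probes platform-specific install locations, and on
# Windows the executable is 'gsutil.cmd'.
find_sdk_sketch <- function() {
    # 1. an R option named GCLOUD_SDK_PATH
    path <- getOption("GCLOUD_SDK_PATH")
    if (!is.null(path))
        return(file.path(path, "bin", "gsutil"))
    # 2. an environment variable of the same name
    path <- Sys.getenv("GCLOUD_SDK_PATH", unset = NA_character_)
    if (!is.na(path))
        return(file.path(path, "bin", "gsutil"))
    # 3. whatever 'gsutil' is found on the PATH ("" if none)
    unname(Sys.which("gsutil"))
}

# Point the search at a custom install; "/opt/google-cloud-sdk" is a placeholder
options(GCLOUD_SDK_PATH = "/opt/google-cloud-sdk")
find_sdk_sketch()
# [1] "/opt/google-cloud-sdk/bin/gsutil"
```

If the package cannot find an installed 'gsutil', setting the option (or the environment variable, via 'Sys.setenv()') to the SDK directory before calling the 'gsutil_*()' functions is the documented starting point of the search.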

'gsutil_rsync()': To make '"gs://mybucket/data"' match the contents of the local directory '"data"' you could do:

gsutil_rsync("data", "gs://mybucket/data", delete = TRUE)

To make the local directory '"data"' the same as the contents of '"gs://mybucket/data"':

gsutil_rsync("gs://mybucket/data", "data", delete = TRUE)

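If 'destination' is a local path and does not exist, it will be created. Because 'dry = TRUE' is the default, the calls above only report what would change; add 'dry = FALSE' to perform the synchronization.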
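Since 'delete = TRUE' can remove data, one cautious pattern is to preview first and apply only on explicit request. The wrapper below is a minimal sketch under that assumption; 'rsync_checked()' and its 'apply' argument are hypothetical names, not part of the package, and only the documented 'dry' and 'delete' arguments of 'gsutil_rsync()' are used.

```r
# Hypothetical wrapper (not part of AnVIL): preview by default, apply on request
rsync_checked <- function(source, destination, apply = FALSE) {
    # fail early on malformed input, before anything reaches gsutil
    stopifnot(
        is.character(source), length(source) == 1L,
        is.character(destination), length(destination) == 1L,
        is.logical(apply), length(apply) == 1L, !is.na(apply)
    )
    # dry = TRUE reports the consequences without performing the operation
    AnVIL::gsutil_rsync(source, destination, delete = TRUE, dry = !apply)
}

# Usage, with placeholder names:
# rsync_checked("data", "gs://mybucket/data")                # preview only
# rsync_checked("data", "gs://mybucket/data", apply = TRUE)  # perform the sync
```

Keeping the destructive call behind an argument that defaults to the preview makes it harder to delete data by accidentally reversing 'source' and 'destination'.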

Value

'gsutil_requesterpays()': named 'logical()' vector, 'TRUE' when requester-pays is enabled.

'gsutil_ls()': 'character()' listing of 'source' content.

'gsutil_exists()': 'logical(1)', 'TRUE' if the bucket or object exists.

'gsutil_stat()': 'character()' description of status of objects matching 'source'.

'gsutil_cp()': exit status of 'gsutil_cp()', invisibly.

'gsutil_rm()': exit status of 'gsutil_rm()', invisibly.

'gsutil_rsync()': exit status of 'gsutil_rsync()', invisibly.

'gsutil_cat()': the content as a 'character()' vector.

'gsutil_help()': 'character()' help text for subcommand 'cmd'.

'gsutil_pipe()': an unopened R 'pipe()'; the mode is not specified, and the pipe must be used in the appropriate context (e.g., a pipe created with 'open = "r"' as input to 'read.csv()').
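What "unopened" means in practice can be seen with a plain base R 'pipe()'. The sketch below uses the shell command 'echo' as a stand-in for a bucket read (an assumption about the local shell, not something the package does); the reading function opens the connection, consumes it, and closes it again.

```r
# Stand-in for gsutil_pipe(): an unopened base R pipe() around a shell command
con <- pipe("echo sample,population")
isOpen(con)
# [1] FALSE

# readLines() opens the pipe, reads from it, and closes it when done
lines <- readLines(con)
lines
# [1] "sample,population"

# The same pattern with a bucket object, run only when AnVIL and gsutil exist
if (requireNamespace("AnVIL", quietly = TRUE) && AnVIL::gcloud_exists()) {
    src <- "gs://genomics-public-data/1000-genomes/other/sample_info/sample_info.csv"
    first_line <- readLines(AnVIL::gsutil_pipe(src), n = 1L)
}
```

Because the reading function closes the connection it opened, a fresh 'gsutil_pipe()' is needed for each read.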

Examples

src <- "gs://genomics-public-data/1000-genomes/other/sample_info/sample_info.csv"
if (gcloud_exists())
    gsutil_requesterpays(src) # FALSE -- no cost download

if (gcloud_exists()) {
    gsutil_exists(src)
    gsutil_stat(src)
    gsutil_ls(dirname(src))
}

if (gcloud_exists())
    gsutil_cp(src, tempdir())

if (gcloud_exists())
    gsutil_help("ls")

if (gcloud_exists()) {
    ## read only the first 5 rows; a bare second argument would be taken as 'header'
    df <- read.csv(gsutil_pipe(src), nrows = 5L)
    class(df)
    dim(df)
    head(df)
}
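A further sketch, not from the original examples, shows the 'range' argument of 'gsutil_cat()'. The byte offsets are assumed to be zero-based and inclusive, following the convention of 'gsutil cat -r'; the calls are guarded so that nothing runs unless the AnVIL package and 'gsutil' are available.

```r
src <- "gs://genomics-public-data/1000-genomes/other/sample_info/sample_info.csv"

# Byte range to request (assumed zero-based and inclusive, as in 'gsutil cat -r')
first_bytes <- c(0L, 199L)

if (requireNamespace("AnVIL", quietly = TRUE) && AnVIL::gcloud_exists()) {
    # character vector holding the requested leading bytes of the object
    txt <- AnVIL::gsutil_cat(src, range = first_bytes)
    print(txt)

    # NA in the second position reads through to the end of the object
    rest <- AnVIL::gsutil_cat(src, range = c(200L, NA))
    print(length(rest))
}
```

Requesting a small range is a cheap way to inspect the start of a large object before copying it with 'gsutil_cp()'.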

AnVIL documentation built on Nov. 8, 2020, 4:57 p.m.