R/tar_resources_aws.R

#' @title Target resources: Amazon Web Services (AWS) S3 storage
#' @export
#' @family resources
#' @description Create the `aws` argument of `tar_resources()`
#'   to specify optional settings for AWS storage with
#'   `tar_target(..., repository = "aws")`.
#'   See the `format` argument of [tar_target()] for details.
#' @details See the cloud storage section of
#'   <https://books.ropensci.org/targets/data.html>
#'   for instructions.
#' @inheritSection tar_resources Resources
#' @return Object of class `"tar_resources_aws"`, to be supplied
#'   to the `aws` argument of `tar_resources()`.
#' @param bucket Character of length 1, name of an existing
#'   S3 bucket where the pipeline uploads and downloads the
#'   return values of the affected targets.
#' @param prefix Character of length 1, "directory path"
#'   in the bucket where your target object and metadata will go.
#'   Please supply an explicit prefix
#'   unique to your `targets` project.
#'   In the future, `targets` will begin requiring
#'   explicitly user-supplied prefixes. (This last note
#'   was added on 2023-08-24: `targets` version 1.2.2.9000.)
#' @param region Character of length 1, AWS region containing the S3 bucket.
#'   Set to `NULL` to use the default region.
#' @param part_size Positive numeric of length 1, number of bytes
#'   for each part of a multipart upload (except the last part,
#'   which is the remainder). AWS requires each part other than
#'   the last to be at least 5 MB. The default value of the
#'   `part_size` argument is `5 * (2 ^ 20)` (5 MB).
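#'   For example, a hedged sketch requesting 100 MB parts for
#'   large objects (the bucket name is a placeholder):
#' ```r
#' tar_resources_aws(bucket = "your-bucket", part_size = 100 * (2 ^ 20))
#' ```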
#' @param endpoint Character of length 1, URL endpoint for S3 storage.
#'   Defaults to the Amazon AWS endpoint if `NULL`. Example:
#'   To use the S3 protocol with Google Cloud Storage,
#'   set `endpoint = "https://storage.googleapis.com"`
#'   and `region = "auto"`. (A custom endpoint may require that you
#'   explicitly set a custom region directly in `tar_resources_aws()`.
#'   `region = "auto"` happens to work with Google Cloud.)
#'   Also make sure to create
#'   HMAC access keys in the Google Cloud Storage console
#'   (under Settings => Interoperability) and set the
#'   `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment
#'   variables accordingly. After that, you should be able to use
#'   S3 storage formats with Google Cloud storage buckets.
#'   There is one limitation, however: even if your bucket has
#'   object versioning turned on, `targets` may fail to record object
#'   versions. Google Cloud Storage in particular has this
#'   incompatibility.
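#'   For example, a minimal sketch of a Google Cloud Storage
#'   configuration over the S3 protocol (the bucket name is a
#'   placeholder, and the HMAC keys must already be set in the
#'   environment variables above):
#' ```r
#' tar_resources(
#'   aws = tar_resources_aws(
#'     bucket = "your-gcs-bucket",
#'     endpoint = "https://storage.googleapis.com",
#'     region = "auto"
#'   )
#' )
#' ```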
#' @param max_tries Positive integer of length 1, maximum number of attempts
#'   to access a network resource on AWS.
#' @param seconds_timeout Positive numeric of length 1,
#'   number of seconds until an HTTP connection times out.
#' @param close_connection Logical of length 1, whether to close HTTP
#'   connections immediately.
#' @param s3_force_path_style Logical of length 1, whether to use path-style
#'   addressing for S3 requests.
#' @param ... Named arguments to functions in `paws.storage::s3()` to manage
#'   S3 storage. The documentation of these specific functions
#'   is linked from `https://www.paws-r-sdk.com/docs/s3/`.
#'   The configurable functions themselves are:
#'   * `paws.storage::s3()$head_object()`
#'   * `paws.storage::s3()$get_object()`
#'   * `paws.storage::s3()$delete_object()`
#'   * `paws.storage::s3()$put_object()`
#'   * `paws.storage::s3()$create_multipart_upload()`
#'   * `paws.storage::s3()$abort_multipart_upload()`
#'   * `paws.storage::s3()$complete_multipart_upload()`
#'   * `paws.storage::s3()$upload_part()`
#'   The named arguments in `...` must not be any of
#'   `"bucket"`, `"Bucket"`, `"key"`, `"Key"`,
#'   `"prefix"`, `"region"`, `"part_size"`, `"endpoint"`,
#'   `"version"`, `"VersionId"`, `"body"`, `"Body"`,
#'   `"metadata"`, `"Metadata"`, `"UploadId"`, `"MultipartUpload"`,
#'   or `"PartNumber"`.
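#'   For example, a hedged sketch that passes a server-side
#'   encryption setting through `...` (this assumes your version of
#'   `paws.storage::s3()$put_object()` accepts the
#'   `ServerSideEncryption` argument; the bucket name is a
#'   placeholder):
#' ```r
#' tar_resources_aws(
#'   bucket = "your-bucket",
#'   ServerSideEncryption = "aws:kms"
#' )
#' ```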
#' @examples
#' # Somewhere in your target script file (usually _targets.R):
#' if (identical(Sys.getenv("TAR_EXAMPLES"), "true")) { # for CRAN
#' tar_target(
#'   name,
#'   command(),
#'   format = "qs",
#'   repository = "aws",
#'   resources = tar_resources(
#'     aws = tar_resources_aws(bucket = "yourbucketname"),
#'     qs = tar_resources_qs(preset = "fast")
#'   )
#' )
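#' # Alternatively, set project-wide defaults with tar_option_set(),
#' # which is where the default arguments of tar_resources_aws()
#' # come from. A hedged sketch; bucket and prefix are placeholders:
#' tar_option_set(
#'   resources = tar_resources(
#'     aws = tar_resources_aws(
#'       bucket = "yourbucketname",
#'       prefix = "my_project_name"
#'     )
#'   )
#' )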
#' }
tar_resources_aws <- function(
  bucket = targets::tar_option_get("resources")$aws$bucket,
  prefix = targets::tar_option_get("resources")$aws$prefix,
  region = targets::tar_option_get("resources")$aws$region,
  part_size = targets::tar_option_get("resources")$aws$part_size,
  endpoint = targets::tar_option_get("resources")$aws$endpoint,
  max_tries = targets::tar_option_get("resources")$aws$max_tries,
  seconds_timeout = targets::tar_option_get("resources")$aws$seconds_timeout,
  close_connection = targets::tar_option_get("resources")$aws$close_connection,
  s3_force_path_style = targets::tar_option_get(
    "resources"
  )$aws$s3_force_path_style,
  ...
) {
  if (is.null(prefix)) {
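    # Warn that a future version of targets will require an
    # explicitly user-supplied prefix.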
    tar_warn_prefix()
    prefix <- path_store_default()
  }
  prefix <- prefix %|||% targets::tar_path_objects_dir_cloud()
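  # Default to 5 MB parts for multipart uploads
  # (`%|||%` keeps non-NULL values as-is).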
  part_size <- part_size %|||% (5 * (2 ^ 20))
  args <- list(...)
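  # Fill in any paws.storage arguments not supplied in `...`, using
  # the defaults from tar_option_get("resources")$aws$args.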
  default_args <- targets::tar_option_get("resources")$aws$args
  for (name in names(default_args)) {
    args[[name]] <- args[[name]] %|||% default_args[[name]]
  }
  out <- resources_aws_init(
    bucket = bucket,
    prefix = prefix,
    region = region,
    part_size = part_size,
    endpoint = endpoint,
    max_tries = max_tries,
    seconds_timeout = seconds_timeout,
    close_connection = close_connection,
    s3_force_path_style = s3_force_path_style,
    args = args
  )
  resources_validate(out)
  out
}
