transform_config: Export Airflow transform config from a SageMaker transformer

View source: R/workflow_airflow.R

transform_configR Documentation

Export Airflow transform config from a SageMaker transformer

Description

Export Airflow transform config from a SageMaker transformer

Usage

transform_config(
  transformer,
  data,
  data_type = "S3Prefix",
  content_type = NULL,
  compression_type = NULL,
  split_type = NULL,
  job_name = NULL,
  input_filter = NULL,
  output_filter = NULL,
  join_source = NULL
)

Arguments

transformer

(sagemaker.transformer.Transformer): The SageMaker transformer to export Airflow config from.

data

(str): Input data location in S3.

data_type

(str): What the S3 location defines (default: 'S3Prefix'). Valid values: * 'S3Prefix' - the S3 URI defines a key name prefix. All objects with this prefix will be used as inputs for the transform job. * 'ManifestFile' - the S3 URI points to a single manifest file listing each S3 object to use as an input for the transform job.

content_type

(str): MIME type of the input data (default: None).

compression_type

(str): Compression type of the input data, if compressed (default: None). Valid values: 'Gzip', None.

split_type

(str): The record delimiter for the input object (default: 'None'). Valid values: 'None', 'Line', 'RecordIO', and 'TFRecord'.

job_name

(str): job name (default: None). If not specified, one will be generated.

input_filter

(str): A JSONPath to select a portion of the input to pass to the algorithm container for inference. If you omit the field, it gets the value '$', representing the entire input. For CSV data, each row is taken as a JSON array, so only index-based JSONPaths can be applied, e.g. $[0], $[1:]. CSV data should follow the 'RFC format <https://tools.ietf.org/html/rfc4180>'_. See 'Supported JSONPath Operators <https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html#data-processing-operators>'_ for a table of supported JSONPath operators. For more information, see the SageMaker API documentation for 'CreateTransformJob <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html>'_. Some examples: "$[1:]", "$.features" (default: None).

output_filter

(str): A JSONPath to select a portion of the joined/original output to return as the output. For more information, see the SageMaker API documentation for 'CreateTransformJob <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html>'_. Some examples: "$[1:]", "$.prediction" (default: None).

join_source

(str): The source of data to be joined to the transform output. It can be set to 'Input' meaning the entire input record will be joined to the inference result. You can use OutputFilter to select the useful portion before uploading to S3. (default: None). Valid values: Input, None.

Value

dict: Transform config that can be directly used by SageMakerTransformOperator in Airflow.


DyfanJones/sagemaker-r-workflow documentation built on April 3, 2022, 11:28 p.m.