DataWranglerProcessor: DataWranglerProcessor class

DataWranglerProcessor    R Documentation

DataWranglerProcessor class

Description

Handles Amazon SageMaker DataWrangler tasks.

Super class

sagemaker.common::Processor

Methods

Public methods

DataWranglerProcessor$new()

DataWranglerProcessor$clone()

Inherited methods

Method new()

Initializes a 'Processor' instance. The 'Processor' handles Amazon SageMaker Processing tasks.

Usage
DataWranglerProcessor$new(
  role,
  data_wrangler_flow_source,
  instance_count,
  instance_type,
  volume_size_in_gb = 30L,
  volume_kms_key = NULL,
  output_kms_key = NULL,
  max_runtime_in_seconds = NULL,
  base_job_name = NULL,
  sagemaker_session = NULL,
  env = NULL,
  tags = NULL,
  network_config = NULL
)
Arguments
role

(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.

data_wrangler_flow_source

(str): The source of the DataWrangler flow to use for the DataWrangler job. If a local path is provided, it is automatically uploaded to S3 under: "s3://<default-bucket-name>/<job-name>/input/<input-name>".

instance_count

(int): The number of instances to run a processing job with.

instance_type

(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.

volume_size_in_gb

(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).

volume_kms_key

(str): A KMS key for the processing volume (default: NULL).

output_kms_key

(str): The KMS key ID for processing job outputs (default: NULL).

max_runtime_in_seconds

(int): Timeout in seconds (default: NULL). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the timeout defaults to 24 hours.

base_job_name

(str): Prefix for processing job name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp.

sagemaker_session

(sagemaker.session.Session): Session object that manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain.

env

(dict[str, str]): Environment variables to be passed to the processing jobs (default: NULL).

tags

(list[dict]): List of tags to be passed to the processing job (default: NULL). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.

network_config

(sagemaker.network.NetworkConfig): A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
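
Example (illustrative sketch)

The arguments above might be put together as follows. This is a sketch under stated assumptions, not the package's documented usage: it assumes the class is exported from a package named sagemaker.workflow, that dict-style arguments map to named lists in R, and that list[dict] arguments map to a list of named lists. The helper flow_upload_uri() and the wrapper make_processor() are hypothetical names introduced only for illustration, and the env and tags values are placeholders.

```r
# Hypothetical helper (not part of the package): reproduces the S3 URI
# pattern that the data_wrangler_flow_source entry gives for a local file.
flow_upload_uri <- function(bucket, job_name, input_name) {
  sprintf("s3://%s/%s/input/%s", bucket, job_name, input_name)
}

# Assumed R equivalents of the Python-style types in the argument list:
#   dict[str, str] -> named list of strings
#   list[dict]     -> list of named lists (Key/Value pairs, as in API_Tag)
env  <- list(STAGE = "dev")
tags <- list(list(Key = "project", Value = "demo"))

# 24 hours expressed in seconds, matching the documented default timeout.
one_day <- 24L * 60L * 60L

# Hypothetical wrapper: nothing contacts AWS until it is called.
# Assumes the class lives in a package called sagemaker.workflow.
make_processor <- function(role, flow_source) {
  sagemaker.workflow::DataWranglerProcessor$new(
    role = role,
    data_wrangler_flow_source = flow_source,
    instance_count = 1L,
    instance_type = "ml.c4.xlarge",
    max_runtime_in_seconds = one_day,
    env = env,
    tags = tags
  )
}
```

Calling make_processor() with an IAM role ARN and either a local flow file or an S3 URI would then build the processor; doing so requires the package to be installed and valid AWS credentials.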


Method clone()

The objects of this class are cloneable with this method.

Usage
DataWranglerProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
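
Example (illustrative sketch)

The difference between a shallow and a deep clone can be sketched with the R6 package, on the assumption that DataWranglerProcessor is an R6 class (its $new() and $clone(deep = ) interface suggests so). The Config and Job classes below are toy classes invented for this illustration and are not part of the package.

```r
library(R6)

# Toy classes for illustration only.
Config <- R6Class("Config", public = list(
  instance_count = 1L
))

Job <- R6Class("Job", public = list(
  name = "job-a",
  config = NULL,
  initialize = function() {
    # A field that holds another R6 object (reference semantics).
    self$config <- Config$new()
  }
))

original <- Job$new()
shallow  <- original$clone()             # deep = FALSE is the default
deep     <- original$clone(deep = TRUE)

# Plain fields are copied in both cases, so this leaves 'original' untouched.
shallow$name <- "job-b"

# The shallow clone shares the nested Config object with 'original';
# the deep clone received its own copy when it was created.
original$config$instance_count <- 5L

print(original$name)                  # "job-a"
print(shallow$config$instance_count)  # 5
print(deep$config$instance_count)     # 1
```

In short, a deep clone is the safer choice when a copy is meant to be modified independently and the object holds other R6 objects in its fields.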


DyfanJones/sagemaker-r-workflow documentation built on April 3, 2022, 11:28 p.m.