FrameworkProcessor: FrameworkProcessor class

FrameworkProcessorR Documentation

FrameworkProcessor class

Description

Handles Amazon SageMaker processing tasks for jobs using a machine learning framework

Super classes

sagemaker.common::Processor -> sagemaker.common::ScriptProcessor -> FrameworkProcessor

Public fields

framework_entrypoint_command

Methods

Public methods

Inherited methods

Method new()

Initializes a “FrameworkProcessor“ instance. The “FrameworkProcessor“ handles Amazon SageMaker Processing tasks for jobs using a machine learning framework, which allows for a set of Python scripts to be run as part of the Processing Job.

Usage
FrameworkProcessor$new(
  estimator_cls,
  framework_version,
  role,
  instance_count,
  instance_type,
  py_version = "py3",
  image_uri = NULL,
  command = NULL,
  volume_size_in_gb = 30,
  volume_kms_key = NULL,
  output_kms_key = NULL,
  code_location = NULL,
  max_runtime_in_seconds = NULL,
  base_job_name = NULL,
  sagemaker_session = NULL,
  env = NULL,
  tags = NULL,
  network_config = NULL
)
Arguments
estimator_cls

(type): A subclass of the :class:'~sagemaker.estimator.Framework' estimator

framework_version

(str): The version of the framework. Value is ignored when “image_uri“ is provided.

role

(str): An AWS IAM role name or ARN. Amazon SageMaker Processing uses this role to access AWS resources, such as data stored in Amazon S3.

instance_count

(int): The number of instances to run a processing job with.

instance_type

(str): The type of EC2 instance to use for processing, for example, 'ml.c4.xlarge'.

py_version

(str): Python version you want to use for executing your model training code. One of 'py2' or 'py3'. Defaults to 'py3'. Value is ignored when “image_uri“ is provided.

image_uri

(str): The URI of the Docker image to use for the processing jobs (default: None).

command

([str]): The command to run, along with any command-line flags to *precede* the “'code script“'. Example: ["python3", "-v"]. If not provided, ["python"] will be chosen (default: None).

volume_size_in_gb

(int): Size in GB of the EBS volume to use for storing data during processing (default: 30).

volume_kms_key

(str): A KMS key for the processing volume (default: None).

output_kms_key

(str): The KMS key ID for processing job outputs (default: None).

code_location

(str): The S3 prefix URI where custom code will be uploaded (default: None). The code file uploaded to S3 is 'code_location/job-name/source/sourcedir.tar.gz'. If not specified, the default “code location“ is 's3://sagemaker-default-bucket'

max_runtime_in_seconds

(int): Timeout in seconds (default: None). After this amount of time, Amazon SageMaker terminates the job, regardless of its current status. If 'max_runtime_in_seconds' is not specified, the default value is 24 hours.

base_job_name

(str): Prefix for processing name. If not specified, the processor generates a default job name, based on the processing image name and current timestamp (default: None).

sagemaker_session

(:class:'~sagemaker.session.Session'): Session object which manages interactions with Amazon SageMaker and any other AWS services needed. If not specified, the processor creates one using the default AWS configuration chain (default: None).

env

(dict[str, str]): Environment variables to be passed to the processing jobs (default: None).

tags

(list[dict]): List of tags to be passed to the processing job (default: None). For more, see https://docs.aws.amazon.com/sagemaker/latest/dg/API_Tag.html.

network_config

(:class:'~sagemaker.network.NetworkConfig'): A :class:'~sagemaker.network.NetworkConfig' object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets (default: None).


Method get_run_args()

This object contains the normalized inputs, outputs and arguments needed when using a “FrameworkProcessor“ in a :class:'~sagemaker.workflow.steps.ProcessingStep'.

Usage
FrameworkProcessor$get_run_args(
  code,
  source_dir = NULL,
  dependencies = NULL,
  git_config = NULL,
  inputs = NULL,
  outputs = NULL,
  arguments = NULL,
  job_name = NULL
)
Arguments
code

(str): This can be an S3 URI or a local path to a file with the framework script to run. See the “code“ argument in 'sagemaker.processing.FrameworkProcessor.run()'.

source_dir

(str): Path (absolute, relative, or an S3 URI) to a directory wit any other processing source code dependencies aside from the entrypoint file (default: None). See the “source_dir“ argument in 'sagemaker.processing.FrameworkProcessor.run()'

dependencies

(list[str]): A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). See the “dependencies“ argument in 'sagemaker.processing.FrameworkProcessor.run()'.

git_config

(dict[str, str]): Git configurations used for cloning files. See the 'git_config' argument in 'sagemaker.processing.FrameworkProcessor.run()'.

inputs

(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).

outputs

(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).

arguments

(list[str]): A list of string arguments to be passed to a processing job (default: None).

job_name

(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.

Returns

Returns a RunArgs object.


Method run()

Runs a processing job.

Usage
FrameworkProcessor$run(
  code,
  source_dir = NULL,
  dependencies = NULL,
  git_config = NULL,
  inputs = NULL,
  outputs = NULL,
  arguments = NULL,
  wait = TRUE,
  logs = TRUE,
  job_name = NULL,
  experiment_config = NULL,
  kms_key = NULL
)
Arguments
code

(str): This can be an S3 URI or a local path to a file with the framework script to run.Path (absolute or relative) to the local Python source file which should be executed as the entry point to training. When 'code' is an S3 URI, ignore 'source_dir', 'dependencies, and 'git_config'. If “source_dir“ is specified, then “code“ must point to a file located at the root of “source_dir“.

source_dir

(str): Path (absolute, relative or an S3 URI) to a directory with any other processing source code dependencies aside from the entry point file (default: None). If “source_dir“ is an S3 URI, it must point to a tar.gz file. Structure within this directory are preserved when processing on Amazon SageMaker (default: None).

dependencies

(list[str]): A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). The library folders will be copied to SageMaker in the same folder where the entrypoint is copied. If 'git_config' is provided, 'dependencies' should be a list of relative locations to directories with any additional libraries needed in the Git repo (default: None).

git_config

(dict[str, str]): Git configurations used for cloning files, including “repo“, “branch“, “commit“, “2FA_enabled“, “username“, “password“ and “token“. The “repo“ field is required. All other fields are optional. “repo“ specifies the Git repository where your training script is stored. If you don't provide “branch“, the default value 'master' is used. If you don't provide “commit“, the latest commit in the specified branch is used. results in cloning the repo specified in 'repo', then checkout the 'master' branch, and checkout the specified commit. “2FA_enabled“, “username“, “password“ and “token“ are used for authentication. For GitHub (or other Git) accounts, set “2FA_enabled“ to 'True' if two-factor authentication is enabled for the account, otherwise set it to 'False'. If you do not provide a value for “2FA_enabled“, a default value of 'False' is used. CodeCommit does not support two-factor authentication, so do not provide "2FA_enabled" with CodeCommit repositories. For GitHub and other Git repos, when SSH URLs are provided, it doesn't matter whether 2FA is enabled or disabled; you should either have no passphrase for the SSH key pairs, or have the ssh-agent configured so that you will not be prompted for SSH passphrase when you do 'git clone' command with SSH URLs. When HTTPS URLs are provided: if 2FA is disabled, then either token or username+password will be used for authentication if provided (token prioritized); if 2FA is enabled, only token will be used for authentication if provided. If required authentication info is not provided, python SDK will try to use local credentials storage to authenticate. If that fails either, an error message will be thrown. For CodeCommit repos, 2FA is not supported, so '2FA_enabled' should not be provided. There is no token in CodeCommit, so 'token' should not be provided too. When 'repo' is an SSH URL, the requirements are the same as GitHub-like repos. When 'repo' is an HTTPS URL, username+password will be used for authentication if they are provided; otherwise, python SDK will try to use either CodeCommit credential helper or local credential storage for authentication.

inputs

(list[:class:'~sagemaker.processing.ProcessingInput']): Input files for the processing job. These must be provided as :class:'~sagemaker.processing.ProcessingInput' objects (default: None).

outputs

(list[:class:'~sagemaker.processing.ProcessingOutput']): Outputs for the processing job. These can be specified as either path strings or :class:'~sagemaker.processing.ProcessingOutput' objects (default: None).

arguments

(list[str]): A list of string arguments to be passed to a processing job (default: None).

wait

(bool): Whether the call should wait until the job completes (default: True).

logs

(bool): Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

job_name

(str): Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.

experiment_config

(dict[str, str]): Experiment management configuration. Dictionary contains three optional keys: 'ExperimentName', 'TrialName', and 'TrialComponentDisplayName'.

kms_key

(str): The ARN of the KMS key that is used to encrypt the user code file (default: None).


Method clone()

The objects of this class are cloneable with this method.

Usage
FrameworkProcessor$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


DyfanJones/sagemaker-r-common documentation built on June 14, 2022, 10:31 p.m.