TrainingInput: Create a definition for input data used by a SageMaker...

TrainingInput    R Documentation

Create a definition for input data used by a SageMaker training job.

Description

Amazon SageMaker channel configurations for S3 data sources.

Public fields

config

A SageMaker “DataSource” referencing a SageMaker “S3DataSource”.
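
For orientation, the nesting that the CreateTrainingJob API uses for an S3 data source can be written as a plain R list. This is only a sketch: the bucket URI is a hypothetical placeholder, and whether the config field holds exactly these keys in this package is an assumption based on the API's DataSource and S3DataSource structures.

```r
# Sketch of a DataSource wrapping an S3DataSource, as a nested R list.
# Key names follow the AWS API; the URI is a hypothetical placeholder.
data_source <- list(
  DataSource = list(
    S3DataSource = list(
      S3DataType = "S3Prefix",
      S3Uri = "s3://example-bucket/train/",
      S3DataDistributionType = "FullyReplicated"
    )
  )
)

# Drill into the nested structure to read the S3 location.
s3_uri <- data_source$DataSource$S3DataSource$S3Uri
print(s3_uri)
```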

Methods

Public methods


Method new()

See AWS documentation on the “CreateTrainingJob” API for more details on the parameters.

Usage
TrainingInput$new(
  s3_data,
  distribution = NULL,
  compression = NULL,
  content_type = NULL,
  record_wrapping = NULL,
  s3_data_type = "S3Prefix",
  input_mode = NULL,
  attribute_names = NULL,
  target_attribute_name = NULL,
  shuffle_config = NULL
)
Arguments
s3_data

(str): Defines the location of the S3 data to train on.

distribution

(str): Valid values: 'FullyReplicated', 'ShardedByS3Key' (default: 'FullyReplicated').

compression

(str): Valid values: 'Gzip', NULL (default: NULL). This is used only in Pipe input mode.

content_type

(str): MIME type of the input data (default: NULL).

record_wrapping

(str): Valid values: 'RecordIO' (default: NULL).

s3_data_type

(str): Valid values: 'S3Prefix', 'ManifestFile', 'AugmentedManifestFile'. If 'S3Prefix', “s3_data” defines a prefix of S3 objects to train on. All objects with S3 keys beginning with “s3_data” will be used to train. If 'ManifestFile' or 'AugmentedManifestFile', then “s3_data” defines a single S3 manifest file or augmented manifest file (respectively), listing the S3 data to train on. Both the ManifestFile and AugmentedManifestFile formats are described in the SageMaker API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
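
The 'S3Prefix' rule described above is a plain string-prefix match on object keys. The base R sketch below illustrates it on a hypothetical set of keys within one bucket. It does not call S3 or this package, and the key names are made up for illustration.

```r
# Hypothetical object keys in a bucket and a hypothetical key prefix.
keys <- c(
  "train/part-0.csv",
  "train/part-1.csv",
  "training-old/a.csv",
  "test/part-0.csv"
)
prefix <- "train/"

# With 'S3Prefix', every key that begins with the prefix is selected.
selected <- keys[startsWith(keys, prefix)]
print(selected)

# Dropping the trailing slash widens the match: "training-old/a.csv"
# also begins with "train" and would be picked up as well.
selected_wide <- keys[startsWith(keys, "train")]
print(selected_wide)
```

Ending the prefix with a slash is therefore a simple way to avoid picking up sibling "directories" that merely share a leading name.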

input_mode

(str): Optional override for this channel's input mode (default: NULL). By default, channels use the input mode defined on “sagemaker.estimator.EstimatorBase.input_mode”, but they ignore that setting if this parameter is set.

* NULL - Amazon SageMaker will use the input mode specified in the “Estimator”.
* 'File' - Amazon SageMaker copies the training dataset from the S3 location to a local directory.
* 'Pipe' - Amazon SageMaker streams data directly from S3 to the container via a Unix named pipe.

attribute_names

(list[str]): A list of one or more attribute names to use that are found in a specified AugmentedManifestFile.

target_attribute_name

(str): The name of the attribute that will be predicted (classified) in a SageMaker AutoML job. It is required if the input is for a SageMaker AutoML job.

shuffle_config

(ShuffleConfig): If specified, this configuration enables shuffling on this channel. See the SageMaker API documentation for more info: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
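
A minimal usage sketch of the constructor, using only arguments from the signature above. The S3 URI is a hypothetical placeholder, and the sketch assumes that the package exporting TrainingInput is already attached. The call is skipped if the class is not found.

```r
# Hypothetical S3 location; replace with your own bucket and prefix.
s3_uri <- "s3://example-bucket/train/"

# TrainingInput is assumed to come from an already attached package;
# the call is skipped when the class is not available.
if (exists("TrainingInput")) {
  train_input <- TrainingInput$new(
    s3_data = s3_uri,
    content_type = "text/csv",
    s3_data_type = "S3Prefix",
    input_mode = "File"
  )
  # The public field `config` holds the resulting channel definition.
  str(train_input$config)
}
```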


Method format()

Format the class object.

Usage
TrainingInput$format()

Method clone()

The objects of this class are cloneable with this method.

Usage
TrainingInput$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


DyfanJones/sagemaker-r-local documentation built on June 14, 2022, 10:32 p.m.