Object2Vec: A general-purpose neural embedding algorithm that is highly customizable

Object2Vec R Documentation

A general-purpose neural embedding algorithm that is highly customizable.

Description

Object2Vec learns low-dimensional dense embeddings of high-dimensional objects. The embeddings are learned such that the semantics of the relationship between pairs of objects in the original space are preserved in the embedding space.

Super classes

sagemaker.mlcore::EstimatorBase -> sagemaker.mlcore::AmazonAlgorithmEstimatorBase -> Object2Vec

Public fields

repo_name

SageMaker repository name for the framework

repo_version

Version of the framework

MINI_BATCH_SIZE

The size of each mini-batch to use when training.

.module

Mimics a Python module

Active bindings

epochs

Total number of epochs for SGD training

enc_dim

Dimension of the output of the embedding layer

mini_batch_size

Mini-batch size for SGD training

early_stopping_patience

The allowed number of consecutive epochs without improvement before early stopping is applied

early_stopping_tolerance

The value used to determine whether the algorithm has made improvement between two consecutive epochs for early stopping

dropout

Dropout probability on network layers

weight_decay

Weight decay parameter during optimization

bucket_width

The allowed difference between data sequence length when bucketing is enabled

num_classes

Number of classes for classification

mlp_layers

Number of MLP layers in the network

mlp_dim

Dimension of the output of MLP layer

mlp_activation

Type of activation function for the MLP layer

output_layer

Type of output layer

optimizer

Type of optimizer for training

learning_rate

Learning rate for SGD training

negative_sampling_rate

Negative sampling rate

comparator_list

Customization of comparator operator

tied_token_embedding_weight

Tying of token embedding layer weight

token_embedding_storage_type

Type of token embedding storage

enc0_network

Network model of encoder "enc0"

enc1_network

Network model of encoder "enc1"

enc0_cnn_filter_width

CNN filter width

enc1_cnn_filter_width

CNN filter width

enc0_max_seq_len

Maximum sequence length

enc1_max_seq_len

Maximum sequence length

enc0_token_embedding_dim

Output dimension of token embedding layer

enc1_token_embedding_dim

Output dimension of token embedding layer

enc0_vocab_size

Vocabulary size of tokens

enc1_vocab_size

Vocabulary size of tokens

enc0_layers

Number of layers in encoder

enc1_layers

Number of layers in encoder

enc0_freeze_pretrained_embedding

Freeze pretrained embedding weights

enc1_freeze_pretrained_embedding

Freeze pretrained embedding weights

Methods

Public methods

Inherited methods

Method new()

Object2Vec is an :class:'Estimator' for the Amazon SageMaker Object2Vec algorithm, which learns low-dimensional dense embeddings of high-dimensional objects. This Estimator may be fit via calls to :meth:'~sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase.fit'. There is a utility :meth:'~sagemaker.amazon.amazon_estimator.AmazonAlgorithmEstimatorBase.record_set' that can be used to upload data to S3 and create a :class:'~sagemaker.amazon.amazon_estimator.RecordSet' to be passed to the 'fit' call. After this Estimator is fit, model data is stored in S3. The model may be deployed to an Amazon SageMaker Endpoint by invoking :meth:'~sagemaker.amazon.estimator.EstimatorBase.deploy'. As well as deploying an Endpoint, 'deploy' returns a :class:'~sagemaker.amazon.Predictor' object that can be used for inference calls using the trained model hosted in the SageMaker Endpoint. Object2Vec Estimators can be configured by setting hyperparameters; the available hyperparameters for Object2Vec are documented below, and a usage sketch follows the argument list. For further information on the AWS Object2Vec algorithm, please consult the AWS technical documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/object2vec.html

Usage
Object2Vec$new(
  role,
  instance_count,
  instance_type,
  epochs,
  enc0_max_seq_len,
  enc0_vocab_size,
  enc_dim = NULL,
  mini_batch_size = NULL,
  early_stopping_patience = NULL,
  early_stopping_tolerance = NULL,
  dropout = NULL,
  weight_decay = NULL,
  bucket_width = NULL,
  num_classes = NULL,
  mlp_layers = NULL,
  mlp_dim = NULL,
  mlp_activation = NULL,
  output_layer = NULL,
  optimizer = NULL,
  learning_rate = NULL,
  negative_sampling_rate = NULL,
  comparator_list = NULL,
  tied_token_embedding_weight = NULL,
  token_embedding_storage_type = NULL,
  enc0_network = NULL,
  enc1_network = NULL,
  enc0_cnn_filter_width = NULL,
  enc1_cnn_filter_width = NULL,
  enc1_max_seq_len = NULL,
  enc0_token_embedding_dim = NULL,
  enc1_token_embedding_dim = NULL,
  enc1_vocab_size = NULL,
  enc0_layers = NULL,
  enc1_layers = NULL,
  enc0_freeze_pretrained_embedding = NULL,
  enc1_freeze_pretrained_embedding = NULL,
  ...
)
Arguments
role

(str): An AWS IAM role (either name or full ARN). The Amazon SageMaker training jobs and APIs that create Amazon SageMaker endpoints use this role to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role, if it needs to access an AWS resource.

instance_count

(int): Number of Amazon EC2 instances to use for training.

instance_type

(str): Type of EC2 instance to use for training, for example, 'ml.c4.xlarge'.

epochs

(int): Total number of epochs for SGD training

enc0_max_seq_len

(int): Maximum sequence length

enc0_vocab_size

(int): Vocabulary size of tokens

enc_dim

(int): Optional. Dimension of the output of the embedding layer

mini_batch_size

(int): Optional. Mini-batch size for SGD training

early_stopping_patience

(int): Optional. The allowed number of consecutive epochs without improvement before early stopping is applied

early_stopping_tolerance

(float): Optional. The value used to determine whether the algorithm has made improvement between two consecutive epochs for early stopping

dropout

(float): Optional. Dropout probability on network layers

weight_decay

(float): Optional. Weight decay parameter during optimization

bucket_width

(int): Optional. The allowed difference between data sequence length when bucketing is enabled

num_classes

(int): Optional. Number of classes for classification

mlp_layers

(int): Optional. Number of MLP layers in the network

mlp_dim

(int): Optional. Dimension of the output of MLP layer

mlp_activation

(str): Optional. Type of activation function for the MLP layer

output_layer

(str): Optional. Type of output layer

optimizer

(str): Optional. Type of optimizer for training

learning_rate

(float): Optional. Learning rate for SGD training

negative_sampling_rate

(int): Optional. Negative sampling rate

comparator_list

(str): Optional. Customization of comparator operator

tied_token_embedding_weight

(bool): Optional. Tying of token embedding layer weight

token_embedding_storage_type

(str): Optional. Type of token embedding storage

enc0_network

(str): Optional. Network model of encoder "enc0"

enc1_network

(str): Optional. Network model of encoder "enc1"

enc0_cnn_filter_width

(int): Optional. CNN filter width

enc1_cnn_filter_width

(int): Optional. CNN filter width

enc1_max_seq_len

(int): Optional. Maximum sequence length

enc0_token_embedding_dim

(int): Optional. Output dimension of token embedding layer

enc1_token_embedding_dim

(int): Optional. Output dimension of token embedding layer

enc1_vocab_size

(int): Optional. Vocabulary size of tokens

enc0_layers

(int): Optional. Number of layers in encoder

enc1_layers

(int): Optional. Number of layers in encoder

enc0_freeze_pretrained_embedding

(bool): Optional. Freeze pretrained embedding weights

enc1_freeze_pretrained_embedding

(bool): Optional. Freeze pretrained embedding weights

...

: Base class keyword argument values.
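
A minimal usage sketch follows. It assumes the package loads as 'sagemaker.mlframework' and that the inherited 'fit()'/'deploy()' interface mirrors the Python SageMaker SDK; the IAM role ARN, hyperparameter values, and the 'train_records' object are hypothetical placeholders rather than values taken from this documentation.

library(sagemaker.mlframework)

# Hypothetical IAM role ARN -- replace with a role that can access your data.
role <- "arn:aws:iam::111122223333:role/SageMakerExecutionRole"

# Configure the estimator. Only the required hyperparameters plus a few
# optional ones are set here; everything else falls back to algorithm defaults.
o2v <- Object2Vec$new(
  role = role,
  instance_count = 1,
  instance_type = "ml.p3.2xlarge",
  epochs = 5,
  enc0_max_seq_len = 20,
  enc0_vocab_size = 45000,
  enc_dim = 512,
  mini_batch_size = 64
)

# 'train_records' is assumed to be a RecordSet created beforehand, e.g. via the
# inherited record_set() utility. The calls below require AWS credentials, so
# they are shown commented out.
# o2v$fit(records = train_records, mini_batch_size = 64)
# predictor <- o2v$deploy(initial_instance_count = 1, instance_type = "ml.m5.xlarge")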



Method create_model()

Return a :class:'~sagemaker.amazon.Object2VecModel' referencing the latest S3 model data produced by this Estimator.

Usage
Object2Vec$create_model(vpc_config_override = "VPC_CONFIG_DEFAULT", ...)
Arguments
vpc_config_override

(dict[str, list[str]]): Optional override for VpcConfig set on the model. Default: use subnets and security groups from this Estimator.
* 'Subnets' (list[str]): List of subnet ids.
* 'SecurityGroupIds' (list[str]): List of security group ids.

...

: Additional kwargs passed to the Object2VecModel constructor.
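
A hedged sketch of calling 'create_model()' on a fitted estimator. 'o2v' refers to the estimator from the earlier sketch; the subnet and security group ids are hypothetical placeholders.

# Use the VPC configuration inherited from the Estimator (the default).
model <- o2v$create_model()

# Or override VpcConfig explicitly.
model_vpc <- o2v$create_model(
  vpc_config_override = list(
    Subnets = list("subnet-0abc1234"),
    SecurityGroupIds = list("sg-0def5678")
  )
)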


Method .prepare_for_training()

Set hyperparameters needed for training. This method will also validate 'source_dir'.

Usage
Object2Vec$.prepare_for_training(
  records,
  mini_batch_size = NULL,
  job_name = NULL
)
Arguments
records

(RecordSet): The records to train this Estimator on.

mini_batch_size

(int or NULL): The size of each mini-batch to use when training. If NULL, a default value will be used.

job_name

(str): Name of the training job to be created. If not specified, one is generated, using the base name given to the constructor if applicable.
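
'.prepare_for_training()' is normally invoked internally by 'fit()'. The sketch below only illustrates the arguments; 'train_records' is the hypothetical RecordSet from the earlier sketch.

# Direct call shown for illustration only; fit() performs this step itself.
o2v$.prepare_for_training(
  records = train_records,
  mini_batch_size = 64,
  job_name = "object2vec-example-job"
)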


Method clone()

The objects of this class are cloneable with this method.

Usage
Object2Vec$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
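
A one-line sketch, assuming 'o2v' is the estimator configured above.

# deep = TRUE returns an independent copy that does not share field state.
o2v_copy <- o2v$clone(deep = TRUE)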

