sagemaker_batch_replace_cluster_nodes: Replaces specific nodes within a SageMaker HyperPod cluster...
In paws.machine.learning: 'Amazon Web Services' Machine Learning Services

sagemaker_batch_replace_cluster_nodes

R Documentation

Replaces specific nodes within a SageMaker HyperPod cluster with new hardware

Description

Replaces specific nodes within a SageMaker HyperPod cluster with new hardware. batch_replace_cluster_nodes terminates the specified instances and provisions new replacement instances with the same configuration but fresh hardware. The Amazon Machine Image (AMI) and instance configuration remain the same.

See https://www.paws-r-sdk.com/docs/sagemaker_batch_replace_cluster_nodes/ for full documentation.

Usage

sagemaker_batch_replace_cluster_nodes(
  ClusterName,
  NodeIds = NULL,
  NodeLogicalIds = NULL
)
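
A minimal sketch of how this operation is typically invoked, assuming the usual paws convention of creating a service client and calling the operation as a method on it. The wrapper name replace_nodes and its argument names are hypothetical, and running it requires the paws.machine.learning package plus valid AWS credentials.

```r
# Hypothetical wrapper; nothing here is called until you invoke replace_nodes().
replace_nodes <- function(cluster_name, node_ids) {
  # Assumes paws.machine.learning is installed and AWS credentials are configured.
  svc <- paws.machine.learning::sagemaker()
  svc$batch_replace_cluster_nodes(
    ClusterName = cluster_name,
    NodeIds = as.list(node_ids)  # paws expects list-typed parameters as R lists
  )
}

# Example call (not run; the cluster name and instance ID are placeholders):
# replace_nodes("my-hyperpod-cluster", "i-0123456789abcdef0")
```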

Arguments

ClusterName

[required] The name or Amazon Resource Name (ARN) of the SageMaker HyperPod cluster containing the nodes to replace.

NodeIds

A list of EC2 instance IDs to replace with new hardware. You can specify between 1 and 25 instance IDs.

Replace operations destroy all instance volumes (root and secondary). Ensure you have backed up any important data before proceeding.

Provide NodeIds, NodeLogicalIds, or both; at least one of the two is required.
Each instance ID must follow the pattern i- followed by 17 hexadecimal characters (for example, i-0123456789abcdef0).
For SageMaker HyperPod clusters using the Slurm workload manager, you cannot replace instances that are configured as Slurm controller nodes.
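
The count and format constraints on NodeIds can be checked locally before making a request. The helper below is a hypothetical sketch, not part of paws; it assumes the hexadecimal characters are lowercase, as in the example ID above, and it does not detect Slurm controller nodes.

```r
# Hypothetical client-side check for the NodeIds constraints described above.
validate_node_ids <- function(node_ids) {
  node_ids <- as.character(unlist(node_ids))
  # Between 1 and 25 instance IDs are allowed per request.
  if (length(node_ids) < 1 || length(node_ids) > 25) {
    stop("NodeIds must contain between 1 and 25 instance IDs.")
  }
  # Assumption: lowercase hex, 'i-' followed by exactly 17 characters.
  bad <- node_ids[!grepl("^i-[0-9a-f]{17}$", node_ids)]
  if (length(bad) > 0) {
    stop("Malformed instance ID(s): ", paste(bad, collapse = ", "))
  }
  # Return a list, the shape paws expects for list-typed parameters.
  as.list(node_ids)
}
```

Passing a malformed ID such as "i-123" raises an error before any request is sent.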

NodeLogicalIds

A list of logical node IDs to replace with new hardware. You can specify between 1 and 25 logical node IDs.

The NodeLogicalId is a unique identifier that persists throughout the node's lifecycle and can be used to track nodes that are still being provisioned and don't yet have an EC2 instance ID assigned.

Replace operations destroy all instance volumes (root and secondary). Ensure you have backed up any important data before proceeding.
This parameter is only supported for clusters using Continuous as the NodeProvisioningMode. For clusters using the default provisioning mode, use NodeIds instead.
Provide NodeIds, NodeLogicalIds, or both; at least one of the two is required.
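
The "at least one of NodeIds or NodeLogicalIds" rule can be sketched as a small argument builder. The function build_replace_args and its parameter names are hypothetical; it only assembles a named list and makes no AWS call, and it does not check the cluster's NodeProvisioningMode.

```r
# Hypothetical helper: assembles arguments and enforces the rules stated above.
build_replace_args <- function(cluster_name, node_ids = NULL,
                               node_logical_ids = NULL) {
  if (is.null(node_ids) && is.null(node_logical_ids)) {
    stop("Provide NodeIds, NodeLogicalIds, or both.")
  }
  args <- list(ClusterName = cluster_name)
  supplied <- list(NodeIds = node_ids, NodeLogicalIds = node_logical_ids)
  for (name in names(supplied)) {
    ids <- supplied[[name]]
    if (is.null(ids)) next
    # Each argument accepts between 1 and 25 identifiers.
    if (length(ids) < 1 || length(ids) > 25) {
      stop(name, " must contain between 1 and 25 IDs.")
    }
    args[[name]] <- as.list(ids)
  }
  args
}

# Example use (not run; requires AWS credentials, and the IDs are placeholders):
# svc <- paws.machine.learning::sagemaker()
# do.call(svc$batch_replace_cluster_nodes,
#         build_replace_args("my-hyperpod-cluster",
#                            node_logical_ids = c("logical-id-1", "logical-id-2")))
```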