sagemaker_batch_replace_cluster_nodes: Replaces specific nodes within a SageMaker HyperPod cluster...

View source: R/sagemaker_operations.R

sagemaker_batch_replace_cluster_nodes    R Documentation

Replaces specific nodes within a SageMaker HyperPod cluster with new hardware

Description

Replaces specific nodes within a SageMaker HyperPod cluster with new hardware. batch_replace_cluster_nodes terminates the specified instances and provisions replacement instances on fresh hardware. The replacements keep the same Amazon Machine Image (AMI) and instance configuration as the instances they replace.

See https://www.paws-r-sdk.com/docs/sagemaker_batch_replace_cluster_nodes/ for full documentation.

Usage

sagemaker_batch_replace_cluster_nodes(
  ClusterName,
  NodeIds = NULL,
  NodeLogicalIds = NULL
)

Arguments

ClusterName

[required] The name or Amazon Resource Name (ARN) of the SageMaker HyperPod cluster containing the nodes to replace.

NodeIds

A list of EC2 instance IDs to replace with new hardware. You can specify between 1 and 25 instance IDs.

  • Replace operations destroy all instance volumes (root and secondary). Ensure you have backed up any important data before proceeding.

  • You must provide NodeIds, NodeLogicalIds, or both; at least one is required.

  • Each instance ID must follow the pattern i- followed by 17 hexadecimal characters (for example, i-0123456789abcdef0).

  • For SageMaker HyperPod clusters using the Slurm workload manager, you cannot replace instances that are configured as Slurm controller nodes.
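
The NodeIds constraints listed above (1 to 25 IDs, each matching i- plus 17 hexadecimal characters) can be checked locally before any instances are terminated. The following is a minimal sketch; validate_node_ids is a hypothetical helper, not part of paws, and it assumes lowercase hexadecimal digits as in the example ID.

```r
# Hypothetical helper (not part of paws): checks the documented NodeIds
# constraints before a request is sent.
validate_node_ids <- function(node_ids) {
  node_ids <- unlist(node_ids)

  # Documented limit: between 1 and 25 instance IDs per call.
  if (length(node_ids) < 1 || length(node_ids) > 25) {
    stop("NodeIds must contain between 1 and 25 instance IDs.")
  }

  # "i-" followed by exactly 17 hexadecimal characters.
  # Assumption: IDs are lowercase, as in the documented example.
  bad <- node_ids[!grepl("^i-[0-9a-f]{17}$", node_ids)]
  if (length(bad) > 0) {
    stop("Malformed instance ID(s): ", paste(bad, collapse = ", "))
  }

  # Return the IDs as a list, the shape the NodeIds argument expects.
  invisible(as.list(node_ids))
}
```

This check cannot tell whether an ID belongs to a Slurm controller node; that restriction still has to be verified against the cluster itself.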

NodeLogicalIds

A list of logical node IDs to replace with new hardware. You can specify between 1 and 25 logical node IDs.

The NodeLogicalId is a unique identifier that persists throughout the node's lifecycle and can be used to track nodes that are still being provisioned and don't yet have an EC2 instance ID assigned.

  • Replace operations destroy all instance volumes (root and secondary). Ensure you have backed up any important data before proceeding.

  • This parameter is only supported for clusters using Continuous as the NodeProvisioningMode. For clusters using the default provisioning mode, use NodeIds instead.

  • You must provide NodeIds, NodeLogicalIds, or both; at least one is required.
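
Because each call accepts at most 25 IDs, replacing a larger set of nodes takes several requests. The sketch below shows one way to split the IDs into batches. replace_nodes_in_batches is a hypothetical wrapper, not part of paws; it assumes the client exposes the operation as batch_replace_cluster_nodes (as a client created with paws::sagemaker() is expected to) and that issuing separate requests is acceptable for your cluster.

```r
# Hypothetical wrapper (not part of paws): splits IDs into batches of at most
# 25 and issues one request per batch through the supplied client.
replace_nodes_in_batches <- function(client, cluster_name, ids,
                                     id_arg = c("NodeIds", "NodeLogicalIds"),
                                     batch_size = 25) {
  id_arg <- match.arg(id_arg)
  ids <- unlist(ids)
  stopifnot(length(ids) >= 1, batch_size >= 1, batch_size <= 25)

  # Group consecutive IDs: positions 1..25 form batch 1, 26..50 batch 2, etc.
  batches <- unname(split(ids, ceiling(seq_along(ids) / batch_size)))

  lapply(batches, function(batch) {
    args <- list(ClusterName = cluster_name)
    args[[id_arg]] <- as.list(batch)  # pass as NodeIds or NodeLogicalIds
    do.call(client$batch_replace_cluster_nodes, args)
  })
}

# Usage sketch (needs the paws package and AWS credentials, and it destroys
# the instance volumes of every listed node; "my-cluster" is a placeholder):
# client <- paws::sagemaker()
# replace_nodes_in_batches(client, "my-cluster", list("i-0123456789abcdef0"))
```

Taking the client as an argument keeps the batching logic testable with a stand-in object. Pass id_arg = "NodeLogicalIds" only for clusters that use Continuous as the NodeProvisioningMode.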


paws.machine.learning documentation built on May 31, 2026, 1:07 a.m.