BatchSize: Debugger BatchSize class


Description

This rule helps to detect whether the GPU is underutilized because the batch size is too small. To detect this, the rule analyzes the average GPU memory footprint together with CPU and GPU utilization. If CPU utilization, GPU utilization, and the memory footprint are all low on average, this may indicate that the user can either run on a smaller instance type or increase the batch size. This analysis does not work for frameworks that heavily over-allocate memory. Note that increasing the batch size can also lead to a processing/data-loading bottleneck, because more data needs to be pre-processed in each iteration.

Super class

sagemaker.debugger::ProfilerRuleBase -> BatchSize

Methods

Public methods

Inherited methods

Method new()

Initialize BatchSize class

Usage
BatchSize$new(
  cpu_threshold_p95 = 70,
  gpu_threshold_p95 = 70,
  gpu_memory_threshold_p95 = 70,
  patience = 1000,
  window = 500,
  scan_interval_us = 60 * 1000 * 1000
)
Arguments
cpu_threshold_p95

(numeric): defines the threshold for the 95th quantile of CPU utilization. Default is 70%.

gpu_threshold_p95

(numeric): defines the threshold for the 95th quantile of GPU utilization. Default is 70%.

gpu_memory_threshold_p95

(numeric): defines the threshold for the 95th quantile of GPU memory utilization. Default is 70%.

patience

(numeric): defines how many data points to capture before the rule runs its first evaluation. Default is 1000.

window

(numeric): window size for computing quantiles. Default is 500.

scan_interval_us

(numeric): interval at which timeline files are scanned. Default is 60000000 us (60 seconds).
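
As a sketch, the rule can be constructed with its defaults or with custom thresholds. The parameter names come from the Usage section above; the specific values below are purely illustrative, not recommendations:

```r
library(sagemaker.debugger)  # package name assumed from the Super class path

# Construct the rule with its default thresholds
rule <- BatchSize$new()

# Or tighten the utilization thresholds and scan more frequently
rule <- BatchSize$new(
  cpu_threshold_p95 = 80,
  gpu_threshold_p95 = 80,
  gpu_memory_threshold_p95 = 80,
  patience = 500,
  window = 250,
  scan_interval_us = 30 * 1000 * 1000  # scan timeline files every 30 seconds
)
```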


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchSize$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
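
For instance, a configured rule object can be copied with `clone()`; as with any R6 class, `deep = TRUE` additionally clones any nested R6 objects rather than sharing them:

```r
rule <- BatchSize$new(cpu_threshold_p95 = 80)

# Shallow copy: nested R6 fields (if any) are shared with the original
rule_copy <- rule$clone()

# Deep copy: nested R6 fields are cloned as well
rule_deep <- rule$clone(deep = TRUE)
```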


DyfanJones/sagemaker-r-debugger documentation built on Jan. 20, 2022, 5:49 p.m.