Description Super class Methods
This rule detects whether GPU utilization is low or fluctuates heavily. The check is performed separately for every GPU on each worker node. The rule returns True if the 95th quantile is below threshold_p95, which indicates underutilization. It also returns True if the 95th quantile is above threshold_p95 and the 5th quantile is below threshold_p5, which indicates fluctuations.
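The two conditions above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name `classify_gpu_utilization` and the sample vectors are hypothetical, and the sketch returns a label to tell the two cases apart, whereas the rule itself returns True in both.

```r
# Illustrative sketch only; `classify_gpu_utilization` is a hypothetical helper,
# not part of sagemaker.debugger.
classify_gpu_utilization <- function(utilization,
                                     threshold_p95 = 70,
                                     threshold_p5 = 10) {
  # 5th and 95th quantiles of the utilization samples (in percent)
  q <- stats::quantile(utilization, probs = c(0.05, 0.95), names = FALSE)
  p5 <- q[1]
  p95 <- q[2]

  if (p95 < threshold_p95) {
    "underutilized"  # even the peaks stay below threshold_p95
  } else if (p5 < threshold_p5) {
    "fluctuating"    # peaks are high, but the low end drops below threshold_p5
  } else {
    "ok"
  }
}

steady_low  <- rep(30, 100)               # constant 30 percent
bursty      <- rep(c(0, 95), times = 50)  # alternates between idle and busy
steady_high <- rep(90, 100)               # constant 90 percent

classify_gpu_utilization(steady_low)
classify_gpu_utilization(bursty)
classify_gpu_utilization(steady_high)
```

With the default thresholds, the first vector falls into the underutilization case, the second into the fluctuation case, and the third triggers neither condition.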
sagemaker.debugger::ProfilerRuleBase
-> LowGPUUtilization
new()
Initialize LowGPUUtilization class
LowGPUUtilization$new( threshold_p95 = 70, threshold_p5 = 10, window = 500, patience = 1000, scan_interval_us = 60 * 1000 * 1000 )
threshold_p95
: threshold for 95th quantile below which GPU is considered to be underutilized. Default is 70 percent.
threshold_p5
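: threshold for 5th quantile. A 5th quantile below this value, combined with a 95th quantile above threshold_p95, indicates fluctuations. Default is 10 percent.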
window
: number of past datapoints used to compute the quantiles. Default is 500.
patience
: number of values to record before checking for underutilization or fluctuations. During training initialization the GPU is likely at 0 percent, so the rule should not check for underutilization immediately. Default is 1000.
scan_interval_us
: interval at which timeline files are scanned, in microseconds. Default is 60000000 us (60 seconds).
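One way to picture how `patience` and `window` might interact is the base R sketch below. It is an assumption about the behaviour described above, not the package's implementation: the helper `quantiles_if_ready` and the simulated `history` vector are hypothetical.

```r
# Hypothetical sketch; not the sagemaker.debugger implementation.
quantiles_if_ready <- function(history, window = 500, patience = 1000) {
  # Fewer than `patience` values recorded so far: skip the check (warm-up phase)
  if (length(history) < patience) {
    return(NULL)
  }
  # Use only the most recent `window` datapoints
  recent <- utils::tail(history, window)
  stats::quantile(recent, probs = c(0.05, 0.95), names = FALSE)
}

warmup   <- rep(0, 200)   # GPU idle during initialization
training <- rep(85, 900)  # GPU busy once training has started
history  <- c(warmup, training)

quantiles_if_ready(history[1:500])  # NULL: fewer than `patience` values recorded
quantiles_if_ready(history)         # quantiles over the last 500 datapoints only
```

Because only the most recent `window` values enter the quantiles, the idle warm-up phase no longer influences the result once enough training datapoints have been recorded.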
clone()
The objects of this class are cloneable with this method.
LowGPUUtilization$clone(deep = FALSE)
deep
Whether to make a deep clone.