tf.distribute.Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, users can distribute their existing models and training code with minimal code changes. tf.distribute.Strategy has been designed with a few key goals in mind: being easy to use, providing good performance out of the box, and making it easy to switch between strategies.
tf.distribute.Strategy can be used with TensorFlow's high-level APIs, tf.keras and tf.estimator, with just a couple of lines of code change. It also provides an API that can be used to distribute custom training loops (and, in general, any computation using TensorFlow). In TensorFlow 2.0, users can execute their programs eagerly, or in a graph using tf.function; tf.distribute.Strategy intends to support both modes of execution. Although this guide talks about training most of the time, the API can also be used for distributing evaluation and prediction on different platforms.
There are five strategies supported by the Keras API:
tf.distribute.MirroredStrategy
supports synchronous distributed
training on multiple GPUs on one machine. It creates one replica per GPU
device. Each variable in the model is mirrored across all the replicas.
Together, these variables form a single conceptual variable called
MirroredVariable. These variables are kept in sync with each other by
applying identical updates.
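If you want to restrict which devices participate, MirroredStrategy also accepts an explicit list of devices. A minimal sketch, assuming two local GPUs (adjust the device names to your machine):
library(reticulate)
tf <- import("tensorflow")
# Mirror variables across two specific GPUs; one replica is created per device
mirrored <- tf$distribute$MirroredStrategy(devices = c("/gpu:0", "/gpu:1"))
mirrored$num_replicas_in_sync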
tf.distribute.experimental.CentralStorageStrategy
does synchronous
training as well. Variables are not mirrored; instead, they are placed on
the CPU and operations are replicated across all local GPUs. If there is
only one GPU, all variables and operations will be placed on that GPU.
tf.distribute.experimental.MultiWorkerMirroredStrategy
is very similar
to MirroredStrategy. It implements synchronous distributed training
across multiple workers, each with potentially multiple GPUs. Similar to
MirroredStrategy, it creates copies of all variables in the model on
each device across all workers.
It uses CollectiveOps as the multi-worker all-reduce communication method used to keep variables in sync. A collective op is a single op in the TensorFlow graph which can automatically choose an all-reduce algorithm in the TensorFlow runtime according to hardware, network topology and tensor sizes.
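MultiWorkerMirroredStrategy reads the cluster layout from the TF_CONFIG environment variable, which has to be set before the strategy object is created. A minimal sketch for one of two workers, with placeholder host names and ports:
library(jsonlite)
library(reticulate)
tf <- import("tensorflow")
# TF_CONFIG describes the cluster and this machine's role in it:
# two workers in total, and this process is worker 0
Sys.setenv(TF_CONFIG = as.character(toJSON(list(
  cluster = list(worker = c("host1:12345", "host2:23456")),
  task = list(type = "worker", index = 0L)
), auto_unbox = TRUE)))
# The strategy must be created after TF_CONFIG is set
multiworker <- tf$distribute$experimental$MultiWorkerMirroredStrategy()
(ParameterServerStrategy, described next, is configured through TF_CONFIG in the same way, with additional "ps" entries in the cluster spec.)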
tf.distribute.experimental.ParameterServerStrategy
supports parameter server training on multiple machines. In this setup, some machines are
designated as workers and some as parameter servers. Each variable of
the model is placed on one parameter server. Computation is replicated
across all GPUs of all the workers.
tf.distribute.experimental.TPUStrategy
lets users run their TensorFlow
training on Tensor Processing Units (TPUs). TPUs are Google’s
specialized ASICs designed to dramatically accelerate machine learning
workloads. They are available on Google Colab, the TensorFlow Research
Cloud and Google Compute Engine.
In terms of distributed training architecture, TPUStrategy is the same as MirroredStrategy: it implements synchronous distributed training. TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, which are used in TPUStrategy.
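Because TPUs are remote accelerators, TPUStrategy needs a cluster resolver and an explicit TPU system initialization before the strategy can be created. A minimal sketch, assuming a Colab-style TPU runtime where an empty address is resolved automatically:
library(reticulate)
tf <- import("tensorflow")
# Locate the TPU, connect to it and initialize it, then build the strategy
resolver <- tf$distribute$cluster_resolver$TPUClusterResolver(tpu = "")
tf$config$experimental_connect_to_cluster(resolver)
tf$tpu$experimental$initialize_tpu_system(resolver)
tpu_strategy <- tf$distribute$experimental$TPUStrategy(resolver)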
To start, we need to load the keras and reticulate packages.
library(keras)
library(reticulate)
For each strategy you have to create a strategy object. I'll create several below and use two of them to illustrate.
tf <- import("tensorflow")
strategy <- list()
# Creating strategies
strategy$mirrored <- tf$distribute$MirroredStrategy()
strategy$central_storage <- tf$distribute$experimental$CentralStorageStrategy()
strategy$multiworker_mirrored <- tf$distribute$experimental$MultiWorkerMirroredStrategy()
strategy$parameter_server <- tf$distribute$experimental$ParameterServerStrategy()
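A quick sanity check: each strategy object reports how many replicas it will keep in sync (on a CPU-only machine this is typically 1).
# Number of replicas each strategy will synchronize across
strategy$mirrored$num_replicas_in_sync
strategy$central_storage$num_replicas_in_sync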
Let’s run a model to see how it works. I’ll use the well-known MNIST dataset as an example.
# mnist dataset
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
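# one-hot encode the labels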
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
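One thing worth keeping in mind for the fit() call below: with a synchronous strategy the batch size is the global batch size, which gets split across the replicas. A common pattern (optional; this example simply keeps batch_size = 128) is to scale it by the replica count:
base_batch_size <- 64
batch_size <- base_batch_size * strategy$central_storage$num_replicas_in_sync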
To use a strategy in a model, you just need to call its scope() method inside with() …
model <- keras_model_sequential()
with(strategy$central_storage$scope(), {
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')
})
…run!
model %>% compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
history <- model %>% fit(
x_train, y_train,
epochs = 10, batch_size = 128,
validation_split = 0.2
)
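As mentioned at the beginning, the same API also covers distributed evaluation and prediction, so the held-out data can be scored with the usual Keras calls:
model %>% evaluate(x_test, y_test)
predictions <- model %>% predict(x_test)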
To change the strategy, you just need to change the scope. For example:
model <- keras_model_sequential()
with(strategy$parameter_server$scope(), {
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
layer_dropout(rate = 0.4) %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dropout(rate = 0.3) %>%
layer_dense(units = 10, activation = 'softmax')
})
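Depending on the TensorFlow version, Keras may also require compile() to be called inside the same scope in which the model's variables were created. A variant of the example above that keeps model building and compilation together under the scope (an adaptation, not part of the original workflow):
model <- keras_model_sequential()
with(strategy$parameter_server$scope(), {
  model %>%
    layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
    layer_dropout(rate = 0.4) %>%
    layer_dense(units = 128, activation = 'relu') %>%
    layer_dropout(rate = 0.3) %>%
    layer_dense(units = 10, activation = 'softmax')
  model %>% compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy')
  )
})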