Serialize a Spark DataFrame to the TensorFlow TFRecord format for training or inference.
spark_write_tfrecord(x, path, record_type = c("Example",
  "SequenceExample"), write_locality = c("distributed", "local"),
  mode = NULL)
x | A Spark DataFrame.
path | The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3a://", and "file://" protocols.
record_type | Output format of TensorFlow records. One of "Example" or "SequenceExample".
write_locality | Determines whether the TensorFlow records are written locally on the workers or on a distributed file system. One of "distributed" or "local".
mode | A character element that specifies the behavior when data already exists at the destination, for example "error", "append", "overwrite", or "ignore". For more details see also http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes for your version of Spark.
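The arguments above can be put together as in the following minimal sketch. It assumes the sparklyr and sparktf packages are installed and that `sc` is an already open Spark connection. The helper name `write_mtcars_tfrecord`, the table name, and the use of the built-in `mtcars` data are illustrative choices, not part of the package.

```r
# Sketch only: assumes sparklyr and sparktf are installed and `sc` is an
# open Spark connection, e.g. sc <- sparklyr::spark_connect(master = "local").
# `write_mtcars_tfrecord` is a hypothetical helper, not a sparktf function.
write_mtcars_tfrecord <- function(sc, path) {
  # Copy a small, all-numeric R data frame into Spark.
  cars_tbl <- sparklyr::sdf_copy_to(
    sc, mtcars,
    name = "mtcars_tfrecord_demo",
    overwrite = TRUE
  )

  # Write the DataFrame as TFRecord "Example" records to a shared location,
  # replacing anything that already exists at `path`.
  sparktf::spark_write_tfrecord(
    cars_tbl,
    path = path,
    record_type = "Example",
    write_locality = "distributed",
    mode = "overwrite"
  )
}

# Possible usage against a local Spark instance:
# sc <- sparklyr::spark_connect(master = "local")
# write_mtcars_tfrecord(sc, file.path(tempdir(), "mtcars-tfrecord"))
```

Passing `mode = "overwrite"` here is only one option; leaving `mode = NULL` keeps the default behavior of the function.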
For write_locality = "local", each worker stores a subset of the data on its
local disk. The subset that is stored on each worker
is determined by the partitioning of the DataFrame. Each of the partitions
is coalesced into a single TFRecord file and written on the node where the
partition lives. This is useful in the context of distributed training, in which
each of the workers gets a subset of the data to work on. When this mode is
activated, the path provided to the writer is interpreted as a base path
that is created on each of the worker nodes, and that will be populated with data
from the DataFrame.
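Because the local output follows the partitioning of the DataFrame, repartitioning before the write is one way to influence how the data is split. The sketch below assumes sparklyr and sparktf are installed and that `sc` is an open Spark connection. The helper `write_local_shards`, its `n_partitions` argument, and the table name are hypothetical and serve only as illustration.

```r
# Sketch only: assumes sparklyr and sparktf are installed and `sc` is an
# open Spark connection. `write_local_shards` is a hypothetical helper.
write_local_shards <- function(sc, base_path, n_partitions = 4L) {
  cars_tbl <- sparklyr::sdf_copy_to(
    sc, mtcars,
    name = "mtcars_local_demo",
    overwrite = TRUE
  )

  # Repartition so the data is split into `n_partitions` pieces; per the
  # description above, each partition becomes one TFRecord file on the
  # node where that partition lives.
  sharded <- sparklyr::sdf_repartition(cars_tbl, partitions = n_partitions)

  # `base_path` is treated as a base path created on each worker node.
  sparktf::spark_write_tfrecord(
    sharded,
    path = base_path,
    write_locality = "local"
  )
}

# Possible usage:
# sc <- sparklyr::spark_connect(master = "local")
# write_local_shards(sc, file.path(tempdir(), "mtcars-shards"), n_partitions = 2L)
```

With a single-machine `master = "local"` connection all files end up on the same disk, so the per-worker layout only becomes visible on a real multi-node cluster.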