Description
Serialize a Spark DataFrame to the TensorFlow TFRecord format for training or inference.
Usage

spark_write_tfrecord(x, path, record_type = c("Example",
  "SequenceExample"), write_locality = c("distributed", "local"),
  mode = NULL)
Arguments

x
  A Spark DataFrame.

path
  The path to the file. Needs to be accessible from the cluster.
  Supports the "hdfs://", "s3a://", and "file://" protocols.

record_type
  Output format of TensorFlow records. One of "Example" or
  "SequenceExample".

write_locality
  Determines whether the TensorFlow records are written locally on the
  workers or on a distributed file system. One of "distributed" or
  "local". See Details.

mode
  A character element. Specifies the behavior when data or a table
  already exists. Supported values include: 'error', 'append',
  'overwrite' and 'ignore'. For more details see also
  http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
  for your version of Spark.
Details

For write_locality = "local", each worker stores a subset of the data on
its local disk. The subset stored on each worker is determined by the
partitioning of the DataFrame. Each partition is coalesced into a single
TFRecord file and written on the node where the partition lives. This is
useful in the context of distributed training, in which each worker gets
a subset of the data to work on. When this mode is activated, the path
provided to the writer is interpreted as a base path that is created on
each of the worker nodes and populated with data from the DataFrame.
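The local-write behavior above can be sketched as follows. This is an illustrative fragment, not from the original page: it assumes an existing sparklyr connection and a Spark table `df_tbl`, and the base path is hypothetical. Repartitioning controls how many TFRecord files each worker produces, since each partition becomes one file on the node that holds it.

    # Assumes: library(sparklyr); library(sparktf); df_tbl is a Spark DataFrame
    df_tbl %>%
      sdf_repartition(partitions = 8) %>%          # 8 partitions -> 8 TFRecord files
      spark_write_tfrecord(
        path = "file:///data/tfrecords",           # base path created on each worker
        write_locality = "local"
      )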
Examples
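The example code did not survive extraction; below is a minimal sketch of typical usage, assuming the sparklyr and sparktf packages and a local Spark instance (the connection, dataset, and output path are illustrative).

    library(sparklyr)
    library(sparktf)

    # Connect to a local Spark instance (any accessible cluster works)
    sc <- spark_connect(master = "local")

    # Copy a small in-memory dataset into Spark
    iris_tbl <- copy_to(sc, iris)

    # Serialize the DataFrame to TFRecord files in the "Example" format
    spark_write_tfrecord(
      iris_tbl,
      path = paste0("file://", file.path(tempdir(), "iris-tfrecord")),
      record_type = "Example"
    )

    spark_disconnect(sc)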