Nina Zumel had a really great article on how to prepare a nice Keras performance plot using R.
I will use this example to show some of the advantages of cdata
record transform specifications.
The model performance data from Keras is in the following format:
# R code
library(wrapr)

df <- wrapr::build_frame(
  "val_loss", "val_acc", "loss" , "acc" , "epoch" |
  -0.377    , 0.8722   , -0.5067, 0.7852, 1       |
  -0.2997   , 0.8895   , -0.3002, 0.904 , 2       |
  -0.2964   , 0.8822   , -0.2166, 0.9303, 3       |
  -0.2779   , 0.8899   , -0.1739, 0.9428, 4       |
  -0.2843   , 0.8861   , -0.1411, 0.9545, 5       |
  -0.312    , 0.8817   , -0.1136, 0.9656, 6       )

knitr::kable(df[1, , drop = FALSE])
And the form that would be easiest to use with ggplot2 would be the following:
# R code
pf <- wrapr::build_frame(
  "epoch", "measure"                   , "training", "validation" |
  1      , "minus binary cross entropy", -0.5067   , -0.377       |
  1      , "accuracy"                  , 0.7852    , 0.8722       )

knitr::kable(pf)
In her article Nina Zumel shows a cdata transform solution which we re-interpret as the following:
# R code
library(cdata)

# define the record shape we want by example
controlTable <- wrapr::qchar_frame(
  "measure"                   , "training", "validation" |
  "minus binary cross entropy", loss      , val_loss     |
  "accuracy"                  , acc       , val_acc      )

# use our example record shape to specify the record transform
transform <- rowrecs_to_blocks_spec(
  controlTable = controlTable,
  recordKeys = 'epoch')

df %.>% transform
We have a tutorial on how to design such transforms by writing down the shape your incoming data records are arranged in, and also the shape you wish your outgoing data records to be arranged in.
This simple data transform is in fact not a single pivot/un-pivot, as the result records spread data-values over multiple rows and multiple columns at the same time. We call the transform simple because, from the user's point of view, it takes records of one form to records of another form (with the details left to the implementation).
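To make the "multiple rows and multiple columns at the same time" point concrete, here is a minimal pure-Python sketch of the same reshaping, driven by a control table. It is an illustration only and not how cdata is implemented: the helper name `rowrecs_to_blocks`, its arguments, and the list-of-dictionaries data layout are all assumptions made for this example.

```python
# Illustrative sketch only: a pure-Python analogue of a row-records-to-blocks
# transform. The function name and data layout are hypothetical, not the cdata API.

# Control table: the "measure" column holds the new key values; every other
# cell names the source column whose value should land in that position.
control_table = [
    {"measure": "minus binary cross entropy", "training": "loss", "validation": "val_loss"},
    {"measure": "accuracy", "training": "acc", "validation": "val_acc"},
]


def rowrecs_to_blocks(rows, control_table, record_keys, key_column="measure"):
    """Expand each input row into one output row per control-table row."""
    out = []
    for row in rows:
        for ctrl in control_table:
            # carry the record keys (here: epoch) into every output row
            new_row = {k: row[k] for k in record_keys}
            new_row[key_column] = ctrl[key_column]
            # copy each named source value into its target column
            for target, source in ctrl.items():
                if target != key_column:
                    new_row[target] = row[source]
            out.append(new_row)
    return out


# First two epochs of the example data from this article.
df_rows = [
    {"val_loss": -0.377, "val_acc": 0.8722, "loss": -0.5067, "acc": 0.7852, "epoch": 1},
    {"val_loss": -0.2997, "val_acc": 0.8895, "loss": -0.3002, "acc": 0.904, "epoch": 2},
]

blocks = rowrecs_to_blocks(df_rows, control_table, record_keys=["epoch"])
for b in blocks:
    print(b)
```

Each input row becomes a two-row, two-value-column block, which is why neither a single pivot nor a single un-pivot describes the step. Note that the function itself knows nothing about Keras: all of the problem-specific detail lives in `control_table`.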
In this note I would like to comment on some of the great advantages of using a data driven record transform specification.
First, as we see above and in the tutorial, once learned, the specification system is very easy to use (and powerful).
Next: we can print the transformation and check if it matches our intent:
# R code
print(transform)
The important point is that the transform is specified in data (not code):
# R code
str(transform)
Because the transform is data (not code), it is easy to share with other systems, such as SQL or Python/Pandas.
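As a small illustration of why a data specification travels well, the sketch below writes a transform description out as text and reads it back unchanged. JSON stands in for YAML here only to stay within the Python standard library, and the dictionary layout is an assumption made for this example; it is not the format cdata or data_algebra actually exchange.

```python
import json

# Hypothetical plain-data description of the transform
# (an assumed layout, not the cdata / data_algebra wire format).
spec = {
    "type": "rowrecs_to_blocks",
    "record_keys": ["epoch"],
    "control_table": {
        "measure": ["minus binary cross entropy", "accuracy"],
        "training": ["loss", "acc"],
        "validation": ["val_loss", "val_acc"],
    },
}

# Serialize to text: this string could be saved to a file or sent to another system.
spec_text = json.dumps(spec, indent=2)

# Any language with a JSON parser can recover the same structure.
recovered = json.loads(spec_text)

print(spec_text)
print(recovered == spec)
```

Nothing comparable is available for a transform written as a function body: the receiving system would have to parse and re-implement the code rather than just read a table.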
To show this we will first convert the transform specification into YAML for transport.
# R code
library(yaml)

yaml_str <- transform %.>%
  convert_cdata_spec_to_yaml(.) %.>%
  yaml::as.yaml(.)

cat(yaml_str)
We can then import this structure into Python.
# R code
library(reticulate)
use_condaenv("aiAcademy")  # our Python environment, yours will be different
The transported operator can then be used in Python.
# Python code
import yaml
import pandas
import data_algebra
from data_algebra.cdata_impl import record_map_from_simple_obj

record_map = record_map_from_simple_obj(yaml.safe_load(r.yaml_str))
print(record_map)

# Python code
print(r.df)

# Python code
res = record_map.transform(r.df)
print(res)
We can even convert the transform to SQL (either in R directly or in Python directly).
# Python code
from data_algebra.SQLite import SQLiteModel
from data_algebra.data_ops import *

db_model = SQLiteModel()

ops = TableDescription(
    'keras_frame',
    ["val_loss", "val_acc", "loss", "acc", "epoch"]). \
    convert_records(record_map)

print(ops.to_python(pretty=True))

# Python code
sql_str = ops.to_sql(db_model, pretty=True)
print(sql_str)
The SQL code was generated from the transform specification. This was easy to implement as it is often simple to convert data to code (though it can be quite hard to translate code to data).
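The "data to code" direction can be sketched in a few lines using only the Python standard library. The example below is a simplified illustration and not the SQL that data_algebra emits: the helper names `quote_ident` and `blocks_sql` are hypothetical, and the query shape (one `SELECT` per control-table row, joined with `UNION ALL`) is just one assumed way to express the transform.

```python
import sqlite3

# Hypothetical control table, mirroring the one used in this article.
control_table = [
    {"measure": "minus binary cross entropy", "training": "loss", "validation": "val_loss"},
    {"measure": "accuracy", "training": "acc", "validation": "val_acc"},
]


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def blocks_sql(table_name, control_table, record_keys, key_column="measure"):
    """Build a UNION ALL query with one SELECT per control-table row."""
    selects = []
    for ctrl in control_table:
        terms = [quote_ident(k) for k in record_keys]
        label = ctrl[key_column].replace("'", "''")  # escape single quotes
        terms.append("'" + label + "' AS " + quote_ident(key_column))
        for target, source in ctrl.items():
            if target != key_column:
                terms.append(quote_ident(source) + " AS " + quote_ident(target))
        selects.append("SELECT " + ", ".join(terms) + " FROM " + quote_ident(table_name))
    return "\nUNION ALL\n".join(selects)


sql_str = blocks_sql("keras_frame", control_table, ["epoch"])
print(sql_str)

# Try the generated query on an in-memory SQLite database
# holding the first two epochs of the example data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keras_frame "
    "(val_loss REAL, val_acc REAL, loss REAL, acc REAL, epoch INTEGER)"
)
conn.executemany(
    "INSERT INTO keras_frame VALUES (?, ?, ?, ?, ?)",
    [(-0.377, 0.8722, -0.5067, 0.7852, 1), (-0.2997, 0.8895, -0.3002, 0.904, 2)],
)
# ORDER BY 1, 2 sorts by the first two result columns (epoch, then measure).
rows = conn.execute(sql_str + "\nORDER BY 1, 2").fetchall()
for row in rows:
    print(row)
conn.close()
```

The generator only walks the control table and emits text, which is the sense in which converting data to code is simple. Going the other way, recovering a control table from an arbitrary hand-written query, would require parsing and understanding the SQL.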
And that is some of the power of using data to specify your data transforms.
More on cross-language data processing can be found here, here, and here.