
Reproducing results with kerasformula

There are several sources of randomness when estimating a neural net with kerasformula. Optionally, kms uses R to split the data into training and test sets. Optionally, Python's numpy further splits the training data so that a portion can be used for validation, epoch by epoch. Finally, parallel processing or GPUs may introduce additional noise as batches are fed through. To reproduce results exactly, use the following syntax:

library(kerasformula)
movies <- read.csv("http://s3.amazonaws.com/dcwoods2717/movies.csv")

out <- kms(log10(gross/budget) ~ . -title, movies, scale="z",
           seed = list(seed = 12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE))
___________________________________________________________________________
Layer (type)                     Output Shape                  Param #     
===========================================================================
dense_1 (Dense)                  (None, 256)                   355328      
___________________________________________________________________________
dropout_1 (Dropout)              (None, 256)                   0           
___________________________________________________________________________
dense_2 (Dense)                  (None, 128)                   32896       
___________________________________________________________________________
dropout_2 (Dropout)              (None, 128)                   0           
___________________________________________________________________________
dense_3 (Dense)                  (None, 1)                     129         
===========================================================================
Total params: 388,353
Trainable params: 388,353
Non-trainable params: 0
___________________________________________________________________________

We can confirm that this worked as follows:

out2 <- kms(log10(gross/budget) ~ . -title, movies, scale="z",
           seed = list(seed = 12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE))
___________________________________________________________________________
Layer (type)                     Output Shape                  Param #     
===========================================================================
dense_1 (Dense)                  (None, 256)                   355328      
___________________________________________________________________________
dropout_1 (Dropout)              (None, 256)                   0           
___________________________________________________________________________
dense_2 (Dense)                  (None, 128)                   32896       
___________________________________________________________________________
dropout_2 (Dropout)              (None, 128)                   0           
___________________________________________________________________________
dense_3 (Dense)                  (None, 1)                     129         
===========================================================================
Total params: 388,353
Trainable params: 388,353
Non-trainable params: 0
___________________________________________________________________________
out$MSE_predictions
[1] 0.6909273
out2$MSE_predictions
[1] 0.6909273
identical(out$y_test, out2$y_test)
[1] TRUE
identical(out$predictions, out2$predictions)
[1] TRUE

In other cases, to assess the degree of convergence, compare the correlations between the two sets of predictions...

cor(out$predictions, out2$predictions)
     [,1]
[1,]    1
cor(out$predictions, out2$predictions, method="spearman")
     [,1]
[1,]    1
cor(out$predictions, out2$predictions, method="kendall") # typically last to converge
     [,1]
[1,]    1

or visually inspect the weights...

get_weights(out$model)       # not run
get_weights(out2$model)
summary(out$model)           # also printed before fitting unless verbose = 0

kms implements a wrapper for keras::use_session_with_seed, which should also be called before compiling any model that is to be passed to kms as an argument (for an example, see the bottom of the vignette). See also the relevant Stack Overflow discussions and the TensorFlow documentation on reproducibility. Thanks to @VladPerervenko for helpful suggestions on this topic (mistakes are of course all mine)!
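
For example, a minimal sketch of seeding a hand-built model before passing it to kms. The keras_model_seq argument name and the layer sizes here are assumptions; check ?kms in your installed version.

library(kerasformula)
library(keras)

# set the seed (and disable GPU / parallel CPU) *before* building and compiling
use_session_with_seed(12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE)

# a hand-built alternative to kms' default architecture; the input width must
# match the model matrix kms builds from the formula (1387 columns here, as
# implied by the parameter counts in the summary above: 355328 = 256 * (1387 + 1))
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(1387)) %>%
  layer_dense(units = 1)

model %>% compile(loss = "mean_squared_error", optimizer = optimizer_adam())

# keras_model_seq is assumed to be the argument that accepts a pre-compiled model
out3 <- kms(log10(gross/budget) ~ . -title, movies, keras_model_seq = model)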

This toy data set is also used to show how to build regression and classification models; a brief sketch follows.
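
For instance, a hedged sketch, assuming the data contain a categorical column such as genre and that the fitted object exposes a confusion matrix (both are assumptions to verify against the vignette):

# regression (as above): a continuous outcome
out_reg <- kms(log10(gross/budget) ~ . -title, movies, scale = "z")

# classification: a categorical outcome such as genre (assumed to be a column in
# this data set); kms detects the non-numeric outcome and fits a classifier
out_clf <- kms(genre ~ . -title, movies,
               seed = list(seed = 12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE))
out_clf$confusion   # confusion matrix on the test data, if provided by your version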


