DataGenerator: Wrap a loader function in a generator.

Description

DataGenerator creates a stateful wrapper around a loader function, as a workaround for training keras models on datasets that do not fit in memory. The returned generator function can be passed directly to fit_generator() and loops over sample_ids indefinitely.

Usage

DataGenerator(loader, sample_ids, sample_cls = NULL, batch_size)

DataGeneratorNonblock(loader, sample_ids, sample_cls = NULL, batch_size)

DataGeneratorMultiProc(loader, sample_ids, sample_cls = NULL, batch_size,
  n_workers)

Arguments

loader

a loader function that accepts a subset of sample_ids and returns loaded data. See Details for more information.

sample_ids

a vector of ids passed to loader.

sample_cls

optional; a vector of classes, returned as y.

batch_size

an integer giving the batch size.

n_workers

the number of workers; used only by DataGeneratorMultiProc.

Details

Implementing a Python-like iterator in R can be tricky, so using fit_generator() from the keras package may in practice be restricted to the built-in generators. DataGenerator is provided as a workaround. Because it never loads all data into memory at once, it is currently not compatible with the data augmentation provided by image_data_generator(). Augmentation can, however, be implemented inside the loader function.
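As an illustration of doing augmentation inside the loader, here is a minimal sketch in base R. The helper with_random_flip() and the toy loader are hypothetical names that are not part of this package, and the (batch, height, width, channels) array layout is an assumption about what the loader returns.

```r
# Hypothetical helper (not part of the package): wraps any loader so that
# each sample in the returned batch is flipped horizontally with probability
# `prob`. Assumes the loader returns a 4-d array laid out as
# (batch, height, width, channels).
with_random_flip <- function(loader, prob = 0.5) {
  function(ids) {
    x <- loader(ids)
    width <- dim(x)[3]
    for (i in seq_len(dim(x)[1])) {
      if (runif(1) < prob) {
        # Reverse the width axis of sample i only.
        x[i, , , ] <- x[i, , rev(seq_len(width)), , drop = FALSE]
      }
    }
    x
  }
}

# Toy loader standing in for real disk access: one random 8x8 RGB image per id.
toy_image_loader <- function(ids) {
  array(runif(length(ids) * 8 * 8 * 3), dim = c(length(ids), 8, 8, 3))
}

aug_loader <- with_random_flip(toy_image_loader)
batch <- aug_loader(1:4)
```

The wrapped aug_loader takes the same single argument as the original loader, so it could be supplied wherever a loader is expected.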

A loader function should take a subset of sample_ids (a vector of atomic values) as its only argument and return loaded data that is ready to be supplied as a batch of training data x. DataGenerator is agnostic to the data structure: it does not inspect what loader() returns, it only collects the result and passes it on. The loader function is therefore responsible for error handling, the shape of the returned data, and so on.
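A loader following this contract might look like the sketch below. The name toy_loader, the feature count and the synthetic values are all made up for illustration; a real loader would read the samples from disk. The commented DataGenerator() call only mirrors the signature shown under Usage and is not run here.

```r
# Hypothetical loader: takes a vector of sample ids as its only argument and
# returns a numeric matrix with one row per id, ready to be used as a batch x.
n_features <- 4L

toy_loader <- function(ids) {
  # The generator does not validate anything, so the loader checks its input.
  stopifnot(is.atomic(ids), length(ids) > 0)
  # Synthetic features per id; replace this with real file reading.
  x <- vapply(ids,
              function(id) as.numeric(id) + seq_len(n_features) / 10,
              numeric(n_features))
  t(x)  # shape: (length(ids), n_features)
}

batch <- toy_loader(c(1L, 2L, 3L))

# With the package attached, the loader would be wrapped like this:
# gen <- DataGenerator(toy_loader, sample_ids = 1:100, batch_size = 32)
```

Keeping the shape logic inside the loader means the same generator wrapper works unchanged for matrices, image arrays or any other batch structure.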

DataGeneratorNonblock provides a somewhat non-blocking version of the generator (process overhead and transferring a large chunk of memory can still block). The non-blocking generator offloads the loader() job to a child process and fetches the result every time generator() is called. Because it is implemented with mcparallel() and mccollect() from the parallel package, it is not supported on Windows. Note that the forked child process stays in the background even after training is over and no more loading is needed. To stop the child process and free its memory, call CleanupGenerator(generator) or generator(STOP = TRUE), which signals the child process and triggers garbage collection.
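The prefetching idea described above can be sketched with mcparallel() and mccollect() directly. This is an illustrative reimplementation under simplifying assumptions, not the package's actual code: make_prefetch_generator() is a hypothetical name, sample_cls is ignored, and the STOP argument only imitates the generator(STOP = TRUE) call mentioned above. It needs a Unix-like system.

```r
library(parallel)  # mcparallel() and mccollect() rely on fork()

# Hypothetical sketch: while the caller consumes one batch, a forked child is
# already loading the next one.
make_prefetch_generator <- function(loader, sample_ids, batch_size) {
  pos <- 0L
  next_ids <- function() {
    # Wrap around so the generator loops over sample_ids indefinitely.
    idx <- (pos + seq_len(batch_size) - 1L) %% length(sample_ids) + 1L
    pos <<- (pos + batch_size) %% length(sample_ids)
    sample_ids[idx]
  }
  start_job <- function() {
    ids <- next_ids()        # advance the cursor in the parent process
    mcparallel(loader(ids))  # run the loader in a forked child
  }
  job <- start_job()
  function(STOP = FALSE) {
    result <- mccollect(job)[[1]]  # wait for the pending batch
    if (STOP) return(invisible(NULL))  # collect the last child, start no new one
    job <<- start_job()            # immediately prefetch the next batch
    result
  }
}

gen <- make_prefetch_generator(function(ids) ids * 10,
                               sample_ids = 1:5, batch_size = 2L)
first <- gen()
second <- gen()
gen(STOP = TRUE)  # without this, one child would be left uncollected
```

Note that the ids are computed in the parent before forking: any state a child changes is lost when it exits, which is also why a pending child has to be collected explicitly once training is over.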

DataGeneratorNonblock(...) is equivalent to DataGeneratorMultiProc(..., n_workers = 1).

Value

a generator function


imlijunda/AwkwardMLTools documentation built on May 13, 2019, 11:33 a.m.