pretrain: Pre-trains the DeepBeliefNet or RestrictedBolzmannMachine


View source: R/pretrain.R

Description

A contrastive divergence method is used to train each layer sequentially.

Usage

pretrain(x, data, ...)

## S3 method for class 'RestrictedBolzmannMachine'
pretrain(x, data, miniters = 100,
  maxiters = floor(dim(data)[1]/batchsize), batchsize = 100,
  momentum = 0, penalization = c("l1", "l2", "none"), lambda = 0,
  lambda.b = lambda, lambda.c = lambda, lambda.W = lambda,
  epsilon = ifelse(x$output$type == "gaussian", 0.001, 0.1),
  epsilon.b = epsilon, epsilon.c = epsilon, epsilon.W = epsilon,
  train.b = TRUE, train.c = TRUE,
  continue.function = continue.function.exponential,
  continue.function.frequency = 1000, continue.stop.limit = 30,
  diag = list(rate = diag.rate, data = diag.data, f = diag.function),
  diag.rate = c("none", "each", "accelerate"), diag.data = NULL,
  diag.function = NULL, n.proc = detectCores() - 1, ...)

## S3 method for class 'DeepBeliefNet'
pretrain(x, data, miniters = 100,
  maxiters = floor(dim(data)[1]/batchsize), batchsize = 100,
  skip = numeric(0), momentum = 0, penalization = "l1",
  lambda = 2e-04, lambda.b = lambda, lambda.c = lambda,
  lambda.W = lambda, epsilon = 0.1, epsilon.b = epsilon,
  epsilon.c = epsilon, epsilon.W = epsilon, train.b = TRUE,
  train.c = length(x) - 1,
  continue.function = continue.function.exponential,
  continue.function.frequency = 100, continue.stop.limit = 3,
  diag = list(rate = diag.rate, data = diag.data, f = diag.function),
  diag.rate = c("none", "each", "accelerate"), diag.data = NULL,
  diag.function = NULL, n.proc = detectCores() - 1, ...)

pretrain.progress

Arguments

x

the DeepBeliefNet or RestrictedBolzmannMachine object

data

the dataset, either as a matrix or a data.frame. The number of columns must match the number of input nodes of the network

...

ignored

miniters, maxiters

minimum and maximum number of iterations to perform

batchsize

the size of the minibatches

momentum

the momentum, between 0 (no momentum) and 1 (no training). See the Momentums section below.

penalization

the penalization mode. Either “l1” (sparse), “l2” (quadratic) or “none”.

lambda

penalty on large weights (weight-decay). Alternatively one can define lambda.b, lambda.c and lambda.W to constrain bs, cs and Ws, respectively. Default: 0 = no penalization (equivalent to penalization="none").

lambda.b, lambda.c, lambda.W

separate penalty rates for bs, cs and Ws. Take precedence over lambda.

epsilon

learning rate. Alternatively one can define epsilon.b, epsilon.c and epsilon.W (see below) to learn bs, cs and Ws, respectively, at different speeds. Default: 0.1 (for layers where all inputs and outputs are binary or continuous) or 0.001 (for layers with gaussian input or output).

epsilon.b, epsilon.c, epsilon.W

separate learning rates for bs, cs and Ws. Take precedence over epsilon.
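For instance, the following sketch (rbm and data being placeholder objects, and all values purely illustrative) updates and penalizes the weight matrix at rates different from the biases; the .W arguments override epsilon and lambda for W only:

pretrained <- pretrain(rbm, data,
                       epsilon = 0.1,       # learning rate for b and c
                       epsilon.W = 0.01,    # slower updates for the weights
                       penalization = "l1",
                       lambda = 0,          # no penalty on b and c
                       lambda.W = 2e-04)    # weight-decay on W only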

train.b, train.c

whether (RestrictedBolzmannMachine) or on which layers (DeepBeliefNet) to update the bs and cs. For a RestrictedBolzmannMachine, this must be a logical of length 1. For a DeepBeliefNet, it must be either a logical (recycled as needed) or a numeric index of the layers.

continue.function

a function that can stop the pre-training between miniters and maxiters when it returns FALSE. By default, continue.function.exponential is used. An alternative is continue.function.always, which always returns TRUE and thus carries on with the training until maxiters is reached. A user-supplied function must accept (error, iter, batchsize) as input and return a logical of length 1. The training is stopped when it returns FALSE (see continue.stop.limit).

continue.function.frequency

the frequency at which continue.function will be assessed.

continue.stop.limit

the number of consecutive times continue.function must return FALSE before the training is stopped. For example, 1 stops the training as soon as continue.function returns FALSE, whereas Inf ensures the result of continue.function is never enforced (although the function is still executed). The default is 3, so the training continues until 3 consecutive calls of continue.function have returned FALSE, making the decision more robust.
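As an illustration, a user-supplied function could stop the training once the error falls below a fixed tolerance (a minimal sketch; rbm and data are placeholders and the 0.01 threshold is arbitrary):

# Continue while the error stays above the tolerance; FALSE votes for stopping,
# which takes effect after continue.stop.limit consecutive FALSE results
continue.function.tolerance <- function(error, iter, batchsize) {
    error > 0.01
}
pretrained <- pretrain(rbm, data, continue.function = continue.function.tolerance)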

diag, diag.rate, diag.data, diag.function

diagnostic specifications. See details.

n.proc

number of cores to be used for Eigen computations

skip

numeric vector giving the indices of the RestrictedBolzmannMachines of the DeepBeliefNet to be skipped.
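For example, the following sketch (dbn and data being placeholder objects) pre-trains all but the first RestrictedBolzmannMachine of the network:

pretrained <- pretrain(dbn, data, skip = 1)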

Format

pretrain.progress is an object of class list of length 3 (with elements rate, data and f).

Value

The pre-trained DeepBeliefNet or RestrictedBolzmannMachine, with its pretrained switch set to TRUE.

Pretraining Layers of the Deep Belief Net with Different Parameters

It is possible to pre-train the layers of a DeepBeliefNet with different parameters. The following parameters can be supplied as vectors with one value per RestrictedBolzmannMachine, i.e. of length equal to the length of the network minus 1: batchsize, penalization, lambda, lambda.b, lambda.c, lambda.W, epsilon, epsilon.b, epsilon.c and epsilon.W. The values will be recycled if necessary, without warning when the lengths do not match. The special case of the momentum parameter is described below.
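For example, with the 4-RestrictedBolzmannMachine network built in the Examples below, a per-layer specification could read as follows (a sketch; the lambda values are purely illustrative):

pretrained <- pretrain(dbn.mnist, mnist$train$x,
                       epsilon = c(.1, .1, .1, .001),       # slower rate for the gaussian layer
                       lambda = c(2e-04, 2e-04, 2e-04, 0))  # no weight-decay on the last RBM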

Momentums

The momentum parameter can be given in several lengths, which will be interpreted accordingly.

To specify different momentums for the different layers of a DeepBeliefNet, they must be passed as a list of the same length as the number of RestrictedBolzmannMachines to pretrain, and each element will be interpreted per layer as described above.
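For example (an illustrative sketch with arbitrary values), per-layer momentums for the 4-RestrictedBolzmannMachine network of the Examples below could be given as:

pretrained <- pretrain(dbn.mnist, mnist$train$x,
                       momentum = list(0.5, 0.5, 0.9, 0.9))  # one element per RBM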

Diagnostic specifications

The specifications can be passed directly in a list with elements rate, data and f, or separately with parameters diag.rate, diag.data and diag.function. The function must be of the following form: function(rbm, batch, data, iter, batchsize, maxiters, layer)

The following diag.rate or diag$rate values are supported: “none” (the diagnostic function is never called), “each” (it is called at every iteration) and “accelerate” (it is called frequently at the beginning of the training, then at increasingly spaced intervals).

Note that diag functions incur a slight overhead as they involve a callback to R and multiple object conversions. Setting diag.rate = "none" removes any overhead.
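As a minimal sketch (dbn and data being placeholders), a diagnostic function that merely reports the progress could be written as follows; see the Examples for a full progress bar:

# Print the current layer and iteration whenever the diag rate triggers a call
simple.diag <- function(rbm, batch, data, iter, batchsize, maxiters, layer) {
    cat(sprintf("Layer %d: iteration %d / %d\n", layer, iter, maxiters))
}
pretrained <- pretrain(dbn, data, diag.rate = "each", diag.function = simple.diag)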

Progress

pretrain.progress is a convenient pre-built diagnostic specification that displays a progress bar per training layer.

Examples

library(mnist)
data(mnist)
# Initialize a 784-1000-500-250-30 layers DBN to process the MNIST data set
dbn.mnist <- DeepBeliefNet(
    Layers(c(784, 1000, 500, 250, 30), 
           input="continuous",
           output="gaussian"))
print(dbn.mnist)

## Not run: 
# Pre-train this DBN
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,
                             penalization = "l2",
                             lambda = 0.0002,
                             epsilon = c(.1, .1, .1, .001),
                             batchsize = 100,
                             maxiters = 1000000)

## End(Not run)

## Not run: 
# Pretrain with a progress bar
# In this case the overhead is around 1%
diag <- list(rate = "accelerate",
             data = NULL,
             f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
                 if (iter == 0) {
                     # Create the progress bar in the global environment on the first call
                     DBNprogressBar <<- txtProgressBar(min = 0, max = maxiters, initial = 0,
                                                       width = NA, style = 3)
                 } else if (iter == maxiters) {
                     # Finish and close the bar on the last call
                     setTxtProgressBar(DBNprogressBar, iter)
                     close(DBNprogressBar)
                 } else {
                     setTxtProgressBar(DBNprogressBar, iter)
                 }
             })
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
                             epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
                             continue.function = continue.function.always, diag = diag)
# Equivalent to using pretrain.progress
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
                             epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
                             continue.function = continue.function.always, diag = pretrain.progress)

## End(Not run)
