pretrain: Pre-trains the DeepBeliefNet or RestrictedBolzmannMachine


View source: R/pretrain.R

Description

A contrastive divergence method is used to train each layer sequentially.

Usage

pretrain(x, data, ...)

## S3 method for class 'RestrictedBolzmannMachine'
pretrain(x, data, miniters = 100,
  maxiters = floor(dim(data)[1]/batchsize), batchsize = 100,
  momentum = 0, penalization = c("l1", "l2", "none"), lambda = 0,
  lambda.b = lambda, lambda.c = lambda, lambda.W = lambda,
  epsilon = ifelse(x$output$type == "gaussian", 0.001, 0.1),
  epsilon.b = epsilon, epsilon.c = epsilon, epsilon.W = epsilon,
  train.b = TRUE, train.c = TRUE,
  continue.function = continue.function.exponential,
  continue.function.frequency = 1000, continue.stop.limit = 30,
  diag = list(rate = diag.rate, data = diag.data, f = diag.function),
  diag.rate = c("none", "each", "accelerate"), diag.data = NULL,
  diag.function = NULL, n.proc = detectCores() - 1, ...)

## S3 method for class 'DeepBeliefNet'
pretrain(x, data, miniters = 100,
  maxiters = floor(dim(data)[1]/batchsize), batchsize = 100,
  skip = numeric(0), momentum = 0, penalization = "l1",
  lambda = 2e-04, lambda.b = lambda, lambda.c = lambda,
  lambda.W = lambda, epsilon = 0.1, epsilon.b = epsilon,
  epsilon.c = epsilon, epsilon.W = epsilon, train.b = TRUE,
  train.c = length(x) - 1,
  continue.function = continue.function.exponential,
  continue.function.frequency = 100, continue.stop.limit = 3,
  diag = list(rate = diag.rate, data = diag.data, f = diag.function),
  diag.rate = c("none", "each", "accelerate"), diag.data = NULL,
  diag.function = NULL, n.proc = detectCores() - 1, ...)

pretrain.progress

Arguments

x

the DeepBeliefNet or RestrictedBolzmannMachine object

data

the dataset, either as a matrix or a data.frame. The number of columns must match the number of input nodes of the network

...

ignored

miniters, maxiters

minimum and maximum number of iterations to perform

batchsize

the size of the minibatches

momentum

the momentum, between 0 (no momentum) and 1 (no training). See the Momentums section below.

penalization

the penalization mode. Either “l1” (sparse), “l2” (quadratic) or “none”.

lambda

penalty on large weights (weight-decay). Alternatively one can define lambda.b, lambda.c and lambda.W to constrain bs, cs and Ws, respectively. Default: 0 = no penalization (equivalent to penalization="none").

lambda.b, lambda.c, lambda.W

separate penalty rates for bs, cs and Ws. Take precedence over lambda.

epsilon

learning rate. Alternatively one can define epsilon.b, epsilon.c and epsilon.W (see below) to learn bs, cs and Ws, respectively, at different speeds. Default: 0.1 (for layers where all inputs and outputs are binary or continuous) or 0.001 (for layers with gaussian input or output).

epsilon.b, epsilon.c, epsilon.W

separate learning rates for bs, cs and Ws. Take precedence over epsilon.
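For instance, the following sketch (rbm and data being placeholder objects, and all values purely illustrative) updates and penalizes the weight matrix at rates different from the biases; the .W arguments override epsilon and lambda for W only:

pretrained <- pretrain(rbm, data,
                       epsilon = 0.1,       # learning rate for b and c
                       epsilon.W = 0.01,    # slower updates for the weights
                       penalization = "l1",
                       lambda = 0,          # no penalty on b and c
                       lambda.W = 2e-04)    # weight-decay on W only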

train.b, train.c

whether (RestrictedBolzmannMachine) or on which layers (DeepBeliefNet) to update the bs and cs. For a RestrictedBolzmannMachine, this must be a logical of length 1. For a DeepBeliefNet, it must be either a logical (recycled as needed) or a numeric index of the layers.

continue.function

a function that can stop the pre-training between miniters and maxiters when it returns FALSE. By default, continue.function.exponential is used. An alternative is continue.function.always, which always returns TRUE and thus carries on with the training until maxiters is reached. A user-supplied function must accept (error, iter, batchsize) as input and return a logical of length 1. The training is stopped when it returns FALSE (see continue.stop.limit).

continue.function.frequency

the frequency at which continue.function will be assessed.

continue.stop.limit

the number of consecutive times continue.function must return FALSE before the training is stopped. For example, 1 stops the training as soon as continue.function returns FALSE, whereas Inf ensures the result of continue.function is never enforced (although the function is still executed). The default is 3, so the training continues until 3 consecutive calls of continue.function have returned FALSE, making the decision more robust.
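As an illustration, a user-supplied function could stop the training once the error falls below a fixed tolerance (a minimal sketch; rbm and data are placeholders and the 0.01 threshold is arbitrary):

# Continue while the error stays above the tolerance; FALSE votes for stopping,
# which takes effect after continue.stop.limit consecutive FALSE results
continue.function.tolerance <- function(error, iter, batchsize) {
    error > 0.01
}
pretrained <- pretrain(rbm, data, continue.function = continue.function.tolerance)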

diag, diag.rate, diag.data, diag.function

diagnostic specifications. See details.

n.proc

number of cores to be used for Eigen computations

skip

numeric vector giving the indices of the RestrictedBolzmannMachines of the DeepBeliefNet to be skipped.
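For example, the following sketch (dbn and data being placeholder objects) pre-trains all but the first RestrictedBolzmannMachine of the network:

pretrained <- pretrain(dbn, data, skip = 1)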

Format

pretrain.progress is an object of class list of length 3 (with elements rate, data and f).

Value

The pre-trained DeepBeliefNet or RestrictedBolzmannMachine, with its pretrained switch set to TRUE.

Pretraining Layers of the Deep Belief Net with Different Parameters

It is possible to pre-train the layers of a DeepBeliefNet with different parameters. The following parameters can be supplied as vectors with one value per RestrictedBolzmannMachine, i.e. of length equal to the length of the network minus 1: batchsize, penalization, lambda, lambda.b, lambda.c, lambda.W, epsilon, epsilon.b, epsilon.c and epsilon.W. The values will be recycled if necessary, without warning when the lengths do not match. The special case of the momentum parameter is described below.
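For example, with the 4-RestrictedBolzmannMachine network built in the Examples below, a per-layer specification could read as follows (a sketch; the lambda values are purely illustrative):

pretrained <- pretrain(dbn.mnist, mnist$train$x,
                       epsilon = c(.1, .1, .1, .001),       # slower rate for the gaussian layer
                       lambda = c(2e-04, 2e-04, 2e-04, 0))  # no weight-decay on the last RBM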

Momentums

The momentum parameter can be given in several lengths, which will be interpreted accordingly.

To specify different momentums for the different layers of a DeepBeliefNet, they must be passed as a list of the same length as the number of RestrictedBolzmannMachines to pretrain, and each element will be interpreted per layer as described above.
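For example (an illustrative sketch with arbitrary values), per-layer momentums for the 4-RestrictedBolzmannMachine network of the Examples below could be given as:

pretrained <- pretrain(dbn.mnist, mnist$train$x,
                       momentum = list(0.5, 0.5, 0.9, 0.9))  # one element per RBM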

Diagnostic specifications

The specifications can be passed directly in a list with elements rate, data and f, or separately with parameters diag.rate, diag.data and diag.function. The function must be of the following form: function(rbm, batch, data, iter, batchsize, maxiters, layer)

The following diag.rate or diag$rate values are supported: “none” (the diagnostic function is never called), “each” (it is called at every iteration) and “accelerate” (it is called frequently at the beginning of the training, then at increasingly spaced intervals).

Note that diag functions incur a slight overhead as they involve a callback to R and multiple object conversions. Setting diag.rate = "none" removes any overhead.
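As a minimal sketch (dbn and data being placeholders), a diagnostic function that merely reports the progress could be written as follows; see the Examples for a full progress bar:

# Print the current layer and iteration whenever the diag rate triggers a call
simple.diag <- function(rbm, batch, data, iter, batchsize, maxiters, layer) {
    cat(sprintf("Layer %d: iteration %d / %d\n", layer, iter, maxiters))
}
pretrained <- pretrain(dbn, data, diag.rate = "each", diag.function = simple.diag)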

Progress

pretrain.progress is a convenient pre-built diagnostic specification that displays a progress bar per training layer.

Examples

library(mnist)
data(mnist)
# Initialize a 784-1000-500-250-30 layers DBN to process the MNIST data set
dbn.mnist <- DeepBeliefNet(
    Layers(c(784, 1000, 500, 250, 30), 
           input="continuous",
           output="gaussian"))
print(dbn.mnist)

## Not run: 
# Pre-train this DBN
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,
                             penalization = "l2",
                             lambda = 0.0002,
                             epsilon = c(.1, .1, .1, .001),
                             batchsize = 100,
                             maxiters = 1000000)

## End(Not run)

## Not run: 
# Pretrain with a progress bar
# In this case the overhead is around 1%
diag <- list(rate = "accelerate",
             data = NULL,
             f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
                 if (iter == 0) {
                     # Create the progress bar in the global environment on the first call
                     DBNprogressBar <<- txtProgressBar(min = 0, max = maxiters, initial = 0,
                                                       width = NA, style = 3)
                 } else if (iter == maxiters) {
                     # Finish and close the bar on the last call
                     setTxtProgressBar(DBNprogressBar, iter)
                     close(DBNprogressBar)
                 } else {
                     setTxtProgressBar(DBNprogressBar, iter)
                 }
             })
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
                             epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
                             continue.function = continue.function.always, diag = diag)
# Equivalent to using pretrain.progress
pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
                             epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
                             continue.function = continue.function.always, diag = pretrain.progress)

## End(Not run)
