# R/pretrain.R

#' @title Pre-trains the DeepBeliefNet or RestrictedBolzmannMachine
#' @description A contrastive divergence method is used to train each layer sequentially.
#' @param x the \code{\link{DeepBeliefNet}} or \code{\link{RestrictedBolzmannMachine}} object
#' @param data the dataset, either as a matrix or a data.frame. The number of columns must match the number of nodes of the network input
#' @param miniters,maxiters minimum and maximum number of iterations to perform
#' @param batchsize the size of the minibatches
#' @param skip numeric vector giving the indices of the RestrictedBolzmannMachines of the DeepBeliefNet to be skipped.
#' @param momentum the momentum, between 0 (no momentum) and 1 (no training). See the Momentums section below.
#' @param penalization the penalization mode. Either \dQuote{l1} (sparse), \dQuote{l2} (quadratic) or \dQuote{none}.
#' @param lambda penalty on large weights (weight-decay). Alternatively one can define \code{lambda.b}, \code{lambda.c} and \code{lambda.W} to constrain 
#' \code{b}s, \code{c}s and \code{W}s, respectively. Default: 0 = no penalization (equivalent to \code{penalization="none"}).
#' @param lambda.b,lambda.c,lambda.W separate penalty rates for \code{b}s, \code{c}s and \code{W}s. Take precedence over \code{lambda}.
#' @param epsilon learning rate. Alternatively one can define \code{epsilon.b}, \code{epsilon.c} and \code{epsilon.W} (see below)
#' to learn \code{b}s, \code{c}s and \code{W}s, respectively, at different speeds. Default: 0.1 (for layers where all inputs and outputs are binary or continuous)
#' or 0.001 (for layers with gaussian input or output).
#' @param epsilon.b,epsilon.c,epsilon.W separate learning rates for \code{b}s, \code{c}s and \code{W}s. Take precedence over \code{epsilon}.
#' @param train.b,train.c whether (\code{\link{RestrictedBolzmannMachine}}) or on which layers (\code{\link{DeepBeliefNet}}) to update the \code{b}s and \code{c}s. For a \code{\link{RestrictedBolzmannMachine}}, must be a logical of length 1. For a \code{\link{DeepBeliefNet}} must be a logical (can be recycled) or numeric index of layers.
#' @param continue.function a function that can stop the pre-training between \code{miniters} and \code{maxiters} if it returns \code{FALSE}.
#' By default, \code{\link{continue.function.exponential}} will be used. An alternative is \code{\link{continue.function.always}}, which always returns \code{TRUE} and thus carries on with the training until \code{maxiters} is reached.
#' A user-supplied function must accept \code{(error, iter, batchsize)} as input and return a \code{\link{logical}} of length 1. The training is stopped when it returns \code{FALSE}.
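#' As an illustrative sketch, \code{function(error, iter, batchsize) error > 0.01} would continue the training as long as the error exceeds 0.01, and stop it afterwards (subject to \code{miniters} and \code{continue.stop.limit}).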
#' @param continue.function.frequency the frequency, in iterations, at which \code{continue.function} will be assessed.
#' @param continue.stop.limit the number of consecutive times \code{continue.function} must return \code{FALSE} before the training is stopped. For example, \code{1} will stop as soon as \code{continue.function} returns \code{FALSE}, whereas \code{Inf} will ensure the result of \code{continue.function} is never enforced (but the function is still executed). The default (\code{3} for \code{DeepBeliefNet}, \code{30} for \code{RestrictedBolzmannMachine}) stops the training only after that many consecutive calls of \code{continue.function} returned \code{FALSE}, giving more robustness to the decision.
#' @param diag,diag.rate,diag.data,diag.function diagnostic specifications. See details.
#' @param n.proc number of cores to be used for Eigen computations
#' @param ... ignored
#' @section Pretraining Layers of the Deep Belief Net with Different Parameters:
#' It is possible to pre-train the layers of a DeepBeliefNet with different parameters. The following parameters can be supplied as vectors with one value per
#' RestrictedBolzmannMachine (i.e. of length equal to the number of layers of the network minus 1):
#' \code{batchsize}, \code{penalization}, \code{lambda}, \code{lambda.b}, \code{lambda.c}, \code{lambda.W},
#' \code{epsilon}, \code{epsilon.b}, \code{epsilon.c} and \code{epsilon.W}.
#' The values will be recycled if necessary (essentially without warning if the lengths don't match). The special case of the \code{momentum} parameter is described below.
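#' For example, \code{epsilon = c(0.1, 0.1, 0.1, 0.001)} trains the fourth RestrictedBolzmannMachine
#' of a 5-layer network with a slower learning rate, as in the examples below.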
#' @section Momentums:
#' The \code{momentum} parameter can have one of several lengths, which will be interpreted as follows:
#' \itemize{
#' \item \code{1}: constant momentum
#' \item \code{2}: a gradient, interpreted as \code{seq(momentum[1], momentum[2], length.out = maxiters)}
#' \item \code{maxiters}: the momentum for each iteration
#' }
#' To specify different \code{momentum}s for the different layers of a DeepBeliefNet, they must be passed as a \code{\link{list}} of the same length as the number
#' of RestrictedBolzmannMachines to pretrain,
#' and each element will be interpreted per layer as described above.
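#' For example, \code{momentum = c(0.5, 0.9)} ramps the momentum linearly from 0.5 to 0.9 over
#' \code{maxiters} iterations, while \code{momentum = list(0, c(0.5, 0.9), 0.9)} applies, in a network
#' with three RestrictedBolzmannMachines, no momentum to the first, a linear ramp to the second and
#' a constant momentum of 0.9 to the third.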
#' 
#' @section Diagnostic specifications:
#' The specifications can be passed directly in a list with elements \code{rate}, \code{data} and \code{f}, or separately with parameters \code{diag.rate}, \code{diag.data} and \code{diag.function}. The function must be of the following form:
#' \code{function(rbm, batch, data, iter, batchsize, maxiters, layer)}
#' \itemize{
#' \item \code{rbm}: the RBM object after the training iteration.
#' \item \code{batch}: the batch that was used at that iteration.
#' \item \code{data}: the data provided in \code{diag.data} or \code{diag$data}, possibly transformed through the previous layers of the DBN.
#' \item \code{iter}: the training iteration number, starting from 0 (before the first iteration).
#' \item \code{batchsize}: the size of the batch.
#' \item \code{maxiters}: the target number of iterations.
#' \item \code{layer}: the layer number, starting from 0.
#' }
#' 
#' The following \code{diag.rate} or \code{diag$rate} values are supported:
#' \itemize{
#' \item \dQuote{none}: the diag function will never be called.
#' \item \dQuote{each}: the diag function will be called before the first iteration, and at the end of each iteration.
#' \item \dQuote{accelerate}: the diag function will be called before the first iteration, at the first 200 iterations, and then at a rate slowing down proportionally with the iteration number. It is always called at the last iteration.
#' }
#' 
#' Note that diag functions incur a slight overhead as they involve a callback to R and multiple object conversions. Setting \code{diag.rate = "none"} removes any overhead.
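#'
#' As an illustrative sketch (hypothetical, not a function shipped with the package), a minimal
#' diagnostic that simply reports progress to the console could be specified as:
#' \preformatted{
#' diag <- list(rate = "each", data = NULL,
#'              f = function(rbm, batch, data, iter, batchsize, maxiters, layer)
#'                  message("layer ", layer, ": iteration ", iter, " of ", maxiters))
#' }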
#' 
#' @section Progress:
#' \code{pretrain.progress} is a convenient pre-built diagnostic specification that displays a progress bar per training layer.
#'  
#' @return the pre-trained object with the \code{pretrained} switch set to \code{TRUE}.
#' @examples 
#' library(mnist)
#' data(mnist)
#'
#' # Initialize a DBN with 784-1000-500-250-30 layers to process the MNIST data set
#' dbn.mnist <- DeepBeliefNet(
#'     Layers(c(784, 1000, 500, 250, 30), 
#'            input="continuous",
#'            output="gaussian"))
#' print(dbn.mnist)
#' 
#' \dontrun{
#' # Pre-train this DBN
#' pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,
#'                              penalization = "l2",
#'                              lambda = 0.0002,
#'                              epsilon = c(.1, .1, .1, .001),
#'                              batchsize = 100,
#'                              maxiters = 1000000)
#' }
#' 
#' \dontrun{
#' # Pretrain with a progress bar
#' # In this case the overhead is around 1%
#' diag <- list(rate = "accelerate", 
#'              data = NULL,
#'              f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
#'                  if (iter == 0) {
#'                      DBNprogressBar <<- txtProgressBar(min = 0, max = maxiters, initial = 0,
#'                                                        width = NA, style = 3)
#'                  } else if (iter == maxiters) {
#'                      setTxtProgressBar(DBNprogressBar, iter)
#'                      close(DBNprogressBar)
#'                  } else {
#'                      setTxtProgressBar(DBNprogressBar, iter)
#'                  }
#'              })
#' pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
#'                              epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
#'                              continue.function = continue.function.always, diag = diag)
#' # Equivalent to using pretrain.progress
#' pretrained.mnist <- pretrain(dbn.mnist, mnist$train$x,  penalization = "l2", lambda=0.0002,
#'                              epsilon=c(.1, .1, .1, .001), batchsize = 100, maxiters=1e4,
#'                              continue.function = continue.function.always, diag = pretrain.progress)
#' }
#' @export
pretrain <- function(x, data, ...)
	UseMethod("pretrain", x)


#' @rdname pretrain
#' @export
pretrain.RestrictedBolzmannMachine <- function(x, data, miniters = 100, maxiters = floor(dim(data)[1] / batchsize), batchsize = 100, 
						 momentum = 0, penalization = c("l1", "l2", "none"),
						 lambda = 0, lambda.b = lambda, lambda.c = lambda, lambda.W = lambda,
						 epsilon = ifelse(x$output$type == "gaussian", 0.001, 0.1), epsilon.b = epsilon, epsilon.c = epsilon, epsilon.W = epsilon,
						 train.b = TRUE, train.c = TRUE,
						 continue.function = continue.function.exponential, continue.function.frequency = 1000, continue.stop.limit = 30,
						 diag = list(rate = diag.rate, data = diag.data, f = diag.function), diag.rate = c("none", "each", "accelerate"), diag.data = NULL, diag.function = NULL,
						 n.proc = detectCores() - 1, ...) {
	sample.size <- nrow(data)
	
	# Check for ignored arguments
	ignored.args <- names(list(...))
	if (length(ignored.args) > 0) {
		warning(paste("The following arguments were ignored in pretrain.RestrictedBolzmannMachine:", paste(ignored.args, collapse=", ")))
	}
	
	# Validate and prepare the momentums, learning rates and penalizations
	momentum <- make.momentum(momentum, maxiters)
	if (is.null(lambda.b))
		lambda.b <- 0
	if (is.null(lambda.c))
		lambda.c <- 0
	if (is.null(lambda.W))
		lambda.W <- 0
	if (is.null(epsilon.b))
		epsilon.b <- ifelse(x$output$type == "gaussian", 0.001, 0.1)
	if (is.null(epsilon.c))
		epsilon.c <- ifelse(x$output$type == "gaussian", 0.001, 0.1)
	if (is.null(epsilon.W))
		epsilon.W <- ifelse(x$output$type == "gaussian", 0.001, 0.1)
	
	penalization <- match.arg(penalization)
	
	# Build diagnostic function
	if (missing(diag) && is.null(diag.data) && is.null(diag.function)) {
		diag$rate <- "none"
	} else {
		diag$rate <- match.arg(diag$rate, c("none", "each", "accelerate"))
	}
	
	# Build continue function
	continue.function <- list(
		continue.function = continue.function,
		continue.function.frequency = continue.function.frequency,
		continue.stop.limit = continue.stop.limit
	)

	ensure.data.validity(data, x$input)

	pretrainParams <- list(
		maxiters = maxiters, miniters = miniters, batchsize = batchsize,
		momentum = momentum, penalization = penalization,
		lambda.b = lambda.b, lambda.c = lambda.c, lambda.W = lambda.W,
		epsilon.b = epsilon.b, epsilon.c = epsilon.c, epsilon.W = epsilon.W,
		train.b = train.b, train.c = train.c,
		n.proc = n.proc)
	ret <- pretrainRbmCpp(x, data, pretrainParams, diag, continue.function)

	return(ret)
}
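
# Illustrative usage sketch (`rbm` and `train.x` are hypothetical objects; the input size of
# the RestrictedBolzmannMachine must match ncol(train.x)):
# pretrained.rbm <- pretrain(rbm, train.x, batchsize = 100, maxiters = 1e4)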

#' @rdname pretrain
#' @export
pretrain.DeepBeliefNet <- function(x, data, 
						 # Arguments for rbm.pretrain(_with_diagnostics)
						 miniters = 100, maxiters = floor(dim(data)[1] / batchsize), batchsize = 100,
						 skip = numeric(0),
						 # Arguments for rbm.update
						 momentum = 0, penalization = "l1", 
						 lambda = 0.0002, lambda.b = lambda, lambda.c = lambda, lambda.W = lambda,
						 epsilon = 0.1, epsilon.b = epsilon, epsilon.c = epsilon, epsilon.W = epsilon,
						 train.b = TRUE, train.c = length(x) - 1,
						 continue.function = continue.function.exponential, continue.function.frequency = 100, continue.stop.limit = 3,
						 diag = list(rate = diag.rate, data = diag.data, f = diag.function), diag.rate = c("none", "each", "accelerate"), diag.data = NULL, diag.function = NULL,
						 n.proc = detectCores() - 1,
						 ...) {
	sample.size <- dim(data)[1]
	
	# Check for ignored arguments
	ignored.args <- names(list(...))
	if (length(ignored.args) > 0) {
		warning(paste("The following arguments were ignored in pretrain.DeepBeliefNet:", paste(ignored.args, collapse=", ")))
	}
	
	ensure.data.validity(data, x[[1]]$input)
	
	# What layers to train?
	train.layers <- seq_along(x$rbms)
	if (length(skip) > 0) {
		skip <- as.integer(skip)
		# Ensure skipping only layers that exist
		if (!all(skipped <- skip %in% train.layers)) {
			stop(paste("Invalid skip values:", paste(skip[!skipped], collapse=", ")))
		}
		train.layers <- train.layers[-skip]
	}
	
	# Make arguments of length #layers to train
	len <- length(train.layers)
	
	# Make sure penalization is a character, not a factor or numeric:
	penalization <- as.character(penalization)
	
	# Fix epsilon - default by layer type
	#if (is.null(epsilon)) {
	#	gaussian.layers <- sapply(x$rbms, function(rbm) {return(rbm$input$type == "gaussian" || rbm$output$type == "gaussian")})
	#	epsilon <- ifelse(gaussian.layers, 0.001, 0.1) # This automatically sets epsilon.b .c and .W if they were missing!
	#}
	
	if (is.numeric(train.b)) {
		train.b <- seq_len(length(x) - 1) %in% train.b
	}
	if (is.numeric(train.c)) {
		train.c <- seq_len(length(x) - 1) %in% train.c
	}
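	# (e.g. with the default train.c = length(x) - 1, only the c biases of the
	# last RestrictedBolzmannMachine are updated)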
	
	parameters <- data.frame(
		miniters = rep(miniters, length.out = len),
		maxiters = rep(maxiters, length.out = len),
		batchsize = rep(batchsize, length.out = len),
		penalization = rep(sapply(penalization, match.arg, choices = c("l1", "l2", "none"), several.ok=TRUE), length.out = len),
		lambda.b = rep(lambda.b, length.out = len),
		lambda.c = rep(lambda.c, length.out = len),
		lambda.W = rep(lambda.W, length.out = len),
		epsilon.b = rep(epsilon.b, length.out = len),
		epsilon.c = rep(epsilon.c, length.out = len),
		epsilon.W = rep(epsilon.W, length.out = len),
		train.b = rep(train.b, length.out = len),
		train.c = rep(train.c, length.out = len),
		stringsAsFactors = FALSE
	)
	
	# Build diagnostic function
	if (missing(diag) && is.null(diag.data) && is.null(diag.function)) {
		diag$rate <- "none"
	} else {
		diag$rate <- match.arg(diag$rate, c("none", "each", "accelerate"))
	}
	
	# Build continue function
	continue.function <- list(
		continue.function = continue.function,
		continue.function.frequency = continue.function.frequency,
		continue.stop.limit = continue.stop.limit
	)
	
	if (is.list(momentum)) {
		if (length(momentum) != len) {
			stop("If 'momentum' is be a list, it must be of length equal to the number of RestrictedBolzmannMachine to train in the network.")
		}
		for (i in seq_along(momentum)) {
			momentum[[i]] <- make.momentum(momentum[[i]], parameters$maxiters[i])
		}
		parameters$momentum <- momentum
	} else {
		# Expand the single momentum specification once per layer to train
		# (sapply(len, ...) would iterate over the single value `len`, not 1..len)
		parameters$momentum <- lapply(seq_len(len), function(i) make.momentum(momentum, parameters$maxiters[i]))
	}
	
	if (length(skip) > 0) {
		# If we skip layers, we'll have too few parameters - put them in the right place!
		parameters[train.layers,] <- parameters
		rownames(parameters) <- seq_along(x$rbms)
	}

	# Split into a per-layer list in numeric layer order (splitting on character
	# rownames would sort "10" before "2")
	parameters <- split(parameters, as.integer(rownames(parameters)))
	
	pretrained <- pretrainDbnCpp(x, data, parameters, diag, continue.function, skip)

	return(pretrained)
}

# Expand a momentum specification into one value per training iteration:
# length 1 = constant momentum, length 2 = linear ramp from momentum[1] to
# momentum[2], length maxiters = used as-is.
make.momentum <- function(momentum, maxiters) {
	if (length(momentum) == 1)
		return(rep(momentum, maxiters))
	else if (length(momentum) == 2)
		return(seq(momentum[1], momentum[2], length.out = maxiters))
	else if (length(momentum) == maxiters)
		return(momentum)
	else
		stop("momentum must be of length 1, 2 or maxiters")
}
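
# Illustrative example: make.momentum(c(0.5, 0.9), 5) returns
# c(0.5, 0.6, 0.7, 0.8, 0.9), i.e. a linear ramp over 5 iterations.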

#' @rdname pretrain
#' @export
pretrain.progress <- list(rate = "accelerate", data = NULL, f = function(rbm, batch, data, iter, batchsize, maxiters, layer) {
	if (iter == 0) {
		DBNprogressBar <<- txtProgressBar(min = 0, max = maxiters, initial = 0, width = NA, style = 3)
	} else if (iter == maxiters) {
		setTxtProgressBar(DBNprogressBar, iter)
		close(DBNprogressBar)
	} else {
		setTxtProgressBar(DBNprogressBar, iter)
	}
})