autoencode implements the sparse autoencoder described in Andrew Ng's lecture notes (http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf). The features learned by an autoencoder trained on unlabeled data are available through the weights of the trained autoencoder object. These automatically learned features are useful, e.g., in constructing deep belief networks.
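As a rough sketch of the calling signature, assembled from the arguments documented below (only the defaults explicitly stated there, nl = 3 and max.iterations = 2000, are shown; the remaining defaults are not reproduced here):

autoencode(X.train, X.test, nl = 3, N.hidden, unit.type,
           lambda, beta, rho, epsilon,
           optim.method, rel.tol, max.iterations = 2000,
           rescale.flag, rescaling.offset)

See the Examples section below for a complete call with concrete parameter values.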
X.train: a matrix of training data, with rows corresponding to training examples and columns corresponding to input channels. For example, if the training data consist of 10x10-pixel images, then X.train has 100 columns, one per pixel.
X.test: an optional matrix of test data in the same format as X.train, used to evaluate the trained autoencoder's reconstruction error on data not seen during training.
nl: number of layers in the autoencoder (default is 3 layers: input, hidden, output).
N.hidden: a vector of the numbers of units (neurons) in each of the hidden layers. For an autoencoder with a single hidden layer (the default nl=3), N.hidden is a single number.
unit.type: the type of units used in the autoencoder, defined by the activation function of the units ('logistic' or 'tanh').
lambda: weight decay parameter controlling the relative importance of the regularization term in the autoencoder's cost function.
beta: weight of the sparsity penalty term in the cost function.
rho: sparsity parameter constraining the average activation (over training examples) of the hidden units; it should typically be a small value close to zero (hence the 'sparse' autoencoder).
epsilon: a small parameter used to initialize the autoencoder's weights as small Gaussian random numbers drawn from the normal distribution N(0, epsilon^2).
optim.method: the optimization method used to search for the minimum of the cost function: 'BFGS', 'L-BFGS-B', or 'CG'. See help(optim) for details.
rel.tol: relative convergence tolerance determining the convergence of optim (see help(optim)).
max.iterations: maximum number of iterations in the search for the cost function minimum. Defaults to 2000.
rescale.flag: a logical flag indicating whether to uniformly rescale the training matrix so that the values of all input channels fall within the range of unit outputs: [0,1] for 'logistic' units, [-1,1] for 'tanh' units.
rescaling.offset: a small non-negative value used in rescaling, keeping the rescaled values slightly away from the boundaries of the unit output range.
An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation to adjust its weights, attempting to make its target values (outputs) equal to its inputs. In other words, it tries to learn an approximation to the identity function, so that its output is similar to its input for all training examples. With the sparsity constraint enforced (requiring that the average activation of the hidden units over the training set be small), such an autoencoder automatically learns useful features of the unlabeled training data, which can be used, e.g., for lossy data compression or as features in deep belief networks.
The training is performed by minimizing the autoencoder's cost function J(W,b), which depends on the autoencoder's weights W and biases b. The optimization (search for a local minimum) is performed with the optim function using one of three methods: 'BFGS', 'L-BFGS-B', or 'CG' (see help(optim) for details).
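In the sparse-autoencoder formulation of the lecture notes cited under References, this cost function combines an average reconstruction error, a weight decay term weighted by lambda, and a sparsity penalty weighted by beta; the exact normalization used by the package implementation may differ slightly, but the general form is

J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\bigl\|h_{W,b}(x^{(i)})-x^{(i)}\bigr\|^{2}
       + \frac{\lambda}{2}\sum_{l,i,j}\bigl(W_{ji}^{(l)}\bigr)^{2}
       + \beta\sum_{j}\mathrm{KL}\bigl(\rho \,\|\, \hat{\rho}_{j}\bigr),

where m is the number of training examples, h_{W,b}(x^{(i)}) is the autoencoder's output for input x^{(i)}, \hat{\rho}_{j} is the average activation of hidden unit j over the training set, and KL(\rho \| \hat{\rho}_{j}) = \rho\log(\rho/\hat{\rho}_{j}) + (1-\rho)\log((1-\rho)/(1-\hat{\rho}_{j})) is the Kullback-Leibler divergence penalizing deviations of \hat{\rho}_{j} from the sparsity parameter rho.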
After the optimization converges, the mean squared error between the output and input matrix (either the training matrix, or a test matrix) is evaluated as a measure of goodness of fit of the autoencoder.
For the autoencoder to work well, one must rescale, if necessary, the training matrix to make sure all the input channels (and hence all the output channels) have values within the range of the unit activation function: [0,1] for 'logistic' units, [-1,1] for 'tanh' units. If the rescaling flag is set (rescale.flag=TRUE), the input matrix is rescaled uniformly using its minimum and maximum elements, min(X.train) and max(X.train), as

X.train.rescaled = (X.train - min(X.train)) / (max(X.train) - min(X.train))

for 'logistic' units, and

X.train.rescaled = 2 * (X.train - min(X.train)) / (max(X.train) - min(X.train)) - 1

for 'tanh' units. The minimum and maximum elements of the training matrix are then stored in the object returned by the function, so that the predict.autoencode function can rescale new input data consistently with the rescaling of the data used for training the autoencoder.
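As a minimal sketch of applying a trained autoencoder to new data with predict.autoencode (the argument name X.input and the names of the returned components are assumptions here, not confirmed by this page; see help(predict.autoencode) for the actual interface):

## Given a trained autoencoder.object and a matrix X.test in the same format as X.train:
pred <- predict(autoencoder.object, X.input = X.test)   ## dispatches to predict.autoencode
X.reconstructed <- pred$X.output   ## assumed name of the reconstructed output matrix
test.error <- pred$mean.error      ## assumed name of the mean reconstruction error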
An object of class autoencoder, containing a list with the following components:
W: a list of weight matrices of the trained autoencoder, one matrix per layer of connections between consecutive layers.
b: a list of bias vectors of the trained autoencoder, one vector per layer of connections.
unit.type: the type of units used in the autoencoder; the value is the same as the unit.type argument passed to autoencode.
rescaling: a list with the rescaling parameters (the rescaling flag and the minimum and maximum elements of the training matrix), used by predict.autoencode to rescale input data consistently with the data used for training.
mean.error.training.set: the average, over all training matrix rows (training examples), of the sum of squared differences between the autoencoder's outputs and the corresponding inputs.
mean.error.test.set: the same quantity evaluated over all test matrix rows (test examples), if X.test was supplied.
Eugene Dubossarsky (project leader, chief designer), Yuriy Tyshetskiy (design, implementation, testing)
See Andrew Ng's lecture notes at http://www.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf
## Train the autoencoder on an unlabeled set of 5000 image patches of
## size Nx.patch by Ny.patch, randomly cropped from 10 nature photos:
## Load a training matrix with rows corresponding to training examples,
## and columns corresponding to input channels (e.g., pixels in images):
data('training_matrix_N=5e3_Ninput=100') ## the matrix contains 5e3 image
## patches of 10 by 10 pixels
## Set up the autoencoder architecture:
nl=3 ## number of layers (default is 3: input, hidden, output)
unit.type = "logistic" ## specify the network unit type, i.e., the unit's
## activation function ("logistic" or "tanh")
Nx.patch=10 ## width of training image patches, in pixels
Ny.patch=10 ## height of training image patches, in pixels
N.input = Nx.patch*Ny.patch ## number of units (neurons) in the input layer (one unit per pixel)
N.hidden = 10*10 ## number of units in the hidden layer
lambda = 0.0002 ## weight decay parameter
beta = 6 ## weight of sparsity penalty term
rho = 0.01 ## desired sparsity parameter
epsilon <- 0.001 ## a small parameter for initialization of weights
## as small gaussian random numbers sampled from N(0,epsilon^2)
max.iterations = 2000 ## number of iterations in optimizer
## Train the autoencoder on training.matrix using BFGS optimization method
## (see help('optim') for details):
## WARNING: the training can take a long time (~1 hour) for this dataset!
## Not run:
autoencoder.object <- autoencode(X.train=training.matrix,nl=nl,N.hidden=N.hidden,
unit.type=unit.type,lambda=lambda,beta=beta,rho=rho,epsilon=epsilon,
optim.method="BFGS",max.iterations=max.iterations,
rescale.flag=TRUE,rescaling.offset=0.001)
## End(Not run)
## N.B.: Training this autoencoder takes a long time, so in this example we do not run the above
## autoencode function, but instead load the corresponding pre-trained autoencoder.object.
## Report mean squared error for the training set:
cat("autoencode(): mean squared error for training set: ",
round(autoencoder.object$mean.error.training.set,3),"\n")
## Extract weights W and biases b from autoencoder.object:
W <- autoencoder.object$W
b <- autoencoder.object$b
## Visualize hidden units' learned features:
visualize.hidden.units(autoencoder.object,Nx.patch,Ny.patch)