Description

Fit a softmax regression or classification model: a neural network with multiple hidden layers and a final softmax output layer.
Usage

softmaxReg(x, ...)

## Default S3 method:
softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 3000,
  rang = 0.1, type = "class", algorithm = "rmsprop", rate = 0.05,
  L2 = FALSE, penalty = 1e-4, threshold = 1e-4, batch = 50, ...)

## S3 method for class 'formula'
softmaxReg(formula, data, hidden = c(), funName = 'sigmoid', maxit = 3000,
  rang = 0.1, type = "class", algorithm = "rmsprop", rate = 0.05,
  L2 = FALSE, penalty = 1e-4, threshold = 1e-4, batch = 50, ...)

## S3 method for class 'softmax'
predict(object, newdata, ...)

## S3 method for class 'softmax'
summary(object, ...)
Arguments

formula
    Formula of the form y ~ x1 + x2 + ... for 'class' type classification, or (y1 + y2 + ... + yk) ~ x1 + x2 + ... for 'raw' type regression.
x
    Matrix or data frame of x input values.
y
    Vector of target values y for 'class' type classification, or matrix or data frame of target values (y1, y2, ..., yk) for 'raw' type regression.
data
    Data frame containing the variables in formula.
hidden
    Numeric vector of integers specifying the number of hidden nodes in each layer, e.g. hidden = c(8, 5, ...). Default NULL (no hidden layer).
funName
    Name of the neural network activation function, one of 'sigmoid', 'tanh', 'relu'. Default 'sigmoid'.
maxit
    Integer; maximum number of iterations. Default 3000.
rang
    Range of the initial random weights: [-rang, rang]. Default 0.1.
type
    Type of softmax task: 'class' denotes the softmax classification model, where the fitted values are factors; 'raw' denotes the softmax regression model, where the fitted values are the raw probability or percentage data of each group. Default 'class'.
algorithm
    Gradient descent learning algorithm to use, one of 'sgd', 'adagrad', 'rmsprop', 'adadelta', 'momentum', 'nag' (Nesterov momentum). Default 'rmsprop'.
rate
    Initial learning rate. Default 0.05.
L2
    Boolean; whether an L2 regularization term is added to the loss function and gradient to prevent overfitting. Default FALSE.
penalty
    Penalty cost of the L2 regularization term when L2 is TRUE. Default 1e-4.
threshold
    Convergence threshold: iteration stops once the loss value falls below threshold. Default 1e-4.
batch
    Mini-batch size. Default 50.
object
    An object of class "softmax".
newdata
    Matrix or data frame of new data for prediction.
...
    Other arguments.
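As a rough illustration of the default 'rmsprop' algorithm: each gradient step is divided by a running root-mean-square of past gradients. The sketch below uses common textbook constants (decay 0.9, epsilon 1e-8) and is not taken from the package's internals:

```r
## Illustrative RMSprop update for one weight vector; the package's actual
## implementation may differ in details.
rmsprop_step <- function(w, grad, cache, rate = 0.05, decay = 0.9, eps = 1e-8) {
  cache <- decay * cache + (1 - decay) * grad^2  # running average of squared gradients
  w <- w - rate * grad / (sqrt(cache) + eps)     # per-coordinate scaled step
  list(w = w, cache = cache)
}
```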
Details

This function can be used to train typical n-class classification models. It can also fit 'raw' regression data, e.g. the percentage/probability data of each group in a multinomial logit/probit model.
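The final softmax layer turns the last layer's scores z into class probabilities p_k = exp(z_k) / sum_j exp(z_j). A minimal stand-alone sketch (the helper name is illustrative, not part of the package):

```r
## Numerically stable softmax: subtracting max(z) before exp() avoids overflow
## without changing the result.
softmax_probs <- function(z) {
  ez <- exp(z - max(z))
  ez / sum(ez)
}
softmax_probs(c(2, 1, 0.1))  # three class probabilities summing to 1
```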
Value

An object of class "softmax" with components:

weights
    Optimal weight parameters found by the softmax model: a list of W and B for all layers.
data
    Input training data.
K
    Number of K groups fitted by the softmax model.
loss
    Numeric vector of the loss function values over iterations.
fitted.values
    Matrix of fitted values yFitMat for the training data, with dimensions number of observations by K.
iteration
    Number of iterations reached before stopping.
convergence
    Boolean; whether the softmax model reached convergence.
Author(s)

Xichen Ding
References

MNIST dataset, handwritten digit recognition: http://yann.lecun.com/exdb/mnist/
MNIST data-reading code reused from Brendan O'Connor: https://gist.github.com/brendano/39760
Reuter_50_50 dataset, UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip
See Also

wordEmbed, document, loadURLData
Examples

## Not run:
#### Example 1, Softmax classification with hidden layer and no regularization term
library(softmaxreg)
data(iris)
x = iris[,1:4]
y = iris$Species
# Training with hidden layer set 5 units
softmax_model = softmaxReg(x, y, hidden = c(5), maxit = 100, type = "class",
algorithm = "adagrad", rate = 0.05, batch = 20)
summary(softmax_model)
yFitMat = softmax_model$fitted.values
yFit = c()
for (i in 1:length(y)) {
yFit = c(yFit, which(yFitMat[i,]==max(yFitMat[i,])))
}
table(y, yFit)
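The fitted-class loop above can also be written with base R's max.col, which returns the column index of each row's maximum:

```r
## Equivalent, vectorized extraction of the fitted class per observation
yFit = max.col(softmax_model$fitted.values)
table(y, yFit)
```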
# Calculate AIC and BIC information criteria
aic = AIC(softmax_model)
bic = BIC(softmax_model)
cat("AIC",aic,'\n')
cat("BIC",bic,'\n')
# Make new Prediction
newdata = iris[1:100,1:4]
yPred = predict(softmax_model, newdata)
#### Example 2, Softmax classification with formula and dataframe input
f = formula(Species~.) # formula with succinct expression
softmax_model_fm = softmaxReg(f, data = iris, hidden = c(5), maxit = 100, type = "class",
algorithm = "adagrad", rate = 0.05, batch = 20)
summary(softmax_model_fm)
#### Example 3: Softmax classification with L2 regularization
softmax_model_L2 = softmaxReg(x, y, hidden = c(5), maxit = 100, type = "class",
algorithm = "adagrad", L2 = TRUE, penalty = 1e-4, batch = 20)
summary(softmax_model_L2)
# Compare the two models' loss values
# Note: the L2 loss includes the ||W||^2 term, so it is larger than the previous model's loss
loss1 = softmax_model$loss
loss2 = softmax_model_L2$loss
plot(c(1:length(loss1)), loss1, xlab = "Iteration", ylab = "Loss Function Value",
type = "l", col = "black")
lines(c(1:length(loss2)), loss2, col = "red")
legend("topright", c("Loss 1: No Regularization", "Loss 2: L2 Regularization"),
col = c("black", "red"),pch = 1)
#### Example 4: Compare different learning algorithms 'adagrad','sgd',
# 'rmsprop', 'momentum', 'nag' (Nesterov Momentum)
library(softmaxreg)
data(iris)
x = iris[,1:4]
y = iris$Species
model1 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100, rang = 0.1,
type = "class", algorithm = "sgd", rate = 0.1, batch = 150)
loss1 = model1$loss
model2 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100, rang = 0.1,
type = "class", algorithm = "adagrad", rate = 0.1, batch = 150)
loss2 = model2$loss
model3 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100, rang = 0.1,
type = "class", algorithm = "rmsprop", rate = 0.1, batch = 150)
loss3 = model3$loss
model4 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100, rang = 0.1,
type = "class", algorithm = "momentum", rate = 0.1, batch = 150)
loss4 = model4$loss
model5 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100, rang = 0.1,
type = "class", algorithm = "nag", rate = 0.1, batch = 150)
loss5 = model5$loss
# plot the loss convergence
iteration = c(1:length(loss1))
plot(iteration, loss1, xlab = "iteration", ylab = "loss", ylim = c(0,
max(loss1,loss2,loss3,loss4,loss5) + 0.01), type = "p", col = "black", cex = 0.7)
title("Convergence Comparison Between Learning Algorithms")
points(iteration, loss2, col = "red", pch = 2, cex = 0.7)
points(iteration, loss3, col = "blue", pch = 3, cex = 0.7)
points(iteration, loss4, col = "green", pch = 4, cex = 0.7)
points(iteration, loss5, col = "magenta", pch = 5, cex = 0.7)
legend("topright", c("SGD", "Adagrad", "RMSprop", "Momentum", "NAG"),
col = c("black", "red", "blue", "green", "magenta"),pch = c(1,2,3,4,5))
## Comments: from these experiments we can see that the momentum learning algorithms
## generally converge faster than standard sgd and its variations
#### Example 5: Multiple class classification: Read Online Dataset and make document classification
library(softmaxreg)
data(word2vec) # default 20 dimension word2vec dataset
#### Reuter 50 DataSet UCI Archived Dataset from
## URL: "http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip"
URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip"
folder = getwd()
loadURLData(URL, folder, unzip = TRUE)
##Training Data
subFoler = c('AaronPressman', 'AlanCrosby', 'AlexanderSmith', 'BenjaminKangLim', 'BernardHickey')
docTrain = document(path = paste(folder, "/C50train/", subFoler, sep = ""), pattern = 'txt')
xTrain = wordEmbed(docTrain, dictionary = word2vec)
yTrain = c(rep(1,50), rep(2,50), rep(3,50), rep(4,50), rep(5,50))
# Assign labels to 5 different authors
##Testing Data
docTest = document(path = paste(folder, "/C50test/", subFoler, sep = ""), pattern = 'txt')
xTest = wordEmbed(docTest, dictionary = word2vec)
yTest = c(rep(1,50), rep(2,50), rep(3,50), rep(4,50), rep(5,50))
samp = sample(250, 50)
xTest = xTest[samp,]
yTest = yTest[samp]
## Train Softmax Classification Model, 20-10-5
softmax_model = softmaxReg(xTrain, yTrain, hidden = c(10), maxit = 500, type = "class",
algorithm = "nag", rate = 0.1, L2 = TRUE)
summary(softmax_model)
yFit = predict(softmax_model, newdata = xTrain)
table(yTrain, yFit)
## Testing
yPred = predict(softmax_model, newdata = xTest)
table(yTest, yPred)
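For a single accuracy figure instead of the full confusion table (assuming, as above, that predict() returns the numeric class labels 1..5):

```r
## Overall test-set accuracy; illustrative, assumes numeric class labels
accuracy = mean(as.numeric(yPred) == yTest)
cat("Test accuracy:", accuracy, "\n")
```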
#### Comments: increasing the word2vec dimension to 50 or even 100 will help increase
#### the capacity of the model and the prediction precision
#### Example 6: 'MNIST' dataset HandWritten Digit Recognition
## Download MNIST Dataset from below URL and Gunzip them
## http://yann.lecun.com/exdb/mnist/
## MNIST Data Reading method reuse R code from:
## brendan o'connor - https://gist.github.com/brendano/39760
library(softmaxreg)
# Replace with your local path
path = "D:/DeepLearning/MNIST/"
## 10-class classification, Digit 0-9
x = load_image_file(paste(path,'train-images-idx3-ubyte', sep=""))
y = load_label_file(paste(path,'train-labels-idx1-ubyte', sep=""))
xTest = load_image_file(paste(path,'t10k-images-idx3-ubyte',sep=""))
yTest = load_label_file(paste(path,'t10k-labels-idx1-ubyte', sep=""))
## Normalize Input Data
x = x/255
xTest = xTest/255
## Compare Convergence Rate of MNIST dataset
model1 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50, rang = 0.1,
type = "class", algorithm = "sgd", rate = 0.01, batch = 100)
loss1 = model1$loss
model2 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50, rang = 0.1,
type = "class", algorithm = "adagrad", rate = 0.01, batch = 100)
loss2 = model2$loss
model3 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50, rang = 0.1,
type = "class", algorithm = "rmsprop", rate = 0.01, batch = 100)
loss3 = model3$loss
model4 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50, rang = 0.1,
type = "class", algorithm = "momentum", rate = 0.01, batch = 100)
loss4 = model4$loss
model5 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50, rang = 0.1,
type = "class", algorithm = "nag", rate = 0.01, batch = 100)
loss5 = model5$loss
# plot the loss convergence
iteration = c(1:length(loss1))
myplot = plot(iteration, loss1, xlab = "iteration", ylab = "loss", ylim = c(0,
max(loss1,loss2,loss3,loss4,loss5) + 0.01), type = "p", col = "black", cex = 0.7)
title("Convergence Comparison Between Learning Algorithms")
points(iteration, loss2, col = "red", pch = 2, cex = 0.7)
points(iteration, loss3, col = "blue", pch = 3, cex = 0.7)
points(iteration, loss4, col = "green", pch = 4, cex = 0.7)
points(iteration, loss5, col = "magenta", pch = 5, cex = 0.7)
legend("topright", c("SGD", "Adagrad", "RMSprop", "Momentum", "NAG"),
col = c("black", "red", "blue", "green", "magenta"),pch = c(1,2,3,4,5))
save.image()
## End(Not run)