softmaxReg: Fit Multi-Layer Softmax Regression or Classification Model In softmaxreg: Training Multi-Layer Neural Network for Softmax Regression and Classification

Description

Fit a softmax regression or classification model as a neural network with multiple hidden layers and a final softmax output layer.

Usage

```
softmaxReg(x, ...)

## Default S3 method:
softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 3000, rang = 0.1,
    type = "class", algorithm = "rmsprop", rate = 0.05, L2 = FALSE,
    penalty = 1e-4, threshold = 1e-4, batch = 50, ...)

## S3 method for class 'formula'
softmaxReg(formula, data, hidden = c(), funName = 'sigmoid', maxit = 3000,
    rang = 0.1, type = "class", algorithm = "rmsprop", rate = 0.05, L2 = FALSE,
    penalty = 1e-4, threshold = 1e-4, batch = 50, ...)

## S3 method for class 'softmax'
predict(object, newdata, ...)

## S3 method for class 'softmax'
summary(object, ...)
```

Arguments

`formula`: Formula of the form y ~ x1 + x2 + ... for 'class' type classification, or (y1 + y2 + ... + yk) ~ x1 + x2 + ... for 'raw' type regression.

`x`: Matrix or data frame of x input values.

`y`: Vector of target values y for 'class' type classification, or matrix or data frame of target values (y1, y2, ..., yk) for 'raw' type regression.

`data`: Data frame containing the variables in formula.

`hidden`: Numeric vector of integers specifying the number of hidden nodes in each layer, e.g. hidden = c(8, 5, ...). Default NULL.

`funName`: Name of the neural network activation function, one of 'sigmoid', 'tanh', 'relu'. Default 'sigmoid'.

`maxit`: Integer, maximum number of iterations. Default 3000.

`rang`: Range of the initial random weights, [-rang, rang]. Default 0.1.

`type`: Type of softmax task: 'class' denotes the softmax classification model, whose fitted values are factors; 'raw' denotes the softmax regression model, whose fitted values are the raw probability or percentage data of each group. Default 'class'.

`algorithm`: Gradient descent learning algorithm to use, one of 'sgd', 'adagrad', 'rmsprop', 'adadelta', 'momentum', 'nag' (Nesterov momentum). Default 'rmsprop'.

`rate`: Initial learning rate. Default 0.05.

`L2`: Boolean indicating whether an L2 regularization term is added to the loss function and gradient to prevent overfitting. Default FALSE.

`penalty`: Penalty cost of the L2 regularization term if L2 is TRUE. Default 1e-4.

`threshold`: Convergence threshold: iteration stops when the loss value falls below threshold. Default 1e-4.

`batch`: Mini-batch size. Default 50.

`object`: An object of class `"softmax"`, the fitted model returned by softmaxReg.

`newdata`: Matrix or data frame of new data for prediction.

`...`: Other arguments.

Details

This function can be used to train typical n-class classification models. It can also fit 'raw' data regression, e.g. the percentage/probability data of each group in a Multinomial Logit/Probit model.
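For reference, the softmax transformation applied at the output layer can be sketched in a few lines of base R. This is only an illustration of the mathematical operation, not the package's internal implementation:

```r
# Softmax: map a vector of raw scores to a probability distribution.
# Subtracting the max before exponentiating is a standard numerical-stability
# trick; it does not change the result.
softmax = function(z) {
    e = exp(z - max(z))
    e / sum(e)
}

p = softmax(c(2, 1, 0.1))
# Probabilities are strictly positive and sum to 1
stopifnot(all(p > 0), abs(sum(p) - 1) < 1e-12)
# The largest score receives the largest probability
stopifnot(which.max(p) == 1)
```

In an n-class model this transformation is applied to the n output scores of each observation, giving the per-group probabilities that make up the fitted values.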

Value

object of class "softmax"

`weights`: Optimal weight parameters found by the softmax model, a list of W and B for all layers.

`data`: Input training data.

`K`: Number of K groups fitted by the softmax model.

`loss`: Numeric vector of the loss function values over iterations.

`fitted.values`: Matrix of the fitted values yFitMat for the training data. Dimensions: number of observations by K.

`iteration`: Number of iterations performed before stopping.

`convergence`: Boolean indicating whether the softmax model reached convergence.
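For 'class' type models, the `fitted.values` matrix can be mapped back to predicted group indices by taking the column with the largest probability in each row. A minimal base-R sketch on a made-up 3-by-2 matrix (the matrix here is illustrative, not actual package output):

```r
# Hypothetical fitted.values matrix: 3 observations, K = 2 groups;
# each row holds the softmax probabilities for one observation
yFitMat = matrix(c(0.9, 0.1,
                   0.2, 0.8,
                   0.6, 0.4), nrow = 3, byrow = TRUE)
# For each row, take the column index of the largest probability
yFit = apply(yFitMat, 1, which.max)
stopifnot(all(yFit == c(1, 2, 1)))
```

This `apply(..., 1, which.max)` idiom is equivalent to the explicit for-loop used in the Examples section below.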

Author(s)

Xichen Ding

References

MNIST Dataset HandWritten Digit Recognition: http://yann.lecun.com/exdb/mnist/

MNIST data reading method reuses R code from: brendan o'connor - https://gist.github.com/brendano/39760

Reuter 50 DataSet: UCI Archived Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip

See Also

`wordEmbed`, `document`, `loadURLData`
Examples

```
## Not run: 
#### Example 1: Softmax classification with a hidden layer and no regularization term
library(softmaxreg)
data(iris)
x = iris[, 1:4]
y = iris$Species
# Training with one hidden layer of 5 units
softmax_model = softmaxReg(x, y, hidden = c(5), maxit = 100, type = "class",
    algorithm = "adagrad", rate = 0.05, batch = 20)
summary(softmax_model)
yFitMat = softmax_model$fitted.values
yFit = c()
for (i in 1:length(y)) {
    yFit = c(yFit, which(yFitMat[i, ] == max(yFitMat[i, ])))
}
table(y, yFit)
# Calculate AIC and BIC information criteria
aic = AIC(softmax_model)
bic = BIC(softmax_model)
cat("AIC", aic, '\n')
cat("BIC", bic, '\n')
# Make new predictions
newdata = iris[1:100, 1:4]
yPred = predict(softmax_model, newdata)

#### Example 2: Softmax classification with formula and data frame input
f = formula(Species ~ .)  # formula with succinct expression
softmax_model_fm = softmaxReg(f, data = iris, hidden = c(5), maxit = 100,
    type = "class", algorithm = "adagrad", rate = 0.05, batch = 20)
summary(softmax_model_fm)

#### Example 3: Softmax classification with L2 regularization
softmax_model_L2 = softmaxReg(x, y, hidden = c(5), maxit = 100, type = "class",
    algorithm = "adagrad", L2 = TRUE, penalty = 1e-4, batch = 20)
summary(softmax_model_L2)
# Compare the loss of the two models
# Note: the L2 loss value includes the ||W||^2 term, so it is larger than
# the loss of the previous model
loss1 = softmax_model$loss
loss2 = softmax_model_L2$loss
plot(c(1:length(loss1)), loss1, xlab = "Iteration", ylab = "Loss Function Value",
    type = "l", col = "black")
lines(c(1:length(loss2)), loss2, col = "red")
legend("topright", c("Loss 1: No Regularization", "Loss 2: L2 Regularization"),
    col = c("black", "red"), pch = 1)

#### Example 4: Compare different learning algorithms: 'sgd', 'adagrad',
#### 'rmsprop', 'momentum', 'nag' (Nesterov Momentum)
library(softmaxreg)
data(iris)
x = iris[, 1:4]
y = iris$Species
model1 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100,
    rang = 0.1, type = "class", algorithm = "sgd", rate = 0.1, batch = 150)
loss1 = model1$loss
model2 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100,
    rang = 0.1, type = "class", algorithm = "adagrad", rate = 0.1, batch = 150)
loss2 = model2$loss
model3 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100,
    rang = 0.1, type = "class", algorithm = "rmsprop", rate = 0.1, batch = 150)
loss3 = model3$loss
model4 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100,
    rang = 0.1, type = "class", algorithm = "momentum", rate = 0.1, batch = 150)
loss4 = model4$loss
model5 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 100,
    rang = 0.1, type = "class", algorithm = "nag", rate = 0.1, batch = 150)
loss5 = model5$loss
# Plot the loss convergence
iteration = c(1:length(loss1))
plot(iteration, loss1, xlab = "iteration", ylab = "loss",
    ylim = c(0, max(loss1, loss2, loss3, loss4, loss5) + 0.01),
    type = "p", col = "black", cex = 0.7)
title("Convergence Comparison Between Learning Algorithms")
points(iteration, loss2, col = "red", pch = 2, cex = 0.7)
points(iteration, loss3, col = "blue", pch = 3, cex = 0.7)
points(iteration, loss4, col = "green", pch = 4, cex = 0.7)
points(iteration, loss5, col = "magenta", pch = 5, cex = 0.7)
legend("topright", c("SGD", "Adagrad", "RMSprop", "Momentum", "NAG"),
    col = c("black", "red", "blue", "green", "magenta"), pch = c(1, 2, 3, 4, 5))
## Comments: from these experiments we can see that the momentum-based
## learning algorithms generally converge faster than standard sgd and
## its variations

#### Example 5: Multi-class classification: read an online dataset and
#### perform document classification
library(softmaxreg)
data(word2vec) # default 20-dimension word2vec dataset
#### Reuter 50 DataSet, UCI Archived Dataset from
## URL: "http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip"
URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/00217/C50.zip"
folder = getwd()
loadURLData(URL, folder, unzip = TRUE)
## Training data
subFoler = c('AaronPressman', 'AlanCrosby', 'AlexanderSmith',
    'BenjaminKangLim', 'BernardHickey')
docTrain = document(path = paste(folder, "/C50train/", subFoler, sep = ""),
    pattern = 'txt')
xTrain = wordEmbed(docTrain, dictionary = word2vec)
yTrain = c(rep(1, 50), rep(2, 50), rep(3, 50), rep(4, 50), rep(5, 50))
# Assign labels to 5 different authors
## Testing data
docTest = document(path = paste(folder, "/C50test/", subFoler, sep = ""),
    pattern = 'txt')
xTest = wordEmbed(docTest, dictionary = word2vec)
yTest = c(rep(1, 50), rep(2, 50), rep(3, 50), rep(4, 50), rep(5, 50))
samp = sample(250, 50)
xTest = xTest[samp, ]
yTest = yTest[samp]
## Train softmax classification model, 20-10-5 architecture
softmax_model = softmaxReg(xTrain, yTrain, hidden = c(10), maxit = 500,
    type = "class", algorithm = "nag", rate = 0.1, L2 = TRUE)
summary(softmax_model)
yFit = predict(softmax_model, newdata = xTrain)
table(yTrain, yFit)
## Testing
yPred = predict(softmax_model, newdata = xTest)
table(yTest, yPred)
#### Comments: increasing the word2vec dimension to 50 or even 100 will help
#### increase the capacity of the model and the prediction precision

#### Example 6: 'MNIST' dataset handwritten digit recognition
## Download the MNIST dataset from the URL below and gunzip the files:
## http://yann.lecun.com/exdb/mnist/
## MNIST data reading method reuses R code from:
## brendan o'connor - https://gist.github.com/brendano/39760
library(softmaxreg)
# Replace with your local path
path = "D:/DeepLearning/MNIST/"
## 10-class classification, digits 0-9
x = load_image_file(paste(path, 'train-images-idx3-ubyte', sep = ""))
y = load_label_file(paste(path, 'train-labels-idx1-ubyte', sep = ""))
xTest = load_image_file(paste(path, 't10k-images-idx3-ubyte', sep = ""))
yTest = load_label_file(paste(path, 't10k-labels-idx1-ubyte', sep = ""))
## Normalize input data
x = x / 255
xTest = xTest / 255
## Compare convergence rates on the MNIST dataset
model1 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50,
    rang = 0.1, type = "class", algorithm = "sgd", rate = 0.01, batch = 100)
loss1 = model1$loss
model2 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50,
    rang = 0.1, type = "class", algorithm = "adagrad", rate = 0.01, batch = 100)
loss2 = model2$loss
model3 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50,
    rang = 0.1, type = "class", algorithm = "rmsprop", rate = 0.01, batch = 100)
loss3 = model3$loss
model4 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50,
    rang = 0.1, type = "class", algorithm = "momentum", rate = 0.01, batch = 100)
loss4 = model4$loss
model5 = softmaxReg(x, y, hidden = c(), funName = 'sigmoid', maxit = 50,
    rang = 0.1, type = "class", algorithm = "nag", rate = 0.01, batch = 100)
loss5 = model5$loss
# Plot the loss convergence
iteration = c(1:length(loss1))
myplot = plot(iteration, loss1, xlab = "iteration", ylab = "loss",
    ylim = c(0, max(loss1, loss2, loss3, loss4, loss5) + 0.01),
    type = "p", col = "black", cex = 0.7)
title("Convergence Comparison Between Learning Algorithms")
points(iteration, loss2, col = "red", pch = 2, cex = 0.7)
points(iteration, loss3, col = "blue", pch = 3, cex = 0.7)
points(iteration, loss4, col = "green", pch = 4, cex = 0.7)
points(iteration, loss5, col = "magenta", pch = 5, cex = 0.7)
legend("topright", c("SGD", "Adagrad", "RMSprop", "Momentum", "NAG"),
    col = c("black", "red", "blue", "green", "magenta"), pch = c(1, 2, 3, 4, 5))
save.image()

## End(Not run)
```