Reading the Data

# Read in `iris` data
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE) 

# Return the first part of `iris`
head(iris)

# Inspect the structure
str(iris)

# Obtain the dimensions
dim(iris)

Data Exploration

For this tutorial, you’ll continue to work with the famous iris dataset that you imported with the read.csv() function.

For those of you who don’t have the biology knowledge that is needed to work with this data, here’s some background information: all flowers have sepals and petals. The sepals enclose the petals and are typically green and leaf-like, while the petals are typically colored leaves. For the iris flowers, this is just a little bit different.

You might have already seen in the previous section that the iris data frame didn’t have any column names after the import. For the remainder of the tutorial, that’s not too important: even though the read.csv() function returns the data in a data.frame, the data that you’ll pass to the fit() function needs to be a matrix or array.

Some things to keep in mind about these two data structures that were just mentioned:

- Matrices and arrays don’t have column names;
- Matrices are two-dimensional objects of a single data type;
- Arrays are multi-dimensional objects of a single data type.

Note that the data frame, on the other hand, is a special kind of named list where all elements have the same length. It’s a multi-dimensional object that can contain multiple data types. You already saw that this is true when you checked out the structure of the iris data frame in the previous section. Knowing this and taking into account that you’ll need to work towards a two- or multi-dimensional object of a single data type, you should already prepare to do some preprocessing before you start building your neural network!
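
To make that difference concrete, here’s a small sketch in the R console (a generic example, not specific to the iris data): a data frame can hold several data types at once, while converting it to a matrix coerces everything to a single type.

# A data frame can mix data types; a matrix cannot
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
str(df)        # two columns with two different types

m <- as.matrix(df)
str(m)         # everything has been coerced to character
class(m)       # "matrix" (and also "array" in recent versions of R)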

For now, column names can be handy for exploring purposes and they will most definitely facilitate your understanding of the data, so let’s just add some column names with the help of the names() function. Next, you can immediately use the iris variable in your data exploration! Plot, for example, how the petal length and the petal width correlate with the plot() function.

names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

plot(iris$Petal.Length, 
     iris$Petal.Width, 
     pch=21, bg=c("red","green3","blue")[unclass(iris$Species)], 
     xlab="Petal Length", 
     ylab="Petal Width")

Note that you use the unclass() function to convert the names of the species, that is, “setosa”, “versicolor”, and “virginica”, to the numeric values 1, 2, and 3.
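
If you want to see what unclass() does here, a quick check in the console looks like this (assuming the species column was imported as a factor):

# Peek at the factor-to-integer mapping that drives the point colors
# (assumes `Species` was read in as a factor)
levels(iris$Species)           # the three species labels
head(unclass(iris$Species))    # the underlying integer codes (1, 2 or 3)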

The graph indicates a positive correlation between the Petal.Length and the Petal.Width for the different species of the iris flowers. However, this is something that you probably want to test with the cor() function, which will give you the overall correlation between all attributes that are included in the data set:

# Overall correlation between `Petal.Length` and `Petal.Width` 
cor(iris$Petal.Length, iris$Petal.Width)

Additionally, you can use the corrplot package in combination with the cor() function to plot the correlations between your data’s attributes; In this case, you calculate the overall correlation for all attributes of the iris data frame. You store the result of this calculation in a variable M and pass it to the corrplot() function.

Also, don’t forget to specify a method argument to indicate how you want the data to be plotted!

You can further experiment with the visualization method in the DataCamp Light chunk below:

library(corrplot)

# Store the overall correlation in `M`
M <- cor(iris[,1:4])

# Plot the correlation plot with `M`
corrplot(M, method="circle")
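
For example, swapping "circle" for one of corrplot’s other methods, such as "number" or "color", gives you a different view of the same correlation matrix:

# Show the correlation coefficients as numbers instead of circles
corrplot(M, method = "number")

# Or as colored squares
corrplot(M, method = "color")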

Data Preprocessing

Before you can build your model, you also need to make sure that your data is cleaned, normalized (if applicable) and divided into training and test sets. Since the dataset comes from the UCI Machine Learning Repository, you can expect it to already be somewhat clean, but let’s double check the quality of your data anyway.

At first sight, when you inspected the data with head(), you didn’t really see anything out of the ordinary, right? Let’s make use of summary() and str() to briefly recap what you learned when you checked whether the import of your data was successful:

# Pull up a summary of `iris`
summary(iris)

# Inspect the structure of `iris`
str(iris)

Now that you’re sure that the data is clean enough, you can start by checking if the normalization is necessary for any of the data with which you’re working for this tutorial.

Normalizing Your Data With A User Defined Function (UDF)

From the result of the summary() function in the DataCamp Light chunk above, you see that the Iris data set doesn’t need to be normalized: the Sepal.Length attribute has values that go from 4.3 to 7.9 and Sepal.Width contains values from 2 to 4.4, while Petal.Length’s values range from 1 to 6.9 and Petal.Width goes from 0.1 to 2.5. In other words, all values of all the attributes of the Iris data set are contained within the range of 0.1 and 7.9, which you can consider acceptable.

However, it can still be a good idea to study the effect of normalization on your data; You can even go as far as passing the normalized data to your model to see if there is any effect. This is outside the scope of this tutorial, but feel free to try it out on your own! The code is all here in this tutorial :)

You can make your own function to normalize the iris data; In this case, it’s a min-max normalization function, which linearly transforms your data according to the formula (x - min) / (max - min). From that perspective, translating this formula to R is pretty simple: you make a function to which you pass x or some data. You’ll then calculate the result of the first subtraction x - min and store the result in num. Next, you also calculate max - min and store the result in denom. Your normalize() function should then return the division of num by denom.

To apply this user-defined function on your iris data (target values excluded), you need to not only use normalize, but also the lapply() function to normalize the data, just like here below:

# Build your own `normalize()` function
normalize <- function(x) {
  num <- x - min(x)
  denom <- max(x) - min(x)
  return (num/denom)
}

# Normalize the `iris` data
iris_norm <- as.data.frame(lapply(iris[1:4], normalize))

# Return the first part of `iris` 
head(iris_norm)

Tip: use the hist() function in the R console to study the distribution of the Iris data before (iris) and after the normalization (iris_norm).

# Arrange the eight histograms in a 2 x 4 grid
op <- par(mfrow = c(2, 4))

# Plot histograms of each attribute before and after normalization
bef <- lapply(iris[1:4], hist)
aft <- lapply(iris_norm[1:4], hist)
par(op)

Normalize Your Data With keras

To use the normalize() function from the keras package, you first need to make sure that you’re working with a matrix. As you probably remember from earlier, the characteristic of matrices is that the matrix data elements are of the same basic type; In this case, you have target values that are of type factor, while the rest is all numeric.

This needs to change first.

You can use the as.numeric() function to convert the data to numbers:

# Convert the `Species` factor to the numeric values 0, 1 and 2
iris[,5] <- as.numeric(iris[,5]) - 1

# Turn `iris` into a matrix
iris <- as.matrix(iris)

# Set `iris` `dimnames` to `NULL`
dimnames(iris) <- NULL
class(iris)

A numerical data frame is alright, but you’ll need to convert the data to an array or a matrix if you want to make use of the keras package. You can easily do this with the as.matrix() function; Don’t forget here to set the dimnames to NULL.

As you might have read in the section above, normalizing the Iris data is not necessary. Nevertheless, it’s still a good idea to study normalization and its effect, and to see how this can not only be done with a UDF but also with the keras built-in normalize() function.

With your data converted to a matrix, you can indeed also use the keras package to study the effect of a possible normalization on your data:

library(keras)

# Normalize the `iris` data
iris <- normalize(iris[,1:4])

# Return the summary of `iris`
summary(iris)

Training And Test Sets

Now that you have checked the quality of your data and you know that it’s not necessary to normalize it, you can continue to work with the original data and split it into training and test sets so that you’re finally ready to start building your model. By doing this, you ensure that you can make honest assessments of the performance of your predictive model afterwards.

Before you split your data into training and test sets, you best first set a seed. You can easily do this with set.seed(): use this exact function and just pass a random integer to it. A seed is a number that initializes R’s random number generator. The major advantage of setting a seed is that you get the same sequence of random numbers whenever you supply the same seed to the random number generator.

This is great for the reproducibility of your code!
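
As a minimal illustration of why this matters: running the same sampling code twice with the same seed gives exactly the same result.

# Same seed, same "random" draw, every time
set.seed(7)
sample(1:10, 3)

set.seed(7)
sample(1:10, 3)   # identical to the draw above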

You use the sample() function to take a sample with a size that is set as the number of rows of the Iris data set, or 150. You sample with replacement: you choose from a vector of 2 elements and assign either 1 or 2 to the 150 rows of the Iris data set. The assignment of the elements is subject to probability weights of 0.67 and 0.33.

# Read in `iris` data again
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE) 

# Repeat the earlier preprocessing so the split data is numeric and matrix-shaped
# (the `as.factor()` wrapper also covers the case where the species came in as character)
iris[,5] <- as.numeric(as.factor(iris[,5])) - 1
iris <- as.matrix(iris)
dimnames(iris) <- NULL

# Set a seed for reproducibility (any integer will do)
set.seed(1234)

# Determine sample size
ind <- sample(2, nrow(iris), replace=TRUE, prob=c(0.67, 0.33))

# Split the `iris` data
iris.training <- iris[ind==1, 1:4]
iris.test <- iris[ind==2, 1:4]

# Split the class attribute
iris.trainingtarget <- iris[ind==1, 5]
iris.testtarget <- iris[ind==2, 5]

The replace argument of the sample() function is set to TRUE, which means that you assign a 1 or a 2 to a certain row and then reset the vector of 2 to its original state.

In other words, every row in your data set gets either a 1 or a 2, independently of what was assigned to the previous rows. Because you sample with replacement, the probability of drawing a 1 or a 2 stays fixed at the weights that you specify (0.67 and 0.33) for every row, rather than shifting as values are drawn.
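
If you want a quick sanity check that the split roughly follows these weights, you can tabulate ind (the exact counts will vary with the seed):

# Roughly two thirds of the rows should end up in the training set
table(ind)
prop.table(table(ind))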

One-Hot Encoding

You have successfully split your data but there is still one step that you need to go through to start building your model. Can you guess which one?

When you want to model multi-class classification problems with neural networks, it is generally a good practice to transform your target attribute from a vector that contains a value for each class into a matrix with a boolean column for each class, indicating whether or not a given instance belongs to that class.

This is a loose explanation of One Hot Encoding (OHE). It sounds quite complex, doesn’t it?
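
To make it a bit more concrete, here’s a minimal illustration with keras’ to_categorical() on a tiny made-up target vector (not the iris data):

library(keras)

# A toy target vector with the class values 0, 1 and 2
to_categorical(c(0, 1, 2, 1))
#      [,1] [,2] [,3]
# [1,]    1    0    0
# [2,]    0    1    0
# [3,]    0    0    1
# [4,]    0    1    0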

Luckily, the keras package has a to_categorical() function that will do all of this for you; Pass in the iris.trainingtarget and the iris.testtarget to this function and store the result in iris.trainLabels and iris.testLabels:

# One hot encode training target values
iris.trainLabels <- to_categorical(iris.trainingtarget)

# One hot encode test target values
iris.testLabels <- to_categorical(iris.testtarget)

# Print out the iris.testLabels to double check the result
print(iris.testLabels)

Now you have officially reached the end of the exploration and preprocessing steps in this tutorial. You can now go on to building your neural network with keras!

Constructing the Model

To start constructing a model, you should first initialize a sequential model with the help of the keras_model_sequential() function. Then, you’re ready to start modeling.

However, before you begin, it’s a good idea to revisit your original question about this data set: can you predict the species of a certain Iris flower?

It’s easier to work with numerical data and you have preprocessed the data and one hot encoded the values of the target variable: a flower is either of type versicolor, setosa or virginica and this is reflected with binary 1 and 0 values.

A type of network that performs well on such a problem is a multi-layer perceptron. This type of neural network is often fully connected. That means that you’re looking to build a fairly simple stack of fully-connected layers to solve this problem. As for the activation functions that you will use, it’s best to use one of the most common ones here for the purpose of getting familiar with Keras and neural networks, which is the relu activation function. This rectifier activation function is used in a hidden layer, which is generally speaking a good practice.

In addition, you also see that the softmax activation function is used in the output layer. You do this because you want to make sure that the output values are in the range of 0 to 1 and may be used as predicted probabilities:

# Initialize a sequential model
model <- keras_model_sequential()

# Add layers to the model
model %>% 
    layer_dense(units = 8, activation = 'relu', input_shape = c(4)) %>% 
    layer_dense(units = 3, activation = 'softmax')

Note how the output layer creates 3 output values, one for each Iris class (versicolor, virginica or setosa). The first layer, which contains 8 hidden nodes, on the other hand, has an input_shape of 4. This is because your training data iris.training has 4 columns.

You can further inspect your model with the following functions:

# Print a summary of a model
summary(model)

# Get model configuration
get_config(model)

# Get layer configuration
get_layer(model, index = 1)

# List the model's layers
model$layers

# List the input tensors
model$inputs

# List the output tensors
model$outputs

Compile And Fit The Model

Now that you have set up the architecture of your model, it’s time to compile and fit the model to the data. To compile your model, you configure the model with the adam optimizer and the categorical_crossentropy loss function. Additionally, you also monitor the accuracy during the training by passing 'accuracy' to the metrics argument.

# Compile the model
model %>% compile(
     loss = 'categorical_crossentropy',
     optimizer = 'adam',
     metrics = 'accuracy'
 )

The optimizer and the loss are two arguments that are required if you want to compile the model.

Some of the most popular optimization algorithms used are the Stochastic Gradient Descent (SGD), ADAM and RMSprop. Depending on whichever algorithm you choose, you’ll need to tune certain parameters, such as learning rate or momentum. The choice for a loss function depends on the task that you have at hand: for example, for a regression problem, you’ll usually use the Mean Squared Error (MSE).
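
If you want to tune such parameters, you can pass an optimizer object to compile() instead of the string shortcut. Here is a minimal sketch, assuming a keras version where the argument is still called lr (newer versions use learning_rate); you don’t need this for the rest of the tutorial:

# Compile with an explicitly configured SGD optimizer
model %>% compile(
     loss = 'categorical_crossentropy',
     optimizer = optimizer_sgd(lr = 0.01, momentum = 0.9),
     metrics = 'accuracy'
 )

# For a regression problem you would typically swap the loss for 'mse'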

As you see in this example, you use the categorical_crossentropy loss function for the multi-class classification problem of determining whether an iris is of type versicolor, virginica or setosa. Note, however, that if you had a binary classification problem, you would have used the binary_crossentropy loss function instead.

Next, you can also fit the model to your data; In this case, you train the model for 200 epochs or iterations over all the samples in iris.training and iris.trainLabels, in batches of 5 samples.

# Fit the model 
model %>% fit(
     iris.training, 
     iris.trainLabels, 
     epochs = 200, 
     batch_size = 5, 
     validation_split = 0.2
 )

Tip: if you want, you can also specify the verbose argument in the fit() function. By setting this argument to 1, you indicate that you want to see progress bar logging.

What you do with the code above is training the model for a specified number of epochs, or exposures to the training dataset. An epoch is a single pass through the entire training set, followed by an evaluation on the validation set. The batch size that you specify in the code above defines the number of samples that are going to be propagated through the network. By doing this, you also optimize efficiency, because you make sure that you don’t load too many input patterns into memory at the same time.
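
As a small sketch of the tip above, the same fit() call with progress-bar logging switched on explicitly would look like this (in interactive sessions this is usually the default anyway):

# Fit the model with progress bar logging
model %>% fit(
     iris.training, 
     iris.trainLabels, 
     epochs = 200, 
     batch_size = 5, 
     validation_split = 0.2,
     verbose = 1
 )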

Visualize The Model Training History

Also, it’s good to know that you can also visualize the fitting if you assign the lines of code in the DataCamp Light chunk above to a variable. You can then pass the variable to the plot() function, like you see in this particular code chunk!

# Store the fitting history in `history` 
history <- model %>% fit(
     iris.training, 
     iris.trainLabels, 
     epochs = 200,
     batch_size = 5, 
     validation_split = 0.2
 )

# Plot the history
plot(history)

