Part 4 – The Multi-layer Perceptron

We saw in a previous article that a neural network with only an input layer and an output layer can classify observations, but only when the decision boundary is linear. To identify non-linear decision boundaries, we need to insert a fully-connected layer of neurons between the input and output layers. This type of network is called a multi-layer perceptron.

We will use a multi-layer perceptron to perform binary classification on a data set of 500 observations with a non-linear decision boundary. Each observation has 2 features (X1 and X2). The red dots have class 0 and the blue dots have class 1.
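The article's exact data set is not reproduced here, but a comparable set of 500 points with a non-linear (circular) decision boundary can be generated with NumPy — the radii and noise level below are illustrative assumptions, not the article's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500
# Labels: 0 (red) or 1 (blue).
y = rng.integers(0, 2, size=n)

# Class 0 points lie inside a circle, class 1 points form a ring
# around them, so no straight line can separate the two classes.
radius = np.where(y == 0, 0.5, 1.5) + rng.normal(0, 0.15, size=n)
angle = rng.uniform(0, 2 * np.pi, size=n)

X = np.column_stack([radius * np.cos(angle),   # feature X1
                     radius * np.sin(angle)])  # feature X2

print(X.shape, y.shape)  # (500, 2) (500,)
```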

Since we have 2 features, we need 2 input neurons. Since there are 2 classes to predict, we can use 1 output neuron (which can take the value 0 or 1). We choose to use 4 neurons in the hidden layer. The architecture of the network is as follows.

Note: for simplicity, we will not add bias terms.
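With this 2–4–1 architecture and no bias terms, only two weight matrices are needed. A minimal sketch of their shapes (the names `w_i` for the input-to-hidden weights and `w_h` for the hidden-to-output weights follow the article's \(w^{(i)}\) and \(w^{(h)}\) notation; the initialisation scale is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(42)

n_input, n_hidden, n_output = 2, 4, 1

# Small random initialisation; no bias vectors are created.
w_i = rng.normal(0, 0.1, size=(n_input, n_hidden))   # input  -> hidden
w_h = rng.normal(0, 0.1, size=(n_hidden, n_output))  # hidden -> output

print(w_i.shape, w_h.shape)  # (2, 4) (4, 1)
```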


The theory remains the same as when we only had 2 layers (see the previous article). We just need to execute the feedforward and backpropagation passes in the following sequence:

  • initialise the weights randomly
  • feedforward pass between the input layer and the hidden layer to calculate a^{(h)}
  • feedforward pass between the hidden layer and the output layer to calculate a^{(o)}
  • backpropagation pass between the output layer and the hidden layer to calculate the gradient of the loss function with respect to the weights in the hidden layer \frac{\partial J}{\partial w^{(h)}}
  • backpropagation pass between the hidden layer and the input layer to calculate the gradient of the loss function with respect to the weights in the input layer \frac{\partial J}{\partial w^{(i)}}
  • update the weights using gradient descent
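The sequence above can be sketched as follows. The article does not spell out the activation functions or the loss, so this sketch assumes sigmoid activations in both layers and the binary cross-entropy loss (with which the output-layer delta simplifies to \(a^{(o)} - y\)); the function name, learning rate and epoch count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a 2-4-1 MLP (no bias terms) with gradient descent."""
    rng = np.random.default_rng(seed)
    y = y.reshape(-1, 1)
    n = X.shape[0]

    # 1. initialise the weights randomly
    w_i = rng.normal(0, 0.5, size=(X.shape[1], n_hidden))  # input  -> hidden
    w_h = rng.normal(0, 0.5, size=(n_hidden, 1))           # hidden -> output

    for _ in range(epochs):
        # 2-3. feedforward passes: a^(h), then a^(o)
        a_h = sigmoid(X @ w_i)
        a_o = sigmoid(a_h @ w_h)

        # 4. backpropagation output -> hidden: dJ/dw^(h)
        # (sigmoid output + cross-entropy loss gives delta_o = a_o - y)
        delta_o = a_o - y
        grad_w_h = a_h.T @ delta_o / n

        # 5. backpropagation hidden -> input: dJ/dw^(i)
        delta_h = (delta_o @ w_h.T) * a_h * (1 - a_h)
        grad_w_i = X.T @ delta_h / n

        # 6. gradient descent update
        w_h -= lr * grad_w_h
        w_i -= lr * grad_w_i

    return w_i, w_h
```

Predictions are then obtained with one more feedforward pass, `sigmoid(sigmoid(X @ w_i) @ w_h)`, thresholded at 0.5.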

We can see that the training error keeps decreasing during training. We can also follow the evolution of the weights during training.

Finally, we can generate unseen data points to test the neural network. The new (test) data points are shown as crosses, whereas the training data points are shown as circles.

We can see that most test data points have been classified correctly, but some were not. We can calculate the final test error. The test error is higher than the final training error, which is expected.


We performed binary classification on a problem with a non-linear decision boundary using a multi-layer perceptron.

In the next article, we will see how to deal with multi-class classification problems.