Digit Classifier: Training a Neural Network

The following link takes you to a GitHub repository containing all the material for this course. It’s assumed that you already have a copy of it.

Introduction

In this lab, you’ll implement the remaining part of the neural network class that we’ll be using for digit classification. This lab is meant to let you put what you learned in the last two chapters into practice.

Understanding

Let’s take a look at neural-network.ipynb.

We’re going to be finishing our NeuralNetwork class implementation.

The function digit_classifier.stochastic_gradient_descent(training, 10, 30, 3, testing) will do the bulk of the work in this lab: it trains our neural network.

At the end of main, we’ll test how accurate our neural network’s predictions are. The accuracy should be well over 90%.
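For context, accuracy checks like this are often written along the following lines. This is only a sketch, not the notebook’s actual helper: it assumes the class has a feedforward method, that each test example is an (x, y) pair whose label y is the digit itself as an integer, and that the network’s output is a vector of 10 activations.

```python
import numpy as np  # assumed to be imported at the top of the notebook

def evaluate(self, test_data):
    # Count how many test inputs the network classifies correctly,
    # taking the index of the largest output activation as the prediction.
    results = [(np.argmax(self.feedforward(x)), y) for (x, y) in test_data]
    return sum(int(prediction == y) for (prediction, y) in results)
```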

Neural Network Class Implementation

Let’s break down each component:

1. Complete the cost_derivative Method

The cost_derivative method calculates how much the cost function changes with respect to the output activations. For the quadratic cost function, $C = \frac{1}{2} \| y - a \|^2$, the derivative with respect to the output activations $a$ is simply $a - y$.
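Assuming the output activations and y are NumPy arrays of the same shape, a minimal sketch of this method might look like:

```python
def cost_derivative(self, output_activations, y):
    # Partial derivatives dC/da of the quadratic cost with respect to
    # the output activations: simply (a - y).
    return output_activations - y
```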

2. Complete the stochastic_gradient_descent Method

  • This function should train the neural network using mini-batch stochastic gradient descent (a sketch follows this list).
  • The method should:
    1. Shuffle the training_data at the start of each epoch.
    2. Divide the training data into mini-batches of size mini_batch_size.
    3. For each mini-batch, call update_mini_batch.
    4. If test_data is provided, evaluate and optionally print performance at the end of each epoch.
  • The function should not return anything.
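Putting those steps together, one possible sketch looks like the following. It assumes training_data and test_data are lists of (x, y) pairs and that the class has an evaluate helper returning the number of correctly classified test examples (an assumption here; your notebook’s helper may differ).

```python
import random  # assumed to be imported at the top of the notebook

def stochastic_gradient_descent(self, training_data, epochs,
                                mini_batch_size, eta, test_data=None):
    n = len(training_data)
    for epoch in range(epochs):
        # 1. Shuffle the training data at the start of each epoch.
        random.shuffle(training_data)
        # 2. Divide it into mini-batches of size mini_batch_size.
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        # 3. Take one gradient-descent step per mini-batch.
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)
        # 4. Optionally report progress on the test data.
        if test_data is not None:
            print(f"Epoch {epoch}: {self.evaluate(test_data)} / {len(test_data)}")
        else:
            print(f"Epoch {epoch} complete")
```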

3. Complete the update_mini_batch Method

  • This function should apply backpropagation and update the weights and biases using a single mini-batch (a sketch follows this list).
  • For each (x, y) pair in the mini_batch, compute the gradients using back_propagation.
  • Sum all gradients and update the weights and biases by averaging over the mini-batch and scaling by the learning rate eta.
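Here is a sketch of how this might look, assuming self.biases and self.weights are lists of NumPy arrays and that back_propagation returns a (nabla_b, nabla_w) tuple with matching shapes:

```python
import numpy as np  # assumed to be imported at the top of the notebook

def update_mini_batch(self, mini_batch, eta):
    # Start with zero gradients shaped like the biases and weights.
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # Accumulate the gradients from each (x, y) pair in the mini-batch.
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.back_propagation(x, y)
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    # Take one gradient-descent step, averaging over the mini-batch
    # and scaling by the learning rate eta.
    self.weights = [w - (eta / len(mini_batch)) * nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b - (eta / len(mini_batch)) * nb
                   for b, nb in zip(self.biases, nabla_b)]
```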

4. Complete the back_propagation Method

  • This function should implement the backpropagation algorithm to compute gradients of the cost function (a sketch follows this list).
  • The method should:
    1. Perform a feedforward pass, recording all intermediate z values and activations.
    2. Compute the error in the output layer using the derivative of the cost function.
    3. Backpropagate the error to compute gradients for each weight and bias.
    4. Return a tuple (nabla_b, nabla_w) where each contains the gradients for biases and weights, respectively.
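A sketch of one possible implementation follows. It assumes sigmoid and sigmoid_prime helpers are defined elsewhere in the notebook, and that the class stores self.num_layers alongside its biases and weights; adjust the names to match your class.

```python
import numpy as np  # assumed to be imported at the top of the notebook

def back_propagation(self, x, y):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    # 1. Feedforward pass, storing every z vector and activation.
    activation = x
    activations = [x]
    zs = []
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # 2. Error in the output layer: dC/da * sigma'(z).
    delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    # 3. Backpropagate the error through the earlier layers.
    for layer in range(2, self.num_layers):
        z = zs[-layer]
        delta = np.dot(self.weights[-layer + 1].transpose(), delta) * sigmoid_prime(z)
        nabla_b[-layer] = delta
        nabla_w[-layer] = np.dot(delta, activations[-layer - 1].transpose())
    # 4. Return the gradients for biases and weights.
    return nabla_b, nabla_w
```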

Testing Your Implementation

After running the main function, you should see your neural network learn. It should improve very quickly at first, then taper off into more gradual progress. Without any further changes, this neural network typically reaches around 94% accuracy on MNIST.

Looking Forward

Congrats! You now know how to build and train a neural network. In the next chapter, we’re going to understand what’s actually going on when we train a neural network. Then, in the section after that, we’ll revisit each of the somewhat arbitrary decisions we made in much more detail in order to improve our network.

Neural Networks From Scratch

Prioritize understanding over memorization. Good luck!