Digit Classifier: Training a Neural Network
The following link takes you to a GitHub repository containing all the material for this course. It’s assumed that you already have a copy of it.
Introduction
In this lab, you’ll implement the remaining part of the neural network class that we’ll be using for digit classification. This lab is meant to let you put what you learned in the last two chapters into practice.
Understanding
Let’s take a look at `neural-network.ipynb`. We’re going to be finishing our `NeuralNetwork` class implementation.
The function `digit_classifier.stochastic_gradient_descent(training, 10, 30, 3, testing)` will be doing the bulk of the work in this lab. It’s going to train our neural network.
At the end of `main`, we’ll test how accurate our neural network’s predictions are. Accuracy should be well over 90%.
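To give a sense of the overall flow, here is a hypothetical sketch of what `main` might look like. The data-loading helper, the layer sizes, and the `evaluate` call are illustrative stand-ins, not the notebook’s actual names:

```python
# Hypothetical sketch only: load_mnist_data, the layer sizes, and evaluate
# are stand-ins for whatever the notebook actually defines.
def main():
    training, testing = load_mnist_data()

    # 784 input pixels, one hidden layer, 10 output digits.
    digit_classifier = NeuralNetwork([784, 30, 10])

    # Train, printing test-set performance after each epoch.
    digit_classifier.stochastic_gradient_descent(training, 10, 30, 3, testing)

    # Report final accuracy on the held-out test set.
    accuracy = digit_classifier.evaluate(testing) / len(testing)
    print(f"Final test accuracy: {accuracy:.1%}")
```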
Neural Network Class Implementation
Let’s break down each component:
1. Complete the `cost_derivative` Method
The cost derivative method calculates how much the cost function changes with respect to the output activations. For the quadratic cost function, $C = \frac{1}{2}\lVert a - y \rVert^2$, the derivative with respect to the output activations, $a$, is simply $a - y$.
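A minimal sketch of this method, assuming the output activations and labels are NumPy arrays:

```python
def cost_derivative(self, output_activations, y):
    # For the quadratic cost C = 0.5 * ||a - y||^2, the gradient with
    # respect to the output activations a is simply (a - y).
    return output_activations - y
```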
2. Complete the `stochastic_gradient_descent` Method
- This function should train the neural network using mini-batch stochastic gradient descent (see the sketch after this list).
- The method should:
  - Shuffle the `training_data` at the start of each epoch.
  - Divide the training data into mini-batches of size `mini_batch_size`.
  - For each mini-batch, call `update_mini_batch`.
  - If `test_data` is provided, evaluate and optionally print performance at the end of each epoch.
- The function should not return anything.
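One possible implementation is sketched below. The exact signature and the `evaluate` helper are assumptions here, not something the notebook confirms:

```python
import random

def stochastic_gradient_descent(self, training_data, epochs, mini_batch_size,
                                eta, test_data=None):
    training_data = list(training_data)
    n = len(training_data)
    if test_data is not None:
        test_data = list(test_data)

    for epoch in range(epochs):
        # Shuffle so each epoch sees the examples in a different order.
        random.shuffle(training_data)

        # Slice the shuffled data into consecutive mini-batches.
        mini_batches = [
            training_data[k:k + mini_batch_size]
            for k in range(0, n, mini_batch_size)
        ]
        for mini_batch in mini_batches:
            self.update_mini_batch(mini_batch, eta)

        # Optionally report progress on the held-out test data.
        if test_data is not None:
            print(f"Epoch {epoch}: {self.evaluate(test_data)} / {len(test_data)}")
        else:
            print(f"Epoch {epoch} complete")
```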
3. Complete the `update_mini_batch` Method
- This function should apply backpropagation and update the weights and biases using a single mini-batch (see the sketch after this list).
- For each `(x, y)` pair in the `mini_batch`, compute the gradients using `back_propagation`.
- Sum all gradients and update the weights and biases by averaging over the mini-batch and scaling by the learning rate `eta`.
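A minimal sketch, assuming the class stores `self.weights` and `self.biases` as lists of NumPy arrays (an assumption about the notebook’s internals):

```python
import numpy as np

def update_mini_batch(self, mini_batch, eta):
    # Gradient accumulators with the same shapes as the biases and weights.
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]

    # Sum the per-example gradients over the mini-batch.
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.back_propagation(x, y)
        nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

    # Average over the mini-batch and take one gradient-descent step.
    m = len(mini_batch)
    self.weights = [w - (eta / m) * nw for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b - (eta / m) * nb for b, nb in zip(self.biases, nabla_b)]
```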
4. Complete the `back_propagation` Method
- This function should implement the backpropagation algorithm to compute gradients of the cost function (a sketch follows this list).
- The method should:
  - Perform a feedforward pass, recording all intermediate `z` values and activations.
  - Compute the error in the output layer using the derivative of the cost function.
  - Backpropagate the error to compute gradients for each weight and bias.
  - Return a tuple `(nabla_b, nabla_w)` where each contains the gradients for the biases and weights, respectively.
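Here is a sketch of one way to write it, again assuming `self.weights` and `self.biases` are lists of NumPy arrays. The `sigmoid` and `sigmoid_prime` helpers are shown for completeness in case the notebook doesn’t already define them:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation function, applied elementwise.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid, used when backpropagating the error.
    return sigmoid(z) * (1 - sigmoid(z))

def back_propagation(self, x, y):
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]

    # Feedforward pass: record every weighted input z and every activation.
    activation = x
    activations = [x]   # activations, layer by layer
    zs = []             # weighted inputs, layer by layer
    for b, w in zip(self.biases, self.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)

    # Output-layer error: (dC/da) * sigma'(z).
    delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())

    # Propagate the error backwards through the remaining layers.
    for l in range(2, len(self.biases) + 1):
        z = zs[-l]
        delta = np.dot(self.weights[-l + 1].transpose(), delta) * sigmoid_prime(z)
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())

    return (nabla_b, nabla_w)
```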
Testing Your Implementation
After running the `main` function, you should see that your neural network learns.
Early on, it should improve extremely quickly, then taper off into more gradual progress. This neural network will typically reach around 94% accuracy on MNIST without any further changes.
Looking Forward
Congrats! You now know how to build and train a neural network. In the next chapter, we’re going to dig into what’s actually going on when we train a neural network. Then, in the section after that, we’ll revisit the somewhat arbitrary decisions we made here in much more detail and use them to improve the network.