Introduction to Neural Networks
This short textbook is meant to be a lightweight introduction to neural networks, enough to give you a working understanding of how they operate. Current large language models build on the same ideas and more, but they're outside the scope of this book. And because there are many different paradigms within neural networks, I've taken an approach that will let you explore the rest of the field on your own afterward.
This textbook focuses on what are known as feed-forward networks. Despite their simplicity, they perform remarkably well in practice. In fact, the following widget uses one to identify handwritten digits with decent accuracy:
Course Structure
This course is divided into two sections.
Section 1 introduces the fundamentals of neural networks. That includes how they’re designed, how they’re trained, and what the algorithms behind them are actually doing. During this section, we’ll intentionally make a number of arbitrary choices—such as which activation function to use—without diving too deeply into why.
Section 2 is where we unpack the reasoning behind every arbitrary decision. We’ll explore why each choice matters, how different options affect performance, and what principles guide good design. In it, we’ll also explore why neural networks are so powerful and what their limits are.
This textbook is lab-heavy. You will be implementing an entire neural network from scratch using Python and NumPy, with no help from high-level libraries such as PyTorch. Although those libraries are great for speed and efficiency, they abstract away what's actually going on inside a neural network, which gets in the way of building an understanding.
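To give you a sense of what "from scratch" means here, below is a minimal sketch of a single feed-forward layer in plain NumPy. The layer sizes, the sigmoid activation, and the random initialization are illustrative assumptions, not the book's actual lab code:

```python
import numpy as np

def sigmoid(z):
    # Squash each entry into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 784 inputs (a 28x28 pixel image) feeding 30 neurons.
rng = np.random.default_rng(0)
W = rng.standard_normal((30, 784))  # weight matrix
b = rng.standard_normal((30, 1))    # bias vector

x = rng.random((784, 1))            # a stand-in input, as a column vector
a = sigmoid(W @ x + b)              # the layer's output: sigmoid(Wx + b)
print(a.shape)                      # (30, 1)
```

Everything the labs build, from the forward pass to training, is composed of operations at roughly this level.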
Prerequisites
The mathematical background you'll need is fairly modest. If you're comfortable with basic algebra, you'll be in good shape for most of the material. For the labs, we'll be implementing everything in Python, using the NumPy library for numerical work, so some programming experience is assumed. Some linear algebra, such as vectors, matrices, matrix multiplication, dot products, and transposition, will come up regularly. Calculus appears from time to time, mainly in the form of derivatives and gradients, but I've done my best to keep things approachable. Probability shows up in a couple of places, so it can be helpful to know, but you can skip those parts without missing much.
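If you'd like a quick self-check on the linear algebra, here's a small illustrative snippet covering the operations mentioned above; if each line makes sense, you're in good shape for the labs:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # a 2x2 matrix
v = np.array([5.0, 6.0])     # a vector

print(A @ v)                 # matrix-vector multiplication -> [17. 39.]
print(np.dot(v, v))          # dot product of v with itself -> 61.0
print(A.T)                   # transpose of A
```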
Credits
This textbook wouldn’t exist without Michael Nielsen. His book, “Neural Networks and Deep Learning,” is fantastic, and I wanted to create something similar as a way to practice my writing and teaching. Truthfully, I’d still recommend his book over my own unless you’re strapped for time.
Looking Forward
In the upcoming section, we’ll start exploring neural networks in earnest. The goal is to build a network that can correctly classify handwritten digits. To get there, we’ll dive into the architecture of a neural network and how to train it.